Pii Leakage in Flask with Cockroachdb
Pii Leakage in Flask with Cockroachdb — how this specific combination creates or exposes the vulnerability
PII leakage occurs when a Flask application using CockroachDB inadvertently exposes sensitive personal data such as email addresses, phone numbers, or government IDs through insecure endpoints, misconfigured queries, or insufficient output handling. CockroachDB, being a distributed SQL database, does not inherently prevent exposure; it simply stores and returns whatever queries the application executes. If Flask routes construct dynamic queries by concatenating user input into SQL strings, or if query builders omit field-level filtering, full rows or columns containing PII can be returned to the client or logged.
Another common pattern is serialization without redaction. For example, a Flask route might call cursor.execute('SELECT * FROM users WHERE id = $1', [user_id]), then serialize the result directly into JSON using jsonify(dict(row)). This approach can expose every column, including fields like ssn, address, or internal_notes. MiddleBrick’s scans detect such outcomes as Data Exposure findings, highlighting endpoints that return complete database rows without field filtering or masking.
Logging and debugging practices can compound the risk. If Flask logs raw query results or stack traces that include PII, those logs can be accessed by unauthorized parties. CockroachDB’s wire protocol and result sets are not encrypted at the application layer by default; without explicit encryption settings, data in transit between Flask and the database can be intercepted in environments without TLS. Additionally, ORM abstractions like SQLAlchemy may lazily load related entities, inadvertently pulling in sensitive associations when a developer expects only a subset of fields. These factors make the Flask-CockroachDB stack particularly susceptible to PII leakage when secure coding practices are not consistently applied.
Cockroachdb-Specific Remediation in Flask — concrete code fixes
Remediation centers on strict field selection, parameterized queries, and output sanitization. Always specify columns instead of using SELECT *, and apply server-side filtering to ensure only necessary data is retrieved. Use CockroachDB’s native features and Flask patterns that minimize PII exposure.
Parameterized queries with explicit columns
Replace dynamic SQL concatenation with parameterized statements and explicitly list required columns:
import psycopg2
from flask import Flask, request, jsonify
app = Flask(__name__)
# Secure: explicit columns and parameterized query
@app.route('/api/users/')
def get_user(user_id):
conn = psycopg2.connect(
host='your-cockroachdb-host',
port=26257,
dbname='appdb',
user='appuser',
password='securepassword',
sslmode='require'
)
cursor = conn.cursor()
cursor.execute(
'SELECT id, email, full_name FROM users WHERE id = %s',
(user_id,)
)
row = cursor.fetchone()
cursor.close()
conn.close()
if row is None:
return jsonify({'error': 'not found'}), 404
return jsonify({'id': row[0], 'email': row[1], 'full_name': row[2]})
Masking and redaction before serialization
When full rows are necessary, mask sensitive fields before serialization:
import math
def mask_email(email: str) -> str:
if not email:
return email
local, domain = email.split('@', 1)
if len(local) <= 2:
masked_local = local[0] + '**'
else:
masked_local = local[:2] + '**' + local[-1]
return f'{masked_local}@{domain}'
@app.route('/api/users/')
def get_user_masked(user_id):
conn = psycopg2.connect(
host='your-cockroachdb-host',
port=26257,
dbname='appdb',
user='appuser',
password='securepassword',
sslmode='require'
)
cursor = conn.cursor()
cursor.execute('SELECT id, email, ssn FROM users WHERE id = %s', (user_id,))
row = cursor.fetchone()
cursor.close()
conn.close()
if row is None:
return jsonify({'error': 'not found'}), 404
return jsonify({
'id': row[0],
'email': mask_email(row[1]),
'ssn': '**-**-' + row[2][-4:] if row[2] else None
})
Connection and query best practices
Ensure TLS for data in transit and avoid logging sensitive values:
import logging
logger = logging.getLogger(__name__)
@app.route('/api/profiles/')
def get_profile(profile_id):
conn = psycopg2.connect(
host='your-cockroachdb-host',
port=26257,
dbname='appdb',
user='appuser',
password='securepassword',
sslmode='require'
)
try:
with conn.cursor() as cursor:
cursor.execute(
'SELECT user_id, nickname, phone_number FROM profiles WHERE id = %s',
(profile_id,)
)
row = cursor.fetchone()
if row is None:
return jsonify({'error': 'not found'}), 404
# Avoid logging PII
logger.info('Profile retrieved for user_id=%s', row[0])
return jsonify({'user_id': row[0], 'nickname': row[1], 'phone_number': row[2]})
finally:
conn.close()
These practices reduce the attack surface and align with Data Exposure checks that MiddleBrick performs, ensuring that endpoints do not return complete PII-bearing rows and that sensitive fields are appropriately masked.
Related CWEs: dataExposure
| CWE ID | Name | Severity |
|---|---|---|
| CWE-200 | Exposure of Sensitive Information | HIGH |
| CWE-209 | Error Information Disclosure | MEDIUM |
| CWE-213 | Exposure of Sensitive Information Due to Incompatible Policies | HIGH |
| CWE-215 | Insertion of Sensitive Information Into Debugging Code | MEDIUM |
| CWE-312 | Cleartext Storage of Sensitive Information | HIGH |
| CWE-359 | Exposure of Private Personal Information (PII) | HIGH |
| CWE-522 | Insufficiently Protected Credentials | CRITICAL |
| CWE-532 | Insertion of Sensitive Information into Log File | MEDIUM |
| CWE-538 | Insertion of Sensitive Information into Externally-Accessible File | HIGH |
| CWE-540 | Inclusion of Sensitive Information in Source Code | HIGH |