Severity: HIGH. Tags: PII leakage, Flask, CockroachDB

PII Leakage in Flask with CockroachDB

PII Leakage in Flask with CockroachDB: how this specific combination creates or exposes the vulnerability

PII leakage occurs when a Flask application using CockroachDB inadvertently exposes sensitive personal data such as email addresses, phone numbers, or government IDs through insecure endpoints, misconfigured queries, or insufficient output handling. CockroachDB, being a distributed SQL database, does not inherently prevent exposure; it simply stores and returns whatever queries the application executes. If Flask routes construct dynamic queries by concatenating user input into SQL strings, or if query builders omit field-level filtering, full rows or columns containing PII can be returned to the client or logged.
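
The difference between a concatenated query and a parameterized one with explicit columns can be sketched as follows. This is a minimal illustration, not the article's own code: it uses the standard-library sqlite3 module as a stand-in so it runs without a cluster, and the table contents and function names are hypothetical. With psycopg2 against CockroachDB the placeholder would be %s rather than ?.

```python
import sqlite3

# Assumption: sqlite3 stands in for CockroachDB so the sketch is runnable anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO users (id, email, ssn) VALUES (?, ?, ?)",
    [(1, "alice@example.com", "111-22-3333"), (2, "bob@example.com", "444-55-6666")],
)


def find_user_unsafe(user_id: str) -> list:
    # Vulnerable: the input becomes part of the SQL text, and SELECT * returns every column.
    return conn.execute("SELECT * FROM users WHERE id = " + user_id).fetchall()


def find_user_safe(user_id: str) -> list:
    # Safer: the value is bound as a parameter and only non-sensitive columns are listed.
    return conn.execute("SELECT id, email FROM users WHERE id = ?", (user_id,)).fetchall()


crafted = "1 OR 1=1"
leaked = find_user_unsafe(crafted)    # the crafted input rewrites the WHERE clause
contained = find_user_safe(crafted)   # the same input is treated as a plain value
```

In this sketch the crafted input makes the unsafe function return every row with its ssn column, while the safe function matches nothing because the whole string is compared as a single value.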

Another common pattern is serialization without redaction. For example, a Flask route might call cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,)), then serialize the result directly into JSON using jsonify(dict(row)). The query itself is parameterized, but this approach still exposes every column, including fields like ssn, address, or internal_notes. MiddleBrick’s scans detect such outcomes as Data Exposure findings, highlighting endpoints that return complete database rows without field filtering or masking.
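
One way to avoid serializing whole rows is an allowlist applied just before the response is built. The sketch below is illustrative only: PUBLIC_USER_FIELDS and to_public_user are hypothetical names, and the column names simply follow the examples in this article.

```python
from typing import Any, Mapping

# Hypothetical allowlist of columns that may leave the service.
PUBLIC_USER_FIELDS = ("id", "email", "full_name")


def to_public_user(row: Mapping[str, Any]) -> dict:
    """Copy only allowlisted keys so newly added sensitive columns are not serialized by accident."""
    return {field: row[field] for field in PUBLIC_USER_FIELDS if field in row}


row = {
    "id": 7,
    "email": "a@example.com",
    "full_name": "A. User",
    "ssn": "111-22-3333",
    "internal_notes": "vip",
}
public = to_public_user(row)
```

An allowlist fails closed: a column added to the table later stays out of responses until someone deliberately adds it to the tuple.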

Logging and debugging practices can compound the risk. If Flask logs raw query results or stack traces that include PII, anyone with access to those logs can read it. Traffic between Flask and CockroachDB is encrypted only when TLS is in use: when a cluster runs in insecure mode or the client does not enforce TLS, queries and result sets can cross the network in cleartext and be intercepted. Additionally, ORM abstractions like SQLAlchemy may lazily load related entities, inadvertently pulling in sensitive associations when a developer expects only a subset of fields. These factors make the Flask-CockroachDB stack susceptible to PII leakage when secure coding practices are not consistently applied.
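
As a defense in depth for the logging risk, a logging filter can scrub obvious PII patterns before records are written. This is a rough sketch under stated assumptions: the regular expressions only cover simple email addresses and US-style SSNs, and the names redact and PiiRedactingFilter are hypothetical. It does not replace keeping PII out of log calls in the first place.

```python
import logging
import re

# Assumption: simple patterns for illustration; real PII detection needs broader rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and SSN-shaped values with fixed markers."""
    text = EMAIL_RE.sub("[redacted-email]", text)
    return SSN_RE.sub("[redacted-ssn]", text)


class PiiRedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so values passed as arguments are scrubbed too.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # keep the record, now redacted


logger = logging.getLogger("app.pii_demo")
logger.addFilter(PiiRedactingFilter())
```

A filter attached to a logger only sees records logged through that logger; attaching it to a handler instead covers records that propagate from other loggers.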

CockroachDB-Specific Remediation in Flask: concrete code fixes

Remediation centers on strict field selection, parameterized queries, and output sanitization. Always specify columns instead of using SELECT *, and apply server-side filtering to ensure only necessary data is retrieved. Use CockroachDB’s native features and Flask patterns that minimize PII exposure.

Parameterized queries with explicit columns

Replace dynamic SQL concatenation with parameterized statements and explicitly list required columns:

import psycopg2
from flask import Flask, request, jsonify

app = Flask(__name__)

# Secure: explicit columns and parameterized query
@app.route('/api/users/<user_id>')
def get_user(user_id):
    # Placeholder connection settings; load real credentials from configuration, not source code
    conn = psycopg2.connect(
        host='your-cockroachdb-host',
        port=26257,
        dbname='appdb',
        user='appuser',
        password='securepassword',
        sslmode='require'
    )
    try:
        with conn.cursor() as cursor:
            # Only the columns the response needs; user_id is bound, never interpolated
            cursor.execute(
                'SELECT id, email, full_name FROM users WHERE id = %s',
                (user_id,)
            )
            row = cursor.fetchone()
    finally:
        # Release the connection even if the query raises
        conn.close()
    if row is None:
        return jsonify({'error': 'not found'}), 404
    return jsonify({'id': row[0], 'email': row[1], 'full_name': row[2]})

Masking and redaction before serialization

When full rows are necessary, mask sensitive fields before serialization:

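def mask_email(email: str) -> str:
    if not email:
        return email
    if '@' not in email:
        # Malformed value: return a fixed mask instead of raising or echoing it back
        return '**'
    local, domain = email.split('@', 1)
    if len(local) <= 3:
        # Short local parts: reveal at most the first character
        masked_local = local[:1] + '**'
    else:
        masked_local = local[:2] + '**' + local[-1]
    return f'{masked_local}@{domain}'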

# Distinct path so this view does not shadow get_user above
@app.route('/api/users/<user_id>/masked')
def get_user_masked(user_id):
    conn = psycopg2.connect(
        host='your-cockroachdb-host',
        port=26257,
        dbname='appdb',
        user='appuser',
        password='securepassword',
        sslmode='require'
    )
    try:
        with conn.cursor() as cursor:
            cursor.execute('SELECT id, email, ssn FROM users WHERE id = %s', (user_id,))
            row = cursor.fetchone()
    finally:
        conn.close()
    if row is None:
        return jsonify({'error': 'not found'}), 404
    return jsonify({
        'id': row[0],
        'email': mask_email(row[1]),
        # Expose only the last four digits of the SSN
        'ssn': ('***-**-' + row[2][-4:]) if row[2] else None
    })
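
The same masking idea can be applied to several fields through one lookup table, so each route does not repeat the logic. The sketch below is a hypothetical extension of the example above: mask_phone, mask_ssn, MASKERS, and mask_row are illustrative names, and the rule of keeping the last four digits is an assumption rather than a requirement.

```python
from typing import Any, Callable, Dict, Mapping


def mask_phone(phone: str) -> str:
    """Keep the last four digits and replace every other digit with '*'."""
    total_digits = sum(ch.isdigit() for ch in phone)
    seen = 0
    out = []
    for ch in phone:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - 4 else "*")
        else:
            out.append(ch)  # keep separators such as '+', '-', spaces
    return "".join(out)


def mask_ssn(ssn: str) -> str:
    """Show only the last four characters of the SSN."""
    return "***-**-" + ssn[-4:]


# Hypothetical mapping from column name to masking function.
MASKERS: Dict[str, Callable[[str], str]] = {
    "phone_number": mask_phone,
    "ssn": mask_ssn,
}


def mask_row(row: Mapping[str, Any]) -> dict:
    """Mask known sensitive fields; leave other fields and empty values unchanged."""
    return {k: (MASKERS[k](v) if k in MASKERS and v else v) for k, v in row.items()}
```

For instance, mask_row({'id': 1, 'phone_number': '555-123-4567', 'ssn': None}) keeps id and the empty ssn as they are and masks all but the last four digits of the phone number.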

Connection and query best practices

Ensure TLS for data in transit and avoid logging sensitive values:

import logging
logger = logging.getLogger(__name__)

@app.route('/api/profiles/<profile_id>')
def get_profile(profile_id):
    conn = psycopg2.connect(
        host='your-cockroachdb-host',
        port=26257,
        dbname='appdb',
        user='appuser',
        password='securepassword',
        sslmode='require'
    )
    try:
        with conn.cursor() as cursor:
            cursor.execute(
                'SELECT user_id, nickname, phone_number FROM profiles WHERE id = %s',
                (profile_id,)
            )
            row = cursor.fetchone()
            if row is None:
                return jsonify({'error': 'not found'}), 404
            # Log only the opaque identifier, never the nickname or phone number
            logger.info('Profile retrieved for user_id=%s', row[0])
            return jsonify({'user_id': row[0], 'nickname': row[1], 'phone_number': row[2]})
    finally:
        conn.close()
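
The examples above hardcode placeholder credentials for brevity. A hedged sketch of building the connection arguments from the environment instead, with certificate verification enabled, might look like this. The CRDB_* variable names and the cockroach_connect_kwargs helper are assumptions for illustration; host, port, dbname, user, password, sslmode, and sslrootcert are standard libpq connection parameters accepted by psycopg2.connect().

```python
import os
from typing import Mapping


def cockroach_connect_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for psycopg2.connect() from environment variables.

    The CRDB_* names are hypothetical; adapt them to your deployment.
    """
    return {
        "host": env["CRDB_HOST"],
        "port": int(env.get("CRDB_PORT", "26257")),
        "dbname": env["CRDB_DATABASE"],
        "user": env["CRDB_USER"],
        "password": env["CRDB_PASSWORD"],
        # verify-full encrypts the connection and checks the server certificate and hostname
        "sslmode": "verify-full",
        "sslrootcert": env["CRDB_CA_CERT"],
    }


# Usage (not executed here): conn = psycopg2.connect(**cockroach_connect_kwargs())
```

Keeping the password out of source code addresses the credential-related CWEs listed below, and a missing variable raises KeyError at startup rather than silently falling back to a weaker setting.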

These practices reduce the attack surface and align with the Data Exposure checks that MiddleBrick performs, so that endpoints do not return complete PII-bearing rows and sensitive fields are masked before they leave the application.

Related CWEs: dataExposure

CWE ID | Name | Severity
CWE-200 | Exposure of Sensitive Information to an Unauthorized Actor | HIGH
CWE-209 | Generation of Error Message Containing Sensitive Information | MEDIUM
CWE-213 | Exposure of Sensitive Information Due to Incompatible Policies | HIGH
CWE-215 | Insertion of Sensitive Information Into Debugging Code | MEDIUM
CWE-312 | Cleartext Storage of Sensitive Information | HIGH
CWE-359 | Exposure of Private Personal Information to an Unauthorized Actor | HIGH
CWE-522 | Insufficiently Protected Credentials | CRITICAL
CWE-532 | Insertion of Sensitive Information into Log File | MEDIUM
CWE-538 | Insertion of Sensitive Information into Externally-Accessible File or Directory | HIGH
CWE-540 | Inclusion of Sensitive Information in Source Code | HIGH

Frequently Asked Questions

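Does specifying columns instead of SELECT * fully prevent PII leakage?
No. Specifying columns reduces risk by limiting the data returned, but you must also make sure the selected columns do not include sensitive fields you do not need, and that any sensitive values you do return are masked before serialization.

Is encryption in transit handled automatically by CockroachDB when used with Flask?
CockroachDB supports TLS for encryption in transit, but it is not automatic from the application's side: the database driver that Flask uses (psycopg2 in the examples above) must connect with sslmode='require' or stricter on every connection. Note that sslmode='require' encrypts the connection without verifying the server certificate; sslmode='verify-full' with a trusted CA certificate also protects against man-in-the-middle attacks. Otherwise data can be exposed between the app and the database.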