HIGH pii leakageflaskcockroachdb

Pii Leakage in Flask with Cockroachdb

Pii Leakage in Flask with Cockroachdb — how this specific combination creates or exposes the vulnerability

PII leakage occurs when a Flask application using CockroachDB inadvertently exposes sensitive personal data such as email addresses, phone numbers, or government IDs through insecure endpoints, misconfigured queries, or insufficient output handling. CockroachDB, being a distributed SQL database, does not inherently prevent exposure; it simply stores and returns whatever queries the application executes. If Flask routes construct dynamic queries by concatenating user input into SQL strings, or if query builders omit field-level filtering, full rows or columns containing PII can be returned to the client or logged.

Another common pattern is serialization without redaction. For example, a Flask route might call cursor.execute('SELECT * FROM users WHERE id = $1', [user_id]), then serialize the result directly into JSON using jsonify(dict(row)). This approach can expose every column, including fields like ssn, address, or internal_notes. MiddleBrick’s scans detect such outcomes as Data Exposure findings, highlighting endpoints that return complete database rows without field filtering or masking.

Logging and debugging practices can compound the risk. If Flask logs raw query results or stack traces that include PII, those logs can be accessed by unauthorized parties. CockroachDB’s wire protocol and result sets are not encrypted at the application layer by default; without explicit encryption settings, data in transit between Flask and the database can be intercepted in environments without TLS. Additionally, ORM abstractions like SQLAlchemy may lazily load related entities, inadvertently pulling in sensitive associations when a developer expects only a subset of fields. These factors make the Flask-CockroachDB stack particularly susceptible to PII leakage when secure coding practices are not consistently applied.

Cockroachdb-Specific Remediation in Flask — concrete code fixes

Remediation centers on strict field selection, parameterized queries, and output sanitization. Always specify columns instead of using SELECT *, and apply server-side filtering to ensure only necessary data is retrieved. Use CockroachDB’s native features and Flask patterns that minimize PII exposure.

Parameterized queries with explicit columns

Replace dynamic SQL concatenation with parameterized statements and explicitly list required columns:

import psycopg2
from flask import Flask, request, jsonify

app = Flask(__name__)

# Secure: explicit columns and parameterized query
@app.route('/api/users/')
def get_user(user_id):
    conn = psycopg2.connect(
        host='your-cockroachdb-host',
        port=26257,
        dbname='appdb',
        user='appuser',
        password='securepassword',
        sslmode='require'
    )
    cursor = conn.cursor()
    cursor.execute(
        'SELECT id, email, full_name FROM users WHERE id = %s',
        (user_id,)
    )
    row = cursor.fetchone()
    cursor.close()
    conn.close()
    if row is None:
        return jsonify({'error': 'not found'}), 404
    return jsonify({'id': row[0], 'email': row[1], 'full_name': row[2]})

Masking and redaction before serialization

When full rows are necessary, mask sensitive fields before serialization:

import math

def mask_email(email: str) -> str:
    if not email:
        return email
    local, domain = email.split('@', 1)
    if len(local) <= 2:
        masked_local = local[0] + '**'
    else:
        masked_local = local[:2] + '**' + local[-1]
    return f'{masked_local}@{domain}'

@app.route('/api/users/')
def get_user_masked(user_id):
    conn = psycopg2.connect(
        host='your-cockroachdb-host',
        port=26257,
        dbname='appdb',
        user='appuser',
        password='securepassword',
        sslmode='require'
    )
    cursor = conn.cursor()
    cursor.execute('SELECT id, email, ssn FROM users WHERE id = %s', (user_id,))
    row = cursor.fetchone()
    cursor.close()
    conn.close()
    if row is None:
        return jsonify({'error': 'not found'}), 404
    return jsonify({
        'id': row[0],
        'email': mask_email(row[1]),
        'ssn': '**-**-' + row[2][-4:] if row[2] else None
    })

Connection and query best practices

Ensure TLS for data in transit and avoid logging sensitive values:

import logging
logger = logging.getLogger(__name__)

@app.route('/api/profiles/')
def get_profile(profile_id):
    conn = psycopg2.connect(
        host='your-cockroachdb-host',
        port=26257,
        dbname='appdb',
        user='appuser',
        password='securepassword',
        sslmode='require'
    )
    try:
        with conn.cursor() as cursor:
            cursor.execute(
                'SELECT user_id, nickname, phone_number FROM profiles WHERE id = %s',
                (profile_id,)
            )
            row = cursor.fetchone()
            if row is None:
                return jsonify({'error': 'not found'}), 404
            # Avoid logging PII
            logger.info('Profile retrieved for user_id=%s', row[0])
            return jsonify({'user_id': row[0], 'nickname': row[1], 'phone_number': row[2]})
    finally:
        conn.close()

These practices reduce the attack surface and align with Data Exposure checks that MiddleBrick performs, ensuring that endpoints do not return complete PII-bearing rows and that sensitive fields are appropriately masked.

Related CWEs: dataExposure

CWE ID	Name	Severity
CWE-200	Exposure of Sensitive Information	HIGH
CWE-209	Error Information Disclosure	MEDIUM
CWE-213	Exposure of Sensitive Information Due to Incompatible Policies	HIGH
CWE-215	Insertion of Sensitive Information Into Debugging Code	MEDIUM
CWE-312	Cleartext Storage of Sensitive Information	HIGH
CWE-359	Exposure of Private Personal Information (PII)	HIGH
CWE-522	Insufficiently Protected Credentials	CRITICAL
CWE-532	Insertion of Sensitive Information into Log File	MEDIUM
CWE-538	Insertion of Sensitive Information into Externally-Accessible File	HIGH
CWE-540	Inclusion of Sensitive Information in Source Code	HIGH

Frequently Asked Questions

Does specifying columns instead of SELECT * fully prevent PII leakage?

No, specifying columns reduces risk by limiting returned data, but you must also ensure those columns do not include sensitive fields and that output is properly masked before serialization.

Is encryption in transit handled automatically by CockroachDB when used with Flask?

CockroachDB supports TLS for encryption in transit, but Flask must configure sslmode='require' (or equivalent) and use it consistently; otherwise data can be exposed between the app and the database.

Pii Leakage in Flask with Cockroachdb