PII Leakage in Django with CockroachDB
PII Leakage in Django with CockroachDB: how this specific combination creates or exposes the vulnerability
Django applications using CockroachDB can inadvertently expose personally identifiable information (PII) through a combination of ORM behavior, database-specific nuances, and insecure coding patterns. CockroachDB is PostgreSQL wire-compatible, but its distributed execution and multi-region replication change where data is stored, replicated, and logged. In practice, most leaks originate in the application layer: the database returns whatever the ORM asks for, regardless of which node serves the query.
One common pattern is querying across related models without applying strict field-level filtering. For example, a view that serializes an Employee model and includes related Payroll data may return salary or SSN fields if the serialization layer does not explicitly exclude them:
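```python
from rest_framework import viewsets
from .models import Employee
from .serializers import EmployeeSerializer

class EmployeeViewSet(viewsets.ModelViewSet):
    queryset = Employee.objects.all().select_related('payroll')  # no per-user scoping
    serializer_class = EmployeeSerializer
```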
If EmployeeSerializer (or a nested payroll serializer) exposes fields such as payroll.ssn or payroll.bank_account, for example through fields = '__all__', these PII fields are returned in API responses. Marking them read_only does not help, because read-only fields are still serialized; only omitting them or marking them write_only keeps them out of responses. CockroachDB's distributed architecture neither causes nor mitigates this: whichever node serves the query returns every column the ORM selects, so data minimization has to be enforced in Django.
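A cheap safeguard against this kind of over-serialization is a regression test that fails whenever a response body contains a PII key. The sketch below is framework-agnostic and assumes the body has already been decoded from JSON into dicts and lists (for example with the Django test client's response.json()); find_pii_paths and the PII_KEYS set are hypothetical names, and the key list must be adapted to your own models.

```python
from typing import Any, Iterable, List

# Hypothetical denylist of PII key names; adapt it to your own models.
PII_KEYS = frozenset({"ssn", "bank_account", "email", "phone"})


def find_pii_paths(payload: Any, pii_keys: Iterable[str] = PII_KEYS, path: str = "$") -> List[str]:
    """Return the location of every PII key found anywhere in a decoded JSON body."""
    keys = frozenset(pii_keys)
    found: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in keys:
                found.append(child)
            # Recurse into nested objects, e.g. output of a nested payroll serializer
            found.extend(find_pii_paths(value, keys, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            found.extend(find_pii_paths(item, keys, f"{path}[{index}]"))
    return found


# Example: a response from an over-broad nested serializer
response = {"id": 7, "name": "Ada", "payroll": {"gross_salary": "100000", "ssn": "123-45-6789"}}
print(find_pii_paths(response))  # ['$.payroll.ssn']
```

In a test, asserting that find_pii_paths(response.json()) == [] for every endpoint that returns Employee data catches a serializer that starts leaking. Key-name matching cannot detect PII stored under innocuous names, so it backs up an explicit serializer allowlist instead of replacing it.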
Another exposure vector is misplaced trust in .only() and .defer(). These methods control which columns the initial query loads, but they are a performance tool, not an access control: reading a deferred attribute makes Django issue an extra query to load it, so a serializer that lists a deferred PII field still returns it. On the database side, CockroachDB may read the full row (or full column family) from storage even for a narrow SELECT, but only the projected columns are sent back to Django, so the exposure depends on which fields the application touches.
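The following toy model, written in plain Python rather than Django, illustrates why deferral is not a safeguard. In Django, reading a deferred field makes its DeferredAttribute descriptor call refresh_from_db() for that field; here, DeferredRow imitates that behavior with __getattr__. DeferredRow, serialize, and the sample row are all hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List


class DeferredRow:
    """Toy stand-in for a model instance loaded with .only(): fields that were
    not loaded are fetched individually the first time they are read."""

    def __init__(self, loaded: Dict[str, Any], fetch: Callable[[str], Any]) -> None:
        self.__dict__.update(loaded)  # the columns named in .only()
        self._fetch = fetch
        self.extra_queries: List[str] = []

    def __getattr__(self, name: str) -> Any:
        # Python calls this only when normal lookup fails, i.e. for deferred fields
        if name.startswith("_"):
            raise AttributeError(name)
        self.extra_queries.append(name)  # Django would run one extra SELECT here
        value = self._fetch(name)
        setattr(self, name, value)  # cached on the instance for later reads
        return value


def serialize(obj: Any, fields: List[str]) -> Dict[str, Any]:
    # Like a serializer's field list: every listed field is read, deferred or not
    return {name: getattr(obj, name) for name in fields}


stored_row = {"id": 7, "name": "Ada", "ssn": "123-45-6789"}
employee = DeferredRow({"id": 7, "name": "Ada"}, fetch=stored_row.__getitem__)
print(serialize(employee, ["id", "name", "ssn"]))  # {'id': 7, 'name': 'Ada', 'ssn': '123-45-6789'}
print(employee.extra_queries)                      # ['ssn']
```

The SSN was never part of the "narrow" load, yet it ends up in the output because the serializer asked for it.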
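Logging and error handling also contribute to risk. If a Django view catches database exceptions and logs raw query parameters, exception messages, or model instances, PII such as email addresses or phone numbers can end up in application logs or monitoring data. With DEBUG = True, Django's django.db.backends logger emits every SQL statement together with its parameters at DEBUG level, and any handler configured for that level records them. On the database side, CockroachDB features such as the slow query log, SQL audit logging, and statement diagnostics bundles can capture statement text including literal values, widening the surface for accidental exposure.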
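As a safety net for the logging side, a logging.Filter can mask obvious PII patterns before records reach a handler. The sketch below assumes e-mail addresses and phone-number-like digit runs are the patterns of interest; EMAIL_RE, PHONE_RE, and RedactPIIFilter are hypothetical names, and the regular expressions are deliberately simple. Because Django's LOGGING setting uses the standard logging.config.dictConfig schema, the same class can be attached to handlers there.

```python
import logging
import re

# Hypothetical, deliberately simple patterns: e-mail addresses, and runs of
# nine or more digits/separators that start and end with a digit (phone-like).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class RedactPIIFilter(logging.Filter):
    """Masks PII patterns in each record's formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() applies msg % args, so interpolated parameters are covered.
        # Tracebacks from exc_info are formatted later and are NOT covered.
        message = record.getMessage()
        message = EMAIL_RE.sub("[email redacted]", message)
        message = PHONE_RE.sub("[phone redacted]", message)
        record.msg, record.args = message, ()
        return True  # keep the record; only its text changes


logger = logging.getLogger("payroll")
handler = logging.StreamHandler()
handler.addFilter(RedactPIIFilter())
logger.addHandler(handler)
logger.warning("Lookup failed for %s (%s)", "alice@example.com", "+1 415-555-0100")
# Emitted as: Lookup failed for [email redacted] ([phone redacted])
```

Regex redaction is best-effort: it misses PII in unusual formats and may over-match long digit runs such as dates, so the primary control is still not logging PII in the first place.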
Finally, insecure direct object references (IDOR) combined with missing row-level checks can allow an attacker to iterate over identifiers and access other users' PII. CockroachDB has no notion of the Django user making a request, so it cannot block such lookups on its own; developers who assume that primary key enumeration is harmless, particularly in APIs where filters are missing or incomplete, leave every record reachable by its id.
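The essential property of a safe object lookup is that "does not exist" and "belongs to someone else" fail identically, so probing identifiers reveals nothing. A framework-agnostic sketch, with an in-memory dict standing in for the table and hypothetical Record, NotFound, and get_owned names:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Record:
    pk: int
    owner_id: int
    email: str


class NotFound(Exception):
    """Raised for missing AND foreign records, so the two are indistinguishable."""


def get_owned(records: Dict[int, Record], pk: int, user_id: int) -> Record:
    record = records.get(pk)
    # Existence and ownership are checked together: answering "exists, but
    # forbidden" differently from "missing" would let callers enumerate ids.
    if record is None or record.owner_id != user_id:
        raise NotFound(pk)
    return record


records = {
    1: Record(pk=1, owner_id=10, email="a@example.com"),
    2: Record(pk=2, owner_id=20, email="b@example.com"),
}
print(get_owned(records, 1, user_id=10).email)  # a@example.com
# get_owned(records, 2, user_id=10) and get_owned(records, 3, user_id=10)
# both raise NotFound, which an API would map to the same 404 response.
```

In Django the same effect comes from filtering the queryset by owner before the primary-key lookup, which makes a foreign id indistinguishable from a missing one.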
CockroachDB-Specific Remediation in Django: concrete code fixes
Remediation focuses on strict field control, query hardening, and avoiding assumptions about CockroachDB’s distributed behavior. Always explicitly define which fields are serialized and enforce read permissions at the serializer or view level.
Prefer an explicit fields allowlist in serializers over exclude or fields = '__all__', so that new model columns are not exposed by default:
```python
from rest_framework import serializers

from .models import Employee, Payroll


class PayrollSerializer(serializers.ModelSerializer):
    class Meta:
        model = Payroll
        # Allowlist: bank_account is omitted entirely, so it is never serialized.
        # ssn is accepted on input but, being write_only, never returned.
        fields = ['gross_salary', 'currency', 'ssn']
        extra_kwargs = {
            'ssn': {'write_only': True},
        }


class EmployeeSerializer(serializers.ModelSerializer):
    # Nested, read-only payroll data limited to PayrollSerializer's allowlist
    payroll = PayrollSerializer(read_only=True)

    class Meta:
        model = Employee
        # email and other PII are left out; read_only would still return them
        fields = ['id', 'name', 'department', 'payroll']
```
Apply row-level filtering in get_queryset so that every action, including retrieve, only sees the requesting user's rows. The database returns every row a query matches; it will not hide other users' records for you:
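```python
from rest_framework import permissions, viewsets
from .models import Employee
from .serializers import EmployeeSerializer


class EmployeeViewSet(viewsets.ModelViewSet):
    serializer_class = EmployeeSerializer
    permission_classes = [permissions.IsAuthenticated]  # no anonymous access

    def get_queryset(self):
        # Scope every action, retrieve included; get_object() then 404s on foreign pks
        return Employee.objects.filter(created_by=self.request.user)
```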
When using select_related or prefetch_related, pair them with .only() to restrict loaded columns, and verify (for example with assertNumQueries in tests) that deferred fields are never accessed, since each access triggers an extra query that loads the value:
```python
# Loads only the listed Employee and Payroll columns in one joined query
queryset = Employee.objects.select_related('payroll').only('id', 'name', 'payroll__gross_salary', 'payroll__currency')
```
CockroachDB runs transactions at SERIALIZABLE isolation by default and may abort conflicting transactions with retryable errors (SQLSTATE 40001), whose messages can include internal details such as the conflicting keys. Wrap sensitive operations in explicit transactions with clear failure modes, handle retries without echoing error text to clients or logs, and never log raw model instances:
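```python
from django.db import transaction
from .models import Employee


def process_employee(employee_id, user):
    with transaction.atomic():
        # Lock the row, scoped to the caller so the lookup cannot become an IDOR
        employee = Employee.objects.select_for_update().get(pk=employee_id, created_by=user)
        # safe_process stands in for your business logic; it must not log employee
        result = safe_process(employee)
        # Never do: logger.error(f'Failed for {employee.ssn}')
    return result
```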
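CockroachDB reports retryable transaction conflicts with SQLSTATE 40001 and expects the client to retry the whole transaction. The sketch below, assuming psycopg2 or psycopg 3 as the driver, retries on that code with jittered exponential backoff and logs only the SQLSTATE and attempt number. run_with_retry and sqlstate are hypothetical helper names, and the attempt limit and delays are illustrative rather than tuned values.

```python
import logging
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)

SERIALIZATION_FAILURE = "40001"  # SQLSTATE of CockroachDB's retryable errors


def sqlstate(exc: BaseException) -> Optional[str]:
    """SQLSTATE of a driver error, or of the driver error Django chains as __cause__.
    psycopg2 exposes it as .pgcode, psycopg 3 as .sqlstate."""
    for err in (exc, exc.__cause__):
        code = getattr(err, "pgcode", None) or getattr(err, "sqlstate", None)
        if code:
            return code
    return None


def run_with_retry(work: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.05) -> T:
    """Re-run a whole transaction on serialization failures, logging only metadata."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except Exception as exc:
            code = sqlstate(exc)
            if code != SERIALIZATION_FAILURE or attempt == max_attempts:
                # Never log str(exc): it can echo keys or values from the statement
                logger.error("transaction failed: sqlstate=%s attempt=%d", code, attempt)
                raise  # callers must not echo the message to clients either
            logger.warning("retrying transaction: sqlstate=%s attempt=%d", code, attempt)
            # Exponential backoff with jitter before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable: the loop always returns or raises")
```

In Django, pass a function whose body is the entire with transaction.atomic(): block, and call run_with_retry outside any enclosing transaction, so that each attempt starts a fresh one.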
Finally, use logging filters or middleware to scrub PII from logs and responses. Combine these practices with regular scans using tools like the middleBrick CLI to detect residual exposure:
# Example: middlebrick scan https://api.example.com --token $MB_TOKEN
The middleBrick Web Dashboard or GitHub Action can be used to fail builds when PII-related findings appear, helping keep the API aligned with the OWASP API Security Top 10 and compliance frameworks.
Related CWEs: data exposure
| CWE ID | Name | Severity |
|---|---|---|
| CWE-200 | Exposure of Sensitive Information to an Unauthorized Actor | HIGH |
| CWE-209 | Generation of Error Message Containing Sensitive Information | MEDIUM |
| CWE-213 | Exposure of Sensitive Information Due to Incompatible Policies | HIGH |
| CWE-215 | Insertion of Sensitive Information Into Debugging Code | MEDIUM |
| CWE-312 | Cleartext Storage of Sensitive Information | HIGH |
| CWE-359 | Exposure of Private Personal Information to an Unauthorized Actor | HIGH |
| CWE-522 | Insufficiently Protected Credentials | CRITICAL |
| CWE-532 | Insertion of Sensitive Information into Log File | MEDIUM |
| CWE-538 | Insertion of Sensitive Information into Externally-Accessible File or Directory | HIGH |
| CWE-540 | Inclusion of Sensitive Information in Source Code | HIGH |