
LLM Data Leakage in Django with Firestore

LLM Data Leakage in Django with Firestore — how this specific combination creates or exposes the vulnerability

When Django applications interact with Google Cloud Firestore, data leakage to Large Language Models (LLMs) can occur through several realistic pathways. This combination is notable because Firestore is often used to store structured application data, including user profiles, activity logs, and sensitive configuration, while Django serves as the backend framework that exposes this data through APIs or views. If endpoints are unauthenticated or improperly authorized, private Firestore data can flow into LLM prompts and responses, and an LLM security scan will surface that exposure.

Consider a Django view that retrieves a user document from Firestore and passes it to an LLM-enabled feature, such as a chat assistant or automated reporting tool. If the view does not enforce strict field-level filtering, an LLM probe designed to test data exposure may receive more data than necessary. For example, a Firestore document might contain fields like internal_notes, payment_method, or pii_flags, and if these are included in the context sent to the LLM, they risk being reflected in model outputs or captured in prompt logs.
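
To make the risky pattern concrete, here is a hypothetical sketch of such a view; the assistant_summary view and the call_llm helper are illustrative names, not part of any particular codebase.

from django.http import JsonResponse
from google.cloud import firestore

def assistant_summary(request, user_id):
    # Risky pattern: the entire Firestore document, including internal fields,
    # is interpolated into the prompt with no allowlisting or access check.
    doc = firestore.Client().collection('users').document(user_id).get()
    prompt = f"Summarize this user profile: {doc.to_dict()}"
    # call_llm() stands in for whatever LLM client the application uses.
    return JsonResponse({'summary': call_llm(prompt)})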

Firestore-specific risks in this scenario include the use of broad read permissions or default rules that allow unauthenticated reads during development. In Django, this can happen when Firestore client initialization does not follow the principle of least privilege, or when service account keys are stored insecurely. An attacker running an LLM security probe might use a prompt injection sequence to request the system prompt, attempt jailbreaks, or ask the model to exfiltrate data by generating requests that trigger verbose error messages containing Firestore document paths or keys.

The LLM/AI security checks unique to middleBrick evaluate this surface by testing for system prompt leakage across 27 regex patterns, executing sequential probes for instruction override and data exfiltration, and scanning model outputs for API keys, PII, or executable code. In a Django + Firestore stack, these checks can reveal whether document queries return excessive fields or whether error handling inadvertently exposes stack traces that include Firestore collection names or document IDs. Because Firestore rules and Django models must align tightly, misconfigurations here can allow an LLM to indirectly infer data structures or retrieve sensitive entries through crafted inputs.
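
The exact rule set behind these checks is part of the scanner, but the underlying technique of screening model output for secrets before it leaves the backend can be sketched in a few lines; the patterns below are illustrative assumptions, not middleBrick's actual patterns.

import re

# Illustrative patterns only; real scanners maintain much larger rule sets.
SECRET_PATTERNS = [
    re.compile(r'AIza[0-9A-Za-z_-]{35}'),                 # Google API key format
    re.compile(r'sk-[A-Za-z0-9]{20,}'),                   # common LLM-provider key prefix
    re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),                 # US SSN-like PII
    re.compile(r'/databases/\(default\)/documents/\S+'),  # Firestore document paths
]

def output_contains_secret(text):
    # Block or redact model output that matches any known-sensitive pattern.
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)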

Real-world attack patterns relevant to this setup include OWASP API Top 10 violations such as Broken Object Level Authorization (BOLA), where an attacker iterates over Firestore document IDs via predictable URLs, and Input Validation flaws that allow injection of query parameters affecting which documents are retrieved. In scans that integrate with CI/CD via the middleBrick GitHub Action, these issues can be flagged before deployment. The Pro plan’s continuous monitoring can alert teams when new endpoints introduce risky data exposure, while the CLI can be used locally to validate that Firestore queries return only intended fields.
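
For the input-validation side of this, one mitigation is to constrain client-supplied query parameters to a known-safe set before they reach a Firestore query. A minimal sketch, assuming a users collection with display_name and created_at fields (names chosen for illustration):

from django.http import JsonResponse, HttpResponseBadRequest
from google.cloud import firestore

# Only these client-supplied sort fields are ever passed to Firestore.
ALLOWED_ORDER_FIELDS = {'display_name', 'created_at'}

def list_public_profiles(request):
    order_by = request.GET.get('order_by', 'display_name')
    if order_by not in ALLOWED_ORDER_FIELDS:
        return HttpResponseBadRequest('unsupported order_by value')
    client = firestore.Client()
    query = client.collection('users').order_by(order_by).limit(20)
    profiles = [
        {'display_name': doc.to_dict().get('display_name')}
        for doc in query.stream()
    ]
    return JsonResponse({'profiles': profiles})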

Firestore-Specific Remediation in Django — concrete code fixes

To prevent LLM data leakage in Django applications using Firestore, remediation focuses on strict query scoping, field filtering, and secure handling of Firestore documents. Below are concrete, working examples that demonstrate secure patterns.

First, initialize the Firestore client in a way that avoids embedding service account keys in source code. Use environment variables and ensure the Django settings module does not expose sensitive data in tracebacks.

import os
from google.cloud import firestore

def get_firestore_client():
    # Use Application Default Credentials in production; the project ID can be
    # supplied via an environment variable instead of hard-coding it.
    project_id = os.environ.get('GOOGLE_CLOUD_PROJECT')
    return firestore.Client(project=project_id)
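
With Application Default Credentials, the client library resolves credentials from the runtime environment, for example a GOOGLE_APPLICATION_CREDENTIALS variable pointing to a key file or the service account attached to the workload on Google Cloud, so no key material needs to live in the repository or in Django settings.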

Next, define a Django view that explicitly selects only required fields and enforces ownership or access checks before passing data to any LLM-related functionality.

from django.http import JsonResponse
from google.cloud import firestore

def get_user_public_profile(request, user_id):
    # Require an authenticated caller before touching Firestore; adapt the
    # ownership rule to however Django users map onto Firestore document IDs.
    if not request.user.is_authenticated:
        return JsonResponse({'error': 'unauthorized'}, status=401)
    client = firestore.Client()
    doc_ref = client.collection('users').document(user_id)
    doc = doc_ref.get()
    if not doc.exists:
        return JsonResponse({'error': 'not found'}, status=404)
    data = doc.to_dict()
    # Explicitly allowlisted fields to avoid leaking internal data
    safe_data = {
        'display_name': data.get('display_name'),
        'email': data.get('email'),
        'profile_image_url': data.get('profile_image_url'),
    }
    return JsonResponse(safe_data)

When integrating with LLM features, ensure that the data passed to the model is sanitized and does not include sensitive metadata. For instance, before sending a Firestore document to an LLM endpoint, strip out fields that could trigger output leakage.

def prepare_context_for_llm(doc_snapshot):
    # Exclude fields that should never reach the LLM
    excluded = {'internal_notes', 'password_hash', 'api_key', 'pii_flags'}
    data = {k: v for k, v in doc_snapshot.to_dict().items() if k not in excluded}
    return data
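
Putting the pieces together, a minimal usage sketch might look like the following; generate_reply is a placeholder for whatever LLM client the application actually calls, and only the sanitized context is interpolated into the prompt.

def build_assistant_reply(user_id, question):
    # Fetch the document with the shared client helper, then sanitize it
    # before it ever reaches the model context.
    client = get_firestore_client()
    snapshot = client.collection('users').document(user_id).get()
    if not snapshot.exists:
        return 'No profile data available.'
    context = prepare_context_for_llm(snapshot)
    prompt = f"User profile: {context}\nQuestion: {question}"
    # generate_reply() is a placeholder for the application's LLM call.
    return generate_reply(prompt)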

Additionally, configure Firestore security rules to align with Django’s access patterns. Rules should enforce authentication and limit reads to relevant document paths, reducing the chance that an exploited endpoint can traverse collections unexpectedly.

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /users/{userId} {
      allow read: if request.auth != null && request.auth.uid == userId;
      allow write: if request.auth != null && request.auth.uid == userId;
    }
  }
}

For teams using the middleBrick dashboard, these changes can be validated by re-running scans and reviewing per-category breakdowns. The CLI can be executed locally with middlebrick scan <url> to confirm that no sensitive fields appear in LLM-related findings. In production, the GitHub Action can enforce a minimum security score before allowing merges, while the MCP Server enables scanning APIs directly from AI coding assistants as developers iterate on Firestore integration code.

Related CWEs: LLM Security

CWE ID  | Name                                                 | Severity
CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM

Frequently Asked Questions

How can I verify that my Firestore queries in Django are not exposing sensitive fields to LLMs?
Use the middleBrick CLI to scan your endpoint: run middlebrick scan <your-api-url>. Review the LLM/AI Security section for data exposure findings, and ensure that Firestore document serialization explicitly excludes fields like internal_notes, api_key, and pii_flags before they reach any LLM context.
What Firestore security rule patterns help reduce LLM data leakage risks in Django?
Define rules that enforce authentication and scope reads to the requesting user, such as allowing read only when request.auth.uid matches the document user ID. Avoid wildcard reads and prefer explicit field-level checks in your application logic to prevent overly broad document access that could be probed by LLM tests.