LLM Data Leakage in Django with DynamoDB
How this specific combination creates or exposes the vulnerability
LLM data leakage occurs when an application unintentionally exposes sensitive information through language model interactions. In a Django application using Amazon DynamoDB as the backend, the risk arises from how data is retrieved, formatted, and passed to LLM endpoints. If DynamoDB items contain personal data, credentials, or operational details and are forwarded to an LLM without careful filtering, that data can be exposed in prompts, tool calls, or LLM responses.
DynamoDB’s schema-less design can store nested and unstructured data, which may inadvertently include fields such as api_key, session_token, or debug_log. When a Django view queries DynamoDB (for example, using boto3) and then sends the retrieved records to an LLM for summarization or analysis, the LLM may leak this data through its outputs or via tool-use behavior. For instance, if a debugging or analytics workflow sends entire DynamoDB items to an LLM, the system prompt or generated content might reflect sensitive values.
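To make the failure mode concrete, here is a minimal anti-pattern sketch (all names and values are illustrative, not from any real system): a helper that interpolates an entire DynamoDB item into a prompt, carrying a credential along with it.

```python
# ANTI-PATTERN sketch: do not do this in production.
# `item` stands in for a raw DynamoDB record.

def build_summary_prompt_unsafe(item: dict) -> str:
    # Interpolating the whole item forwards every attribute,
    # including credentials and debug fields, into the LLM prompt.
    return f"Summarize this user record for support staff: {item}"

item = {
    "user_id": "u-123",
    "display_name": "Alice",
    "api_key": "AKIAEXAMPLEKEY123456",  # sensitive -- must never reach the prompt
    "debug_log": "auth retry at 02:14",
}

prompt = build_summary_prompt_unsafe(item)
# The secret is now part of the prompt context:
assert "AKIAEXAMPLEKEY123456" in prompt
```

Once the secret is in the prompt context, it can surface in completions, tool-call arguments, or provider-side logs.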
Additionally, the integration pattern in Django often involves serializers that transform DynamoDB items into Python dictionaries. If these serializers include sensitive fields and those dictionaries are later used in LLM prompts, the data becomes part of the prompt context. LLM-specific checks in middleBrick detect this class of risk by scanning for system prompt leakage patterns and by testing whether unauthenticated endpoints can cause an LLM to reveal training data or private context. In this architecture, the lack of strict field-level authorization between DynamoDB and the LLM increases the chance of leaking credentials or PII through chat completions or function call outputs.
The combination of Django’s ORM-like query patterns, DynamoDB’s flexible item structure, and the stateless nature of many LLM integrations means that developers may not explicitly realize that rich data is being forwarded. middleBrick’s LLM/AI Security checks, including active prompt injection testing and output scanning for API keys and PII, are designed to surface these exposures. Without controls such as field filtering, redaction, or strict authorization before data reaches the LLM, a Django app using DynamoDB can unintentionally expose sensitive information through LLM interactions.
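As a last line of defense, responses coming back from an LLM can be scanned before they reach users or logs. A minimal regex-based sketch (the two patterns are illustrative, not exhaustive; production scanners cover many more secret and PII formats):

```python
import re

# Illustrative patterns: AWS-style access key IDs and email addresses.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),          # AWS access key ID shape
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),   # naive email/PII match
]

def scan_llm_output(text: str) -> list[str]:
    """Return all substrings that look like secrets or PII."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits

# Example: a response that accidentally echoes stored values.
response = "Your key AKIAABCDEFGHIJKLMNOP is linked to alice@example.com."
findings = scan_llm_output(response)
assert findings == ["AKIAABCDEFGHIJKLMNOP", "alice@example.com"]
```

A Django middleware or a thin wrapper around the LLM client can call such a scanner and block or redact responses that match, complementing the field-level controls described in the next section.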
DynamoDB-Specific Remediation in Django — concrete code fixes
To reduce LLM data leakage when using DynamoDB in Django, apply field-level filtering and strict schema governance. Only project the attributes required for the immediate operation and exclude known sensitive fields before any data is sent to an LLM. Use DynamoDB’s ProjectionExpression and FilterExpression to limit the data retrieved, and validate item contents in Python before constructing prompts.
Example: retrieve only necessary fields and redact sensitive keys before sending data to an LLM endpoint.
```python
import boto3
from django.conf import settings


def get_user_profile_safe(user_id: str) -> dict:
    dynamodb = boto3.resource('dynamodb', region_name=settings.AWS_REGION)
    table = dynamodb.Table(settings.DYNAMODB_PROFILES_TABLE)
    response = table.get_item(
        Key={'user_id': user_id},
        # Retrieve only the attributes needed downstream
        ProjectionExpression='user_id,email,display_name,updated_at',
    )
    item = response.get('Item', {})
    # Explicitly remove any residual sensitive fields
    item.pop('api_key', None)
    item.pop('session_token', None)
    return item


def make_llm_request_safe(profile_item: dict) -> str:
    # Only pass approved fields to the LLM
    context = {
        'user_id': profile_item.get('user_id'),
        'email': profile_item.get('email'),
        'display_name': profile_item.get('display_name'),
    }
    # Construct the prompt from the vetted context only
    prompt = (
        f"Summarize preferences for user {context['display_name']} "
        f"(ID: {context['user_id']})."
    )
    # Call your LLM client with the controlled prompt, e.g.:
    # llm_response = llm_client.complete(prompt)
    return prompt
```
Example: define a Pydantic model to enforce schema and exclude sensitive fields during serialization.
```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class SafeProfile(BaseModel):
    # Pydantic v2 style; replaces the deprecated `class Config`
    model_config = ConfigDict(from_attributes=True)

    user_id: str
    email: str
    display_name: Optional[str] = None


def serialize_profile_dynamodb(item: dict) -> SafeProfile:
    # Ensure only expected fields are used
    return SafeProfile(
        user_id=item['user_id'],
        email=item['email'],
        display_name=item.get('display_name'),
    )
```
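Schema models such as SafeProfile cover known shapes, but DynamoDB items can nest maps and lists arbitrarily. A recursive denylist redactor is a useful complement for arbitrary structures; this is a minimal sketch, and the denylist should be tuned to your own data:

```python
SENSITIVE_KEYS = {"api_key", "session_token", "password", "debug_log"}

def redact_item(value, sensitive=frozenset(SENSITIVE_KEYS)):
    """Recursively drop denylisted keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {
            k: redact_item(v, sensitive)
            for k, v in value.items()
            if k not in sensitive
        }
    if isinstance(value, list):
        return [redact_item(v, sensitive) for v in value]
    return value

item = {
    "user_id": "u-123",
    "settings": {"theme": "dark", "session_token": "tok-abc"},
    "devices": [{"id": "d1", "api_key": "k-xyz"}],
}
safe = redact_item(item)
assert safe == {
    "user_id": "u-123",
    "settings": {"theme": "dark"},
    "devices": [{"id": "d1"}],
}
```

Denylists fail open for fields they do not know about, so prefer allowlist projection (as in the examples above) as the primary control and use redaction as defense in depth.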
In the GitHub Action and CI/CD workflows, you can add API security checks to fail builds if risk scores indicate potential LLM leakage. The MCP server allows you to scan APIs directly from your AI coding assistant, helping catch these issues during development. The dashboard can track these findings over time and map them to frameworks like OWASP API Top 10 and GDPR.
Related CWEs (category: llmSecurity)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |