XML External Entities in Django with DynamoDB
XML External Entities in Django with DynamoDB: how this specific combination creates or exposes the vulnerability
XML External Entity (XXE) injection occurs when an application processes untrusted XML input and allows an attacker to define external entities. In Django, if user-controlled XML data is parsed by an XML parser configured to resolve external entities, sensitive files can be read, SSRF can be triggered, or denial of service can be induced. When the parsed data is later used to interact with Amazon DynamoDB, the impact can extend to unintended data access or injection of malicious attribute values into database operations.
Consider a Django view that accepts an XML payload and extracts values to query DynamoDB. If the XML parser resolves external entities, an attacker-supplied DOCTYPE can cause the parser to read files such as /etc/passwd or metadata from the EC2 instance (e.g., IAM credentials). These extracted values may then be used as expression attribute values or condition values in DynamoDB API calls. While DynamoDB itself does not process XML, the unsafe parsing layer in Django creates the path for malicious data to reach the database layer.
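A concrete risk scenario: an API endpoint accepts an XML document to filter items from a DynamoDB table. The Django code parses it with a library that resolves external entities, for example lxml with resolve_entities left enabled (older lxml releases resolved entities by default). The standard-library xml.etree.ElementTree does not expand external entities, although it can still be exposed to entity-expansion denial of service depending on the underlying Expat version. An attacker sends an XML document that declares an external general entity referencing file:///etc/passwd and uses it inside the filter element. The parser resolves the entity and places the file contents in the value that the view passes to a query’s filter expression. DynamoDB does not interpret the file contents as code, but if the value is echoed back in a response or an error message, the attacker learns sensitive information that may aid further attacks, such as discovering application roles or bucket names used in conditional logic.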
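To make the scenario concrete, here is a minimal sketch of such a payload fed to the standard-library parser. The element names `filter` and `item_id` and the helper `try_parse` are hypothetical, chosen only for illustration. The sketch assumes CPython's xml.etree.ElementTree, which is documented to raise ParseError when it meets an external entity reference instead of expanding it. A parser that does resolve entities would return the file contents as the element text.

```python
import xml.etree.ElementTree as ET

# Hypothetical attack payload: declares an external general entity and
# references it inside the element the application later reads.
XXE_PAYLOAD = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE filter [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>\n'
    '<filter><item_id>&xxe;</item_id></filter>'
)


def try_parse(xml_text: str) -> str:
    """Return the item_id text, or 'rejected' if the parser refuses the input."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # ElementTree reports the unresolved entity as a parse error.
        return "rejected"
    return root.findtext("item_id") or ""


print(try_parse(XXE_PAYLOAD))
print(try_parse("<filter><item_id>ITEM-1</item_id></filter>"))
```

Running the same payload against the parser your application actually uses, in a test environment, is a quick way to check whether entity resolution is enabled.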
The vulnerability chain is specific to the combination of Django (XML parsing), the attacker-supplied XML, and the downstream use of extracted data in DynamoDB operations. If the XML parser is not hardened and the extracted data is interpolated directly into DynamoDB expressions, the application may be exposed to information disclosure, SSRF, or injection-like behavior against the database layer.
To detect this class of issue, scans should check whether the application parses XML input and whether any extracted data is used in DynamoDB operations without validation. The LLM/AI Security checks in middleBrick can probe for system prompt leakage and prompt injection, but XXE requires specific payloads targeting XML parsers; the scanner’s input validation checks help surface unsafe parsing practices that could indirectly affect DynamoDB interactions.
DynamoDB-Specific Remediation in Django: concrete code fixes
Remediation focuses on preventing external entity resolution in XML parsing and ensuring that data used in DynamoDB operations is strictly validated and sanitized. Below are concrete, safe patterns for Django applications that interact with DynamoDB.
1. Disable external entities in XML parsing
Use a parser that does not resolve external entities. For example, with defusedxml:
    from defusedxml.ElementTree import fromstring

    def safe_parse_xml(xml_data: str):
        # This parser does not resolve external entities
        return fromstring(xml_data)
If you use lxml, configure it to disable DTD and entity resolution:
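    from lxml import etree

    def safe_lxml_parse(xml_data: str):
        # load_dtd=False skips DTD loading, resolve_entities=False leaves entity
        # references unexpanded, and no_network=True blocks network lookups
        parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
        return etree.fromstring(xml_data, parser=parser)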
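As a defense-in-depth step, an application that never needs DTDs can reject any document carrying a DOCTYPE before it reaches the main parser. The following is a minimal sketch using the standard-library xml.parsers.expat module; the names `DoctypeForbidden` and `reject_doctype` are hypothetical. defusedxml's parsing functions accept a forbid_dtd argument that serves a similar purpose.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Raise DoctypeForbidden if the document declares a DOCTYPE.

    Malformed XML raises expat.ExpatError, which the caller should also
    treat as a rejection.
    """
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires when the DOCTYPE starts, before any entity declared in it
        # is processed, so no entity is ever expanded.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


reject_doctype(b"<filter><item_id>ITEM-1</item_id></filter>")  # passes silently
```

Because the check runs on raw bytes through Expat, it respects the document's declared encoding, which a plain substring search for "<!DOCTYPE" would not.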
2. Validate and sanitize data before DynamoDB usage
Never directly use extracted XML values in DynamoDB expressions. Validate against an allowlist and pass values through DynamoDB’s parameterization (typed Key arguments, or ExpressionAttributeValues placeholders inside expressions) rather than building expression strings by concatenation. For example, a key lookup using boto3:
    import boto3
    from botocore.exceptions import ClientError

    def get_item_safe(table_name, key_value):
        # Assume key_value has been validated and normalized
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.Table(table_name)
        try:
            response = table.get_item(
                Key={
                    'id': key_value  # validated string, not raw XML text
                }
            )
            return response.get('Item')
        except ClientError:
            # get_item can fail with errors such as ResourceNotFoundException or
            # ProvisionedThroughputExceededException; log or translate them here,
            # then re-raise so the caller can decide how to respond
            raise
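When a lookup needs an expression rather than a plain key, the same principle applies: the expression string stays fixed and user data travels only in placeholders. The sketch below builds the arguments for the low-level client's query call. It assumes, as in the example above, a string partition key named `id`; `build_query_params` and `ITEM_ID_PATTERN` are hypothetical names.

```python
import re

# Allowlist for identifiers: letters, digits, underscore, hyphen, 1 to 64 chars.
ITEM_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_query_params(table_name: str, item_id: str) -> dict:
    """Build keyword arguments for a DynamoDB client query call."""
    if not ITEM_ID_PATTERN.fullmatch(item_id):
        raise ValueError("item_id failed validation")
    return {
        "TableName": table_name,
        # The expression text is constant; it never contains user data.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "id"},
        # The value is sent as a typed attribute value, not as expression text.
        "ExpressionAttributeValues": {":pk": {"S": item_id}},
    }


# Intended use (requires boto3 and AWS credentials):
#   boto3.client("dynamodb").query(**build_query_params("YourTableName", "ITEM-1"))
print(build_query_params("YourTableName", "ITEM-1"))
```

Keeping parameter construction in a pure function like this also makes it easy to unit-test without touching AWS.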
3. Use strict schema validation for incoming XML
Define an XSD or use a strict serializer (e.g., Django forms or Pydantic) to ensure only expected fields are accepted. Example with a simple form:
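    from django import forms

    class ItemFilterForm(forms.Form):
        # RegexField enforces the pattern; CharField does not accept a regex argument
        item_id = forms.RegexField(regex=r'^[a-zA-Z0-9_-]+$', max_length=64)

        def clean(self):
            cleaned_data = super().clean()
            item_id = cleaned_data.get('item_id')
            # item_id is absent from cleaned_data if field validation already failed
            if item_id is not None and not item_id.startswith('ITEM-'):
                raise forms.ValidationError('Invalid item_id format')
            return cleaned_data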
In your view, reject XML that cannot be mapped to this schema, and do not attempt to extract or use disallowed fields in DynamoDB queries.
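One way to reject XML that does not map to the schema is to walk the root element's children and refuse anything outside an allowlist before the form ever sees the data. This is a rough sketch; `ALLOWED_FIELDS` and `extract_allowed_fields` are hypothetical names, and in a real view the root element should come from a hardened parser rather than the standard-library call used in the demonstration line.

```python
import xml.etree.ElementTree as ET

# Only these child elements may appear under the root.
ALLOWED_FIELDS = {"item_id"}


def extract_allowed_fields(root: ET.Element) -> dict:
    """Return {tag: text} for allowlisted children; raise ValueError otherwise."""
    fields = {}
    for child in root:
        if child.tag not in ALLOWED_FIELDS:
            raise ValueError(f"unexpected element: {child.tag!r}")
        if child.tag in fields:
            raise ValueError(f"duplicate element: {child.tag!r}")
        if len(child) > 0:
            raise ValueError(f"nested content not allowed in {child.tag!r}")
        fields[child.tag] = (child.text or "").strip()
    return fields


# Demonstration only: this input contains no DTD, so the stdlib parser is fine here.
demo_root = ET.fromstring("<filter><item_id>ITEM-1</item_id></filter>")
print(extract_allowed_fields(demo_root))
```

The resulting dictionary can be passed straight to the form constructor, so field-level validation still happens in one place.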
4. Enforce least privilege for DynamoDB access
Ensure the IAM role or user associated with your Django application has only the permissions needed for the intended DynamoDB operations. Avoid broad dynamodb:* permissions. For example, a policy allowing read-only access to a specific table:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "dynamodb:GetItem",
            "dynamodb:Query"
          ],
          "Resource": "arn:aws:dynamodb:region:account-id:table/YourTableName"
        }
      ]
    }
5. Logging and monitoring
Log rejected inputs and failed validation events without logging raw XML or sensitive extracted content. Monitor for repeated malformed XML submissions that may indicate probing or attacks.
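A minimal sketch of logging a rejection without recording the payload itself: a hash and a length are enough to correlate repeated submissions. The logger name `xml_input` and the functions `describe_rejected_payload` and `log_rejection` are hypothetical.

```python
import hashlib
import logging

logger = logging.getLogger("xml_input")


def describe_rejected_payload(body: bytes) -> dict:
    """Summarize a payload without retaining any of its content."""
    return {"sha256": hashlib.sha256(body).hexdigest(), "length": len(body)}


def log_rejection(body: bytes, reason: str, client_ip: str) -> None:
    """Record why an input was rejected, identified only by hash and size."""
    summary = describe_rejected_payload(body)
    logger.warning(
        "Rejected XML input: reason=%s ip=%s length=%d sha256=%s",
        reason,
        client_ip,
        summary["length"],
        summary["sha256"],
    )


log_rejection(b"<!DOCTYPE x []><x/>", "doctype_present", "203.0.113.5")
```

Identical hashes arriving from many addresses, or many distinct hashes from one address, are both patterns worth alerting on.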
Example safe flow in a Django view
    from django.http import JsonResponse
    from defusedxml.ElementTree import fromstring
    from .forms import ItemFilterForm
    from .dynamodb import get_item_safe

    def item_detail(request):
        xml_data = request.body.decode('utf-8')
        try:
            root = fromstring(xml_data)  # safe parse
            item_id = root.findtext('item_id')
            form = ItemFilterForm({'item_id': item_id})
            if not form.is_valid():
                return JsonResponse({'error': 'Invalid input'}, status=400)
            item = get_item_safe('YourTableName', form.cleaned_data['item_id'])
            return JsonResponse(item or {})
        except Exception:
            return JsonResponse({'error': 'Processing failed'}, status=400)
This approach ensures XML parsing does not resolve external entities, input is validated before use, and DynamoDB calls use safe, parameterized patterns.