HIGH | XML External Entities | Django | MongoDB

XML External Entities in Django with MongoDB

XML External Entities in Django with MongoDB: how this specific combination creates or exposes the vulnerability

XML External Entity (XXE) injection occurs when an application parses XML input that declares external entities and the parser resolves them. In Django, this typically arises when XML is parsed with a library that can expand external entities, such as lxml with entity resolution enabled or xml.sax with the external general entities feature turned on. Python’s built-in xml.etree.ElementTree does not expand external entities and raises a ParseError when it meets one, but it is still not hardened against other malicious XML constructs. When the parsed data is later used to interact with a backend data store like MongoDB, the application may pass user-controlled, XML-derived values directly into database operations, unintentionally exposing internal resources or enabling SSRF-like behavior.

Django does not parse XML request bodies on its own, so developers usually add a parser explicitly. If that parser is configured to resolve external entities (e.g., via resolve_entities=True in lxml or a custom resolver class), attacker-controlled XML can trigger requests to internal services, reads from file system paths, or access to other backend systems. MongoDB, while not an XML parser, becomes involved when the extracted entity values are used in queries, for example when constructing a filter or passing a URI string to a database command. An attacker may declare an entity such as <!ENTITY internalFile SYSTEM "file:///etc/passwd"> and reference it as &internalFile;. If the application embeds the resolved value into a MongoDB operation, it can lead to unintended data access or serve as a pivot point for SSRF.

The risk is compounded when Django deserializes XML into model fields or passes raw XML content to aggregation pipelines. For instance, if a field intended to store simple strings is populated from an external entity, subsequent queries that match on that field may expose sensitive records or bypass intended filters. Additionally, if the application uses XML to configure service endpoints (such as logging or monitoring integrations), malicious entities can redirect traffic to internal MongoDB instances or configuration endpoints. Because the vulnerability spans three layers—XML parsing logic, data handling in Django views, and the MongoDB interaction layer—defense requires controls at each stage rather than a single fix.

To illustrate, consider a Django view that accepts an XML upload, extracts a username from it, and uses that value to query MongoDB without validation:

import xml.etree.ElementTree as ET

from django.http import HttpResponse
from pymongo import MongoClient

def vulnerable_view(request):
    xml_data = request.body  # raw bytes supplied by the client
    # No hardening here; a parser that resolves entities makes this exploitable
    root = ET.fromstring(xml_data)
    username = root.find('username').text  # attacker-controlled, used unvalidated
    client = MongoClient('mongodb://localhost:27017')
    db = client['mydb']
    users = db.users.find({'username': username})
    return HttpResponse(str(list(users)))

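An attacker could send XML declaring <!ENTITY xxe SYSTEM 'file:///etc/passwd'> and reference &xxe; in the username field. The standard library's ElementTree rejects that reference with a ParseError instead of expanding it, but if the view used a parser that resolves external entities (for example lxml with resolve_entities=True), the file contents would become the username value, and the resulting query or its echoed response could leak data or act as a side channel. The same class of risk applies to any parser that is not hardened.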
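The parser behavior described above can be checked locally. The following is a minimal sketch using only the standard library; the payload and the helper names extract_username and parse_outcome are illustrative assumptions, not part of any real application. It shows that CPython's xml.etree.ElementTree raises ParseError on an external entity reference rather than reading the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical attacker payload: declares an external entity and references it.
XXE_PAYLOAD = b"""<!DOCTYPE user [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user><username>&xxe;</username></user>"""

BENIGN_PAYLOAD = b"<user><username>alice</username></user>"


def extract_username(xml_bytes: bytes) -> str:
    """Parse the document and return the text of <username> (empty if absent)."""
    root = ET.fromstring(xml_bytes)
    return root.findtext("username", default="")


def parse_outcome(xml_bytes: bytes) -> str:
    """Report whether the stdlib parser accepted or rejected the document."""
    try:
        return "parsed: " + extract_username(xml_bytes)
    except ET.ParseError:
        # ElementTree does not expand the external entity; it fails instead.
        return "rejected: ParseError"


if __name__ == "__main__":
    print(parse_outcome(BENIGN_PAYLOAD))
    print(parse_outcome(XXE_PAYLOAD))
```

A rejection here only describes the standard library parser. It says nothing about other parsers the project may use, so each one needs to be checked with the same kind of payload.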

MongoDB-Specific Remediation in Django: concrete code fixes

Mitigating XXE in a Django + MongoDB stack centers on disabling external entity resolution at the XML parser level and validating/sanitizing data before it reaches database operations. The safest approach is to avoid XML parsing entirely when possible; if XML is required, use a parser configured to ignore external entities.

First, ensure your XML parser does not resolve external entities. With lxml, explicitly disable DTD and entity resolution:

from lxml import etree

def safe_parse_xml(xml_bytes):
    # Do not expand entities, load DTDs, or fetch anything over the network
    parser = etree.XMLParser(resolve_entities=False, load_dtd=False, no_network=True)
    return etree.fromstring(xml_bytes, parser=parser)

With the standard library, do not enable external entity handling on the parser; for untrusted input, prefer a hardened drop-in replacement such as defusedxml:

from defusedxml.ElementTree import fromstring
def safe_load_xml(xml_bytes):
    return fromstring(xml_bytes)  # entities and external references are blocked
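Where adding defusedxml is not an option, a similar "no DTD at all" policy can be approximated with the standard library's expat bindings. The sketch below is an assumption-laden illustration, not a replacement for defusedxml: the names DTDForbidden and assert_no_dtd are hypothetical, and the function only checks the document; it does not return a parsed tree.

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def assert_no_dtd(xml_bytes: bytes) -> None:
    """Raise DTDForbidden if the document declares a DOCTYPE.

    Entities can only be declared inside a DTD, so refusing the DOCTYPE
    refuses internal and external entity declarations along with it.
    Malformed XML still raises expat.ExpatError.
    """
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Exceptions raised in expat callbacks propagate out of Parse().
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk


if __name__ == "__main__":
    assert_no_dtd(b"<user><username>alice</username></user>")
    try:
        assert_no_dtd(b'<!DOCTYPE u [<!ENTITY x SYSTEM "file:///etc/passwd">]><u>&x;</u>')
    except DTDForbidden as exc:
        print("blocked:", exc)
```

Running this check before handing the same bytes to the real parser costs a second parsing pass, which is usually acceptable for small request bodies.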

Second, treat extracted values as untrusted input and apply strict validation before using them in MongoDB queries. Use schema-based validation (e.g., Django forms or DRF serializers) and avoid directly injecting XML content into query filters. If you must pass values to MongoDB, pass them as plain string values inside the query document and check them against an allowlist or strict pattern first:

import re

from pymongo import MongoClient

def query_user(username: str):
    # Validate the username against a strict pattern before querying;
    # fullmatch also rejects a trailing newline that '$' would let through
    if not isinstance(username, str) or not re.fullmatch(r'[\w.-]{1,64}', username):
        raise ValueError('Invalid username')
    client = MongoClient('mongodb://localhost:27017')
    db = client['mydb']
    users = db.users.find({'username': username})
    return list(users)
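The step between parsing and querying can be kept in one place. As a minimal sketch, assuming the document shape <user><username>...</username></user> used earlier, the hypothetical helper username_filter below turns a parsed element into a query filter only after the value passes validation. It uses the standard library parser for the demonstration; in a real view the element would come from a hardened parser.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as the username check above; compiled once for reuse.
USERNAME_RE = re.compile(r"[\w.-]{1,64}")


def username_filter(root: ET.Element) -> dict:
    """Build a MongoDB filter from <username>, or raise ValueError."""
    node = root.find("username")
    if node is None or node.text is None:
        raise ValueError("missing username")
    if len(node) > 0:
        # Child elements inside <username> are not part of the expected shape.
        raise ValueError("unexpected nested elements")
    username = node.text.strip()
    if not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")
    # The value is a plain validated string, never a dict of query operators.
    return {"username": username}


if __name__ == "__main__":
    ok = ET.fromstring("<user><username>alice</username></user>")
    print(username_filter(ok))
```

A value such as the contents of /etc/passwd would fail the pattern, because it contains colons, slashes, and newlines, so it never reaches the find() call.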

Third, apply the principle of least privilege to the MongoDB connection used by Django. Connect as a user that has been granted only the roles it needs, and avoid embedding sensitive metadata in XML-derived fields. For example, use a dedicated read-only user for queries that do not require write access, and avoid passing host or port overrides via extracted XML values:

from pymongo import MongoClient
# Use a restricted connection URI; do not construct URIs from XML content
client = MongoClient('mongodb://readonly_user:password@localhost:27017/mydb?authSource=admin')
db = client['mydb']
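The read-only account has to be created on the MongoDB side, since roles are granted to the user rather than set in the connection URI. A minimal sketch of the createUser command document follows; the helper name readonly_user_command is hypothetical, and the usage in the comment assumes an administrative PyMongo client that is not shown or executed here.

```python
def readonly_user_command(username: str, password: str, database: str) -> dict:
    """Build a createUser command that grants only the built-in `read` role."""
    return {
        "createUser": username,  # command name must be the first key
        "pwd": password,
        "roles": [{"role": "read", "db": database}],
    }


# Hypothetical usage (assumes `admin_client` is an authenticated MongoClient
# with permission to create users; run against the `admin` database so that
# it matches authSource=admin in the connection URI):
#
#   admin_client["admin"].command(
#       readonly_user_command("readonly_user", "<password>", "mydb")
#   )

if __name__ == "__main__":
    print(readonly_user_command("readonly_user", "<password>", "mydb"))
```

Loading the password from an environment variable or secret store, rather than writing it into source code, keeps the credential out of version control.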

Finally, integrate these checks into your Django middleware or serializer layer so that any XML ingestion is uniformly handled. Combine this with runtime security tools listed in the middleBrick platform—such as the LLM/AI Security and Input Validation checks—to detect risky patterns in unauthenticated scans. The CLI can be used locally during development: middlebrick scan <url>, while the GitHub Action can enforce a maximum risk score in CI/CD pipelines. For broader coverage, the Pro plan’s continuous monitoring can alert you if new endpoints introduce unsafe XML handling, and the Web Dashboard helps track improvements over time.

Frequently Asked Questions

Can XXE affect MongoDB even if the database doesn’t parse XML?
Yes. While MongoDB does not parse XML, an attacker can supply XML-derived values that are used in Django code to construct queries, change endpoints, or trigger SSRF interactions. The risk is in how XML-controlled data influences MongoDB operations, not in MongoDB parsing XML itself.

Does using an ORM eliminate XXE risks with MongoDB in Django?
Not automatically. ORMs protect against many injection types, but if you extract data from XML and pass it directly to ORM or PyMongo queries without validation, malicious entity values can still reach the database. Always validate and sanitize XML-derived inputs regardless of the access layer.