XML External Entities in Django with Firestore

XML External Entities in Django with Firestore: how this specific combination creates or exposes the vulnerability

XML External Entity (XXE) injection occurs when an application processes XML input that references external entities, allowing an attacker to force the parser to read local files, make internal network requests, or cause denial of service. In Django, this risk arises when your project uses XML parsing—such as with legacy SOAP services, document imports, or SAML-based authentication—without disabling external entity resolution. Firestore does not directly parse XML, but it can store and serve XML blobs, configuration files, or exported data that your Django app later processes. If your Django code retrieves an XML document from Firestore and feeds it to an XML parser without proper hardening, the stored XML may contain malicious external entity declarations that lead to local file reads or SSRF when the parser resolves them.

The combination is hazardous because Firestore is commonly used as a document store for application data, including user-provided or third-party XML uploads. A developer might assume that because Firestore is a managed NoSQL database, the data is inherently safe; however, the security boundary is at the application layer. Whether stored XML is dangerous depends on the parser Django hands it to. lxml has historically resolved entities by default (newer releases restrict this to internal entities), and xml.sax processed external general entities before Python 3.7.1. The standard xml.etree.ElementTree and xml.dom.minidom do not fetch external entities, but they still accept DTDs and can be exposed to entity-expansion denial of service when built against older Expat versions. For example, an attacker who can write an XML file to Firestore (e.g., via a misconfigured upload endpoint) or trick an admin into storing a malicious document could later trigger reads of /etc/passwd or internal metadata service endpoints when a Django worker parses the file with an entity-resolving parser. This maps to OWASP Top 10 A05:2021 Security Misconfiguration, the category that absorbed XXE, and to A01:2021 Broken Access Control when the write path is the weakness. It is also relevant to compliance frameworks such as PCI-DSS and SOC2.
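Because parser behaviour differs between libraries and versions, it can help to probe the parser your project actually uses. The following is a minimal sketch rather than a definitive test: the helper name `probe_external_entities` and the canary string are made up for illustration, and the payload points at a temporary file the probe creates itself instead of a real system file.

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical marker; if it shows up in parsed text, the entity was expanded.
CANARY = "xxe-canary-7f3a"


def probe_external_entities(parse_fn) -> str:
    """Classify a parser as 'expanded', 'rejected', or 'ignored'.

    parse_fn takes XML bytes and returns an element that supports itertext().
    """
    with tempfile.TemporaryDirectory() as tmp:
        secret = pathlib.Path(tmp) / "secret.txt"
        secret.write_text(CANARY, encoding="utf-8")
        payload = (
            '<?xml version="1.0"?>'
            f'<!DOCTYPE report [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
            "<report><item>&xxe;</item></report>"
        ).encode("utf-8")
        try:
            root = parse_fn(payload)
        except Exception:
            # The parser refused the document (parse error or security error).
            return "rejected"
        text = "".join(root.itertext())
        return "expanded" if CANARY in text else "ignored"


if __name__ == "__main__":
    print(probe_external_entities(ET.fromstring))
```

With the standard-library ElementTree on current CPython this is expected to report "rejected", because that parser raises ParseError on the unresolved entity reference. Any parser that reports "expanded" needs hardening before it touches XML from Firestore.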

Concrete scenario: A Django management command pulls XML reports from a Firestore collection and parses them to aggregate metrics, running with the project's Firestore credentials. The XML retrieved from Firestore contains a DOCTYPE with an entity pointing to file:///etc/passwd. If the parser is not configured to disable external entities, the file is read and its contents can be exfiltrated via a secondary channel or reflected in error messages. Because Firestore stores the XML as-is, the malicious content persists until explicitly removed, enabling repeated exploitation. The vulnerability is not in Firestore itself but in how Django processes XML retrieved from it.
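Since a malicious document persists until it is removed, it may be worth auditing what is already stored. Below is a rough sketch using only the standard library's Expat bindings. `declared_entities` and `audit` are hypothetical helper names, and the (doc_id, xml_text) pairs are assumed to come from iterating a Firestore collection and reading each document's content field.

```python
from xml.parsers import expat


def declared_entities(xml_text):
    """Return (name, system_id) for every entity declared in the document's DTD.

    system_id is None for internal entities. No ExternalEntityRefHandler is
    installed, so Expat does not fetch anything while we inspect the input.
    """
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    try:
        parser.Parse(xml_text, True)
    except expat.ExpatError:
        # Malformed input: keep whatever was declared before the error.
        pass
    return found


def audit(documents):
    """Map doc_id -> declared entities for documents that declare any."""
    flagged = {}
    for doc_id, xml_text in documents:
        entities = declared_entities(xml_text)
        if entities:
            flagged[doc_id] = entities
    return flagged


if __name__ == "__main__":
    sample = [
        ("clean", "<report><item>1</item></report>"),
        ("evil", '<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
                 "<r>&xxe;</r>"),
    ]
    print(audit(sample))
```

Flagged documents can then be reviewed or deleted. The audit only reports declarations, so it complements a hardened parser rather than replacing one.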

To mitigate this specific combination, treat XML from Firestore as untrusted input, apply secure parsing defaults, and validate/sanitize data before storage when possible. Use Django’s defense-in-depth practices—input validation, least-privilege Firestore IAM, and strict parser configuration—to reduce the attack surface. Security scans with a tool like middleBrick can detect unsafe XML parsing patterns and insecure data flows involving Firestore, helping you prioritize remediation.

Firestore-Specific Remediation in Django: concrete code fixes

Remediation focuses on disabling external entity resolution in XML parsers and ensuring Firestore access follows least privilege. Below are concrete, secure code examples for Django that integrate Firestore safely.

1. Disable external entities in XML parsing

When parsing XML retrieved from Firestore, use defusedxml or configure your parser to prohibit external entities. For ElementTree-style parsing, use defusedxml.ElementTree instead of the standard module; by default it raises an exception (such as EntitiesForbidden) when a document declares entities, rather than resolving them.

import defusedxml.ElementTree as defused_et
from google.cloud import firestore

def parse_xml_from_firestore(doc_id: str):
    db = firestore.Client()
    doc = db.collection('xml_reports').document(doc_id).get()
    if not doc.exists:
        raise ValueError('Document not found')
    xml_data = doc.to_dict().get('content', '')
    # defusedxml raises EntitiesForbidden if the document declares entities
    root = defused_et.fromstring(xml_data)
    # Process the safe XML tree
    for item in root.findall('item'):
        print(item.text)

2. Secure lxml usage with explicit entity disabling

If you rely on lxml, explicitly disable DTD loading and external entities. Do not rely on default settings.

from lxml import etree
from google.cloud import firestore

def parse_xml_lxml_safe(doc_id: str):
    db = firestore.Client()
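    doc = db.collection('xml_reports').document(doc_id).get()
    if not doc.exists:
        # to_dict() returns None for a missing document
        raise ValueError('Document not found')
    xml_content = doc.to_dict().get('content', '')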
    parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
    root = etree.fromstring(xml_content.encode('utf-8'), parser=parser)
    for node in root.xpath('//data'):
        print(node.text)

3. Firestore IAM and data validation

Apply least-privilege IAM roles to the Firestore service account used by Django. Avoid broad roles like owner; instead, grant Cloud Datastore Viewer (roles/datastore.viewer) where read-only access is enough, and Cloud Datastore User (roles/datastore.user) only where the application must write. Validate and sanitize XML before storing it, rejecting external entity declarations and DOCTYPEs at upload time.

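from google.cloud import firestore
from defusedxml import DefusedXmlException
from defusedxml import ElementTree as defused_et

def store_user_xml_safe(user_id: str, xml_content: str):
    # Parse once with DTDs forbidden so any DOCTYPE or entity declaration
    # is rejected by the parser rather than by a substring match
    try:
        defused_et.fromstring(xml_content, forbid_dtd=True)
    except DefusedXmlException as exc:
        raise ValueError('DOCTYPE and entity declarations are not allowed') from exc
    except defused_et.ParseError as exc:
        raise ValueError('Content is not well-formed XML') from exc
    db = firestore.Client()
    db.collection('xml_uploads').document(user_id).set({'content': xml_content})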

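4. Middleware to reject risky XML before it reaches Firestore

Add request validation in Django so that XML carrying a DOCTYPE is rejected before it can be stored in Firestore. This reduces the persistence of malicious documents. Treat it as a coarse first filter rather than a replacement for safe parsing: a pattern match on the request body can be sidestepped, for example by a non-UTF-8 encoding.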

import re
from django.http import JsonResponse

XXE_PATTERN = re.compile(r'<!\s*DOCTYPE', re.IGNORECASE)

def validate_xml_middleware(get_response):
    def middleware(request):
        if request.method in ('POST', 'PUT', 'PATCH') and request.content_type and 'xml' in request.content_type:
            body = request.body.decode('utf-8', errors='ignore')
            if XXE_PATTERN.search(body):
                return JsonResponse({'error': 'DOCTYPE declarations are not allowed'}, status=400)
        response = get_response(request)
        return response
    return middleware
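One limitation of matching a regular expression against the UTF-8-decoded body is that the same XML encoded as UTF-16 does not match, while an XML parser will still decode it. A possible alternative, sketched here with the standard library only, is to let Expat detect the encoding and stop at the first DOCTYPE. `xml_body_is_acceptable` and `DoctypeFound` are hypothetical names, and treating malformed XML as unacceptable is a design assumption, not something the article prescribes.

```python
import re
from xml.parsers import expat

# Same pattern as the middleware above, repeated so the sketch is self-contained.
XXE_PATTERN = re.compile(r'<!\s*DOCTYPE', re.IGNORECASE)


class DoctypeFound(Exception):
    """Raised from the Expat handler to abort parsing at the first DOCTYPE."""


def xml_body_is_acceptable(body: bytes) -> bool:
    """Return False if the body declares a DOCTYPE or is not well-formed XML."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeFound(name)

    parser = expat.ParserCreate()  # encoding is detected from the bytes
    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(body, True)
    except DoctypeFound:
        return False
    except expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    evil = (
        '<?xml version="1.0" encoding="UTF-16"?>'
        '<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>'
    ).encode("utf-16")
    # The regex sees interleaved NUL bytes and finds nothing to match.
    print(XXE_PATTERN.search(evil.decode("utf-8", errors="ignore")))
    # The parser-based check decodes UTF-16 and stops at the DOCTYPE.
    print(xml_body_is_acceptable(evil))
```

Inside the middleware, the regex test could be replaced by `not xml_body_is_acceptable(request.body)`. Parsing at retrieval time should still use a hardened parser, since documents can reach Firestore through other paths.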

5. Operational safeguards

Use middleBrick to scan your Django endpoints and Firestore integration paths. It can identify unsafe XML parsing patterns and insecure data handling in the unauthenticated attack surface, providing prioritized findings with severity and remediation guidance. Combine this with continuous monitoring on the Pro plan to detect regressions, and leverage the GitHub Action to fail CI/CD builds if risky configurations are detected.

Frequently Asked Questions

Does Firestore scan for XXE vulnerabilities directly?
No. Firestore is a managed NoSQL database and does not parse XML. XXE risk arises when Django retrieves XML from Firestore and processes it with an unsafe parser. Securing the parser and applying least-privilege IAM for Firestore access are required.

Can storing XML in Firestore be safe?
Yes, if you validate and sanitize uploads to reject external entity declarations and DOCTYPEs before storing, and always parse retrieved XML with secure settings (e.g., defusedxml or lxml with external entities disabled). Avoid storing untrusted XML when possible.