XML External Entities in MongoDB
How XML External Entities Manifest in MongoDB
XML External Entity (XXE) attacks against MongoDB-backed applications happen in the application's XML parser, not in MongoDB itself. MongoDB stores documents as BSON and never parses XML. Applications, however, often parse XML input before storing the result, or parse XML they read back out of the database. If that parser resolves external entities, attacker-supplied XML can make it read local files or send requests to internal services, and whatever it retrieves ends up persisted in MongoDB.
The most common MongoDB XXE scenario involves applications that accept XML configuration files or user-provided XML data that gets parsed by a vulnerable XML parser before being stored in MongoDB. For example, a document management system might accept XML metadata for documents stored in MongoDB:
```javascript
// Vulnerable: entity substitution enabled before MongoDB storage
const libxmljs = require('libxmljs');
const { MongoClient } = require('mongodb');

async function storeDocumentMetadata(xmlData) {
  // Dangerous: noent: true makes libxml2 substitute entities,
  // including external SYSTEM entities such as file:// URIs
  const doc = libxmljs.parseXmlString(xmlData, { noent: true });

  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    await client.db('documents').collection('metadata').insertOne({
      xmlContent: doc.toString(),
      timestamp: new Date()
    });
  } finally {
    await client.close();
  }
}
```

Parsers built on sax-js, such as xml2js, do not resolve external entities. In Node.js the risk therefore usually comes from libxml2 bindings like libxmljs when entity substitution is switched on. In this vulnerable pattern, an attacker could craft XML containing an external entity declaration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <content>&xxe;</content>
</root>
```

When the parser substitutes `&xxe;`, it reads /etc/passwd and the file's contents are stored in MongoDB. Any API that later returns the document discloses them. The same technique can enable server-side request forgery (SSRF) when the parser is allowed to fetch network URLs and the entity points to an internal service:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://internal-service:8080/secret">
]>
<root>
  <content>&xxe;</content>
</root>
```

Another vector appears in tools that move data between XML files and MongoDB. Consider a migration tool that imports XML dumps:
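```python
from lxml import etree
import pymongo

def import_xml_to_mongodb(xml_file, db_name):
    # Vulnerable: resolve_entities=True substitutes external SYSTEM entities
    parser = etree.XMLParser(resolve_entities=True)
    tree = etree.parse(xml_file, parser)
    root = tree.getroot()

    client = pymongo.MongoClient('mongodb://localhost:27017')
    try:
        collection = client[db_name]['imported']
        for record in root.findall('record'):
            data = {child.tag: child.text for child in record}
            collection.insert_one(data)
    finally:
        client.close()
```

Because the parser is configured to resolve entities, an `&xxe;` reference inside any record is replaced with the referenced file's contents, which are then written to MongoDB. Recent lxml releases (5.0 and later) resolve only internal entities unless `resolve_entities=True` is passed. Explicitly enabling it, or running an older lxml, is what opens the hole.

Python's standard-library `xml.etree.ElementTree` does not fetch external entities. Builds against older expat versions do, however, remain exposed to entity-expansion denial of service ("billion laughs").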
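The file-disclosure mechanism can be reproduced with only the Python standard library. In current Python versions, `xml.sax` ships with external general entity processing turned off. The sketch below switches the `feature_external_ges` feature on to show what a resolving parser does with an `&xxe;` payload. It writes a throwaway temporary file as a stand-in for /etc/passwd. The helper names `TextCollector`, `parse_text` and `demo` are illustrative only. Run it only in a test environment.

```python
import io
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Accumulates every chunk of character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def parse_text(xml_bytes, resolve_external):
    """Parse xml_bytes with xml.sax and return all character data."""
    parser = xml.sax.make_parser()
    # feature_external_ges decides whether <!ENTITY x SYSTEM "..."> targets
    # are fetched and inlined; it is off by default in current Python
    parser.setFeature(feature_external_ges, resolve_external)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.parts)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Throwaway stand-in for a sensitive file such as /etc/passwd
        secret = Path(tmp) / "secret.txt"
        secret.write_text("secret-value\n")
        payload = (
            '<?xml version="1.0"?>\n'
            f'<!DOCTYPE root [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>\n'
            "<root><content>&xxe;</content></root>"
        ).encode("utf-8")
        return parse_text(payload, False), parse_text(payload, True)


if __name__ == "__main__":
    safe, unsafe = demo()
    print("default parser :", repr(safe))
    print("entities on    :", repr(unsafe))
```

With the default setting the `&xxe;` reference produces no text. With the feature enabled, the temporary file's contents appear as element text, exactly the data an importer would then insert into MongoDB.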
MongoDB-Specific Detection
Detecting XXE vulnerabilities in MongoDB-integrated applications requires examining both the XML processing code and the data flow patterns. The first detection step is identifying XML parsing operations that occur before or after MongoDB interactions.
Static code analysis should flag parser configurations that resolve entities, as well as parsers used without hardening:

```shell
# Node.js: libxml2-based parsing with entity substitution enabled
grep -rn "noent" . --include="*.js"
# Python: lxml parsers that opt in to entity resolution
grep -rn "resolve_entities" . --include="*.py"
# Python: standard-library XML modules used without defusedxml
grep -rnE "xml\.(etree|dom|sax)" . --include="*.py" | grep -v defusedxml
# Java: files that create a DocumentBuilderFactory but never disallow DOCTYPEs
grep -rl "DocumentBuilderFactory" . --include="*.java" | xargs grep -L "disallow-doctype-decl"
```

Runtime detection focuses on identifying XML processing in application logs and monitoring for unusual behavior. Tools like middleBrick can scan API endpoints that accept XML data, testing for XXE vulnerabilities by attempting controlled external entity injections.
middleBrick's XXE detection methodology includes:
- Testing XML endpoints with crafted payloads containing external entities
- Monitoring for successful entity resolution or error messages that reveal parser behavior
- Checking for SSRF-like behavior when external entities point to internal services
- Analyzing response times that might indicate file access attempts
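A gentler first step than external entity payloads is an in-band check that only reveals whether a parser expands DTD entities at all. The sketch below illustrates that idea and is not middleBrick's implementation. It declares a harmless internal entity, so no file is read and no connection is opened, and then looks for the expanded value in the response. The hyphen in the marker is written as the character reference `&#45;`, so an endpoint that merely echoes the raw request cannot produce a false positive. `simulated_endpoint` is a hypothetical stand-in for a real API.

```python
import secrets
import xml.etree.ElementTree as ET


def build_entity_probe():
    """Return (payload, marker) for an in-band entity-expansion check.

    The entity is internal: it reads no files and opens no connections.
    The hyphen in the marker is written as the character reference &#45;,
    so the literal marker only appears if a parser expanded the entity.
    """
    marker = "xxe-" + secrets.token_hex(8)
    declared_value = marker.replace("-", "&#45;")
    payload = (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE root [<!ENTITY probe "{declared_value}">]>\n'
        "<root><content>&probe;</content></root>"
    )
    return payload, marker


def classify_response(body, marker):
    """Interpret the response body returned for the probe payload."""
    if marker in body:
        return "entities-expanded"  # DTD entities are processed: investigate further
    return "no-evidence"


def simulated_endpoint(payload):
    """Hypothetical API that parses XML and echoes one field back."""
    root = ET.fromstring(payload)
    return "<result>" + (root.findtext("content") or "") + "</result>"


if __name__ == "__main__":
    payload, marker = build_entity_probe()
    print(classify_response(simulated_endpoint(payload), marker))
```

A "no-evidence" result does not prove the parser is safe, because the endpoint may simply not echo the field. It only means this particular probe found nothing.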
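For MongoDB-specific detection, examine the application's data ingestion pipelines. Raw XML stored without validation is a second-order risk. The insert itself is harmless, but any later job that parses the stored XML (an export, a report generator, a search indexer) inherits the attacker's payload. Look for handlers like this:

```javascript
// Risky: raw XML stored without validation; a later consumer that parses it
// with entity resolution enabled turns it into second-order XXE.
// `app` (Express) and `collection` are assumed to be initialized elsewhere.
app.post('/upload-xml', async (req, res) => {
  const xmlData = req.body.xml;
  // No validation, no sanitization
  await collection.insertOne({ xml: xmlData });
  res.status(201).end();
});
```

Network monitoring can also detect XXE attempts by watching for outbound connections from application servers while they process XML data. Unusual DNS queries or HTTP requests to unexpected destinations may indicate successful XXE exploitation.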
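As a rough sketch of that idea, assume outbound destinations (hostnames or IP literals) are already collected per service from DNS or proxy logs. A small classifier can then separate allowlisted destinations from the internal targets typical of SSRF-style entities. The allowlist and `classify_destination` are hypothetical. Hostnames are not resolved here, so a name that points at an internal address would still need a DNS lookup to catch.

```python
import ipaddress

# Hypothetical allowlist for the service that parses XML
ALLOWED_HOSTS = {"api.partner.example"}


def classify_destination(host, allowed=ALLOWED_HOSTS):
    """Classify one outbound destination observed during XML processing."""
    if host in allowed:
        return "allowed"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname that is not allowlisted; resolving it is out of scope here
        return "unexpected-host"
    if ip.is_loopback or ip.is_link_local or ip.is_private:
        # e.g. 127.0.0.1, 169.254.169.254 (cloud metadata), 10.0.0.0/8
        return "internal-target"
    return "unexpected-host"


if __name__ == "__main__":
    for host in ["api.partner.example", "169.254.169.254", "internal-service"]:
        print(host, classify_destination(host))
```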
Database-level detection involves monitoring MongoDB collections for suspicious data. Successful XXE typically leaves content in stored fields that does not match normal application data. Implement alerting on:
- Large insertions of XML data from a single source
- Stored fields containing file contents or file paths, such as lines resembling `root:x:0:0:` from /etc/passwd
- Anomalous access patterns on collections that hold XML-derived data
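The second alert can be approximated with a recursive scan over documents as they come back from the driver. This sketch assumes pymongo-style decoded documents (nested dicts, lists and strings). In practice it would run over something like `collection.find()`. The two signatures and the `find_suspicious_fields` helper are illustrative heuristics, not a complete list.

```python
import re

# Illustrative signatures of files that XXE payloads commonly target
SIGNATURES = {
    "passwd-line": re.compile(r"^[A-Za-z0-9_.-]+:[^:\n]*:\d+:\d+:", re.MULTILINE),
    "private-key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_suspicious_fields(value, path=""):
    """Yield (field_path, signature_name) for every string inside a decoded
    MongoDB document (nested dicts/lists) that matches a signature."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from find_suspicious_fields(child, f"{path}.{key}" if path else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_suspicious_fields(child, f"{path}.{index}" if path else str(index))
    elif isinstance(value, str):
        for name, pattern in SIGNATURES.items():
            if pattern.search(value):
                yield path, name


if __name__ == "__main__":
    sample = {"xmlContent": {"content": "root:x:0:0:root:/root:/bin/bash\n"}, "title": "Q3"}
    for field, signature in find_suspicious_fields(sample):
        print(f"ALERT {field}: {signature}")
```

Non-string values such as ObjectIds and datetimes are skipped, so the scan only inspects text that could carry exfiltrated file contents.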
MongoDB-Specific Remediation
Remediating XXE vulnerabilities in MongoDB-integrated applications requires securing XML processing before data reaches the database. The most effective approach is disabling external entity processing at the parser level.
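One blunt but effective parser-level control is to reject any document that contains a DOCTYPE. Entities can only be declared through a DTD, so this rules out both external and internal entity tricks. The sketch below uses the standard-library expat binding. `reject_doctype` and `ForbiddenDTD` are hypothetical names. The check only validates input, so an accepted document should still be parsed by a hardened parser afterward.

```python
import xml.parsers.expat


class ForbiddenDTD(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def reject_doctype(xml_data):
    """Parse xml_data with expat and fail on any DOCTYPE.

    Without a DTD there are no entity declarations, so neither external
    (XXE) nor internal (entity-expansion) entities can be defined.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Exceptions raised in expat handlers stop parsing and propagate
        raise ForbiddenDTD(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_data, True)  # raises ExpatError on malformed input
```

This mirrors what the `disallow-doctype-decl` feature does for Java's DocumentBuilderFactory.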
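For Node.js applications, choose a parser that does not resolve external entities. xml2js is built on sax-js, which does not resolve external entities. Keeping strict mode on (the xml2js default) also makes undefined references such as `&xxe;` fail the parse instead of passing through as literal text:

```javascript
// Safer parsing before MongoDB storage
const xml2js = require('xml2js');
const { MongoClient } = require('mongodb');

async function secureStoreDocumentMetadata(xmlData) {
  const parser = new xml2js.Parser({
    // sax-js never fetches external entities; strict mode rejects
    // undefined entity references such as &xxe;
    strict: true,
    // Output-shape options only; they do not affect entity handling
    xmlns: false,
    explicitRoot: false,
    mergeAttrs: true,
    explicitArray: false
  });

  let parsedData;
  try {
    parsedData = await parser.parseStringPromise(xmlData);
  } catch (error) {
    // Do not echo parser details back to the caller
    return { success: false, error: 'Invalid XML format' };
  }

  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    await client.db('documents').collection('metadata').insertOne({
      xmlContent: parsedData,
      timestamp: new Date()
    });
  } finally {
    await client.close();
  }
  return { success: true };
}
```

If a libxml2 binding such as libxmljs is unavoidable, leave its `noent` option unset so entities are not substituted.

For Python applications, the defusedxml package provides hardened drop-in replacements for the standard-library parsers: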
```python
import pymongo
from defusedxml.ElementTree import parse

def secure_import_xml_to_mongodb(xml_file, db_name):
    # defusedxml raises (e.g. EntitiesForbidden, DTDForbidden) instead of
    # resolving entities; forbid_dtd=True rejects any DOCTYPE outright
    tree = parse(xml_file, forbid_dtd=True)
    root = tree.getroot()

    client = pymongo.MongoClient('mongodb://localhost:27017')
    try:
        collection = client[db_name]['imported']
        for record in root.findall('record'):
            data = {child.tag: child.text for child in record}
            collection.insert_one(data)
    finally:
        client.close()
```

The defusedxml library wraps the standard-library parsers. By default it raises an exception on entity declarations and external references instead of resolving them. It is the recommended approach for Python applications that parse untrusted XML.
For Java applications using DocumentBuilderFactory:
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;

public class SecureXmlProcessor {
    public void processAndStoreXml(String xmlContent) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject any DOCTYPE: this alone blocks classic XXE payloads
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case DOCTYPEs are ever re-enabled
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);

        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xmlContent)));

        // try-with-resources closes the client even if processing fails
        try (MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase database = mongoClient.getDatabase("documents");
            // org.bson.Document is fully qualified to avoid clashing with org.w3c.dom.Document
            MongoCollection<org.bson.Document> collection = database.getCollection("metadata");
            // Convert the DOM `document` to an org.bson.Document and insert it
            // ... processing logic ...
        }
    }
}
```

Additional measures include the following:
- Validate XML input before parsing.
- Restrict outbound network access from the servers that parse XML, so SSRF-style entities cannot reach internal services.
- Rate-limit XML processing endpoints to slow automated XXE probing.
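For applications that must accept XML input, XML schema validation can further ensure that only expected structures reach MongoDB, combined with strict content checks before storage. Schema validation complements parser hardening rather than replacing it. Entities are resolved while the document is parsed, before any schema check can reject it.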
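The Python standard library has no XSD validator, although third-party packages such as lxml provide one. The sketch below therefore shows a lighter allowlist check rather than full schema validation. Each `<record>` may contain only known, non-nested fields of bounded length before it becomes a MongoDB document. The field names, the length limit and the `record_to_document` helper are hypothetical. The records are assumed to come from a hardened parser such as defusedxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: <record><title/><author/><created/></record>
ALLOWED_FIELDS = {"title", "author", "created"}
MAX_FIELD_LENGTH = 1024


class InvalidRecord(ValueError):
    pass


def record_to_document(record):
    """Convert one <record> element into a dict for insert_one, rejecting
    anything outside the expected structure."""
    doc = {}
    for child in record:
        if child.tag not in ALLOWED_FIELDS:
            raise InvalidRecord(f"unexpected element {child.tag!r}")
        if len(child):
            raise InvalidRecord(f"nested content in {child.tag!r}")
        if child.tag in doc:
            raise InvalidRecord(f"duplicate element {child.tag!r}")
        text = (child.text or "").strip()
        if len(text) > MAX_FIELD_LENGTH:
            raise InvalidRecord(f"{child.tag!r} exceeds {MAX_FIELD_LENGTH} characters")
        doc[child.tag] = text
    return doc


if __name__ == "__main__":
    record = ET.fromstring("<record><title>Q3</title><author>ana</author></record>")
    print(record_to_document(record))
```

Rejecting unknown or nested elements keeps unexpected structure, including anything an entity might have smuggled in, out of the stored documents.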