XPath Injection in FastAPI with DynamoDB
XPath Injection in FastAPI with DynamoDB: how this specific combination creates or exposes the vulnerability
XPath injection occurs when untrusted input is concatenated into an XPath expression without proper escaping or parameterization. DynamoDB itself does not evaluate XPath, but many FastAPI services accept XML or SOAP payloads, query XML documents with XPath, and then use the selected values as keys or filters for DynamoDB. In that combination, user-controlled data flows from the HTTP request into an XPath string, and whatever the manipulated expression selects is passed on to DynamoDB (through boto3 or a mapping layer) as a legitimate-looking key or condition.
Consider a FastAPI endpoint that receives an XML document containing a user identifier, resolves that identifier with XPath, and uses the result to retrieve an item from DynamoDB. If the XPath expression is built by string concatenation, an attacker can inject additional predicates or path components. For example, when the input is placed inside a quoted predicate such as //user[name='...'], the value ' or 1=1 or ' turns the predicate into one that is true for every node. DynamoDB never sees the XPath, but it receives whatever key the compromised expression produced; if an intermediate layer translates XPath results into DynamoDB requests (such as a custom mapper or an XML-to-DynamoDB bridge), the injection can also surface as unintended filter expressions or condition checks. The result can be unauthorized data access or modification, which aligns with the BOLA/IDOR and Property Authorization checks in middleBrick’s security assessment.
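The string-level effect is easy to see in isolation. A minimal sketch, assuming a hypothetical helper `build_lookup_xpath` that places the username inside a single-quoted predicate: the payload's leading quote closes the literal early, `or 1=1` makes the predicate true for every `<user>` node, and the trailing `or '` absorbs the closing quote so the expression stays syntactically valid.

```python
def build_lookup_xpath(username: str) -> str:
    """Hypothetical vulnerable helper: splices raw input into an XPath predicate."""
    return f"//user[name='{username}']/key/text()"


# A benign value produces the intended expression
benign = build_lookup_xpath("alice")

# The payload closes the string literal and appends an always-true clause
injected = build_lookup_xpath("' or 1=1 or '")

print(benign)    # //user[name='alice']/key/text()
print(injected)  # //user[name='' or 1=1 or '']/key/text()
```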
In practice, this vulnerability appears when developers build XPath expressions dynamically with libraries that do not apply context-aware escaping. middleBrick’s LLM/AI Security checks cover active prompt injection testing and system prompt leakage detection; for API-level XPath risks, the relevant checks among its 12 parallel checks are Input Validation and Property Authorization, which flag unsafe string handling and over-permissive data exposure. A scan can identify endpoints where raw input reaches XPath logic, trace how the resulting value is used in DynamoDB requests, and highlight paths that could enable unauthorized reads or writes.
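Context-aware escaping for XPath is less obvious than for SQL. XPath 1.0, the version implemented by libxml2 and therefore by lxml, has no escape sequences inside string literals: a literal is delimited by single or double quotes and cannot contain its own delimiter. Escaping therefore means choosing the quote style from the value and falling back to `concat()` when both quote characters appear. The sketch below shows one such helper under the hypothetical name `xpath_string_literal`; binding XPath variables, where the library supports it, remains the simpler option.

```python
def xpath_string_literal(value: str) -> str:
    """Return an XPath 1.0 expression that evaluates to exactly `value`.

    XPath 1.0 literals cannot escape their own delimiter, so the quote style
    is picked from the content, with concat() used when both quotes occur.
    """
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: split on single quotes and re-insert each
    # one as a double-quoted "'" argument to concat()
    parts = value.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if i > 0:
            pieces.append('"\'"')
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


name = "' or 1=1 or '"
# The payload is now a single string literal, not extra predicate syntax
print(f"//user[name={xpath_string_literal(name)}]/key/text()")
# //user[name="' or 1=1 or '"]/key/text()
```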
To illustrate, a vulnerable FastAPI route might look like this:
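```python
from fastapi import FastAPI, HTTPException, Request
import boto3
from lxml import etree

app = FastAPI()
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Users')
# Illustrative server-side directory; only entries marked public should resolve
DIRECTORY = etree.fromstring(
    b'<users>'
    b'<user public="false"><name>admin</name><key>admin</key></user>'
    b'<user public="true"><name>alice</name><key>alice</key></user>'
    b'</users>'
)

@app.post('/lookup')
async def lookup_user(request: Request):
    body = await request.body()
    root = etree.fromstring(body)
    # Fixed expression: reads the requested name from the client's <user> document
    requested = root.xpath('string(/user/username)')
    # Unsafe: concatenating user input into an XPath expression
    keys = DIRECTORY.xpath(f"//user[name='{requested}' and @public='true']/key/text()")
    if not keys:
        raise HTTPException(status_code=404, detail='User not found')
    # ' or 1=1 or ' makes the predicate true for every <user>, so keys[0] is 'admin'
    response = table.get_item(Key={'username': str(keys[0])})
    return response.get('Item', {})
```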
An attacker sending a crafted XML body could manipulate the resolved username, leading to retrieval of unintended items. middleBrick’s OpenAPI/Swagger analysis, with full $ref resolution, would cross-reference this runtime behavior against the spec and flag the absence of input validation on the XPath-derived parameter.
DynamoDB-Specific Remediation in FastAPI: concrete code fixes
Remediation focuses on avoiding string-based XPath construction and enforcing strict input validation before any DynamoDB interaction. Use parameterized XPath evaluation where possible (lxml, for example, supports XPath variables), or avoid XPath altogether by parsing XML with secure, schema-bound methods. Validate values extracted from XML against allowlists, and never concatenate them directly into queries.
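Parameterized XPath in lxml works through variables: the expression references `$name`, and the value is passed as a keyword argument to `xpath()`. A minimal sketch, assuming an illustrative in-memory directory and a hypothetical `find_keys` helper. Because the bound value is compared as a plain string, the injection payload simply matches no `<name>` element.

```python
from lxml import etree

# Illustrative directory document; in a real service this would be server-side data
DIRECTORY = etree.fromstring(
    b"<users>"
    b"<user><name>alice</name><key>alice</key></user>"
    b"<user><name>admin</name><key>admin</key></user>"
    b"</users>"
)


def find_keys(name: str) -> list:
    # $name is bound by lxml as an XPath string value, so quotes in the
    # input are compared literally instead of being parsed as syntax
    return [str(k) for k in DIRECTORY.xpath("//user[name=$name]/key/text()", name=name)]


print(find_keys("alice"))          # ['alice']
print(find_keys("' or 1=1 or '"))  # []
```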
The following FastAPI example avoids XPath entirely by accepting a JSON body validated with Pydantic, then checks the username against an allowlist before calling DynamoDB:
```python
import re

import boto3
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Users')

class UserLookup(BaseModel):
    username: str

# Allowlist for usernames: 3 to 30 characters, alphanumeric and underscore only
USERNAME_PATTERN = re.compile(r'[a-zA-Z0-9_]{3,30}')

def is_valid_username(value: str) -> bool:
    # fullmatch() requires the whole string to match the allowlist
    return USERNAME_PATTERN.fullmatch(value) is not None

@app.post('/lookup')
async def lookup_user(payload: UserLookup):
    username = payload.username
    if not is_valid_username(username):
        raise HTTPException(status_code=400, detail='Invalid username format')
    # Safe: GetItem receives the key as a typed attribute value, not a query string
    response = table.get_item(Key={'username': username})
    item = response.get('Item')
    if not item:
        raise HTTPException(status_code=404, detail='User not found')
    return item
```
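One detail of allowlist checks deserves a closer look. In Python's `re` module, `$` matches at the end of the string and also just before a trailing newline, so a `^...$` pattern checked with `match()` accepts `"alice\n"`. That matters here because `"alice\n"` and `"alice"` are distinct DynamoDB keys. The sketch below uses a hypothetical `ANCHORED` pattern mirroring the username allowlist; `re.fullmatch` (or `\Z` in place of `$`) closes the gap.

```python
import re

# Mirrors the username allowlist, anchored with ^ and $
ANCHORED = re.compile(r'^[a-zA-Z0-9_]{3,30}$')

# `$` also matches just before a trailing newline, so match() accepts it
print(bool(ANCHORED.match("alice\n")))      # True
# fullmatch() requires the pattern to consume the entire string
print(bool(ANCHORED.fullmatch("alice\n")))  # False
print(bool(ANCHORED.fullmatch("alice")))    # True
```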
If XML input is required, use a schema-validating parser (e.g., lxml with an XSD) and extract only specific, expected fields rather than evaluating dynamic expressions:
```python
import re
from io import BytesIO

import boto3
from fastapi import FastAPI, HTTPException, Request
from lxml import etree

app = FastAPI()
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Users')

# Same allowlist as the JSON example; fullmatch() requires the whole string to match
USERNAME_PATTERN = re.compile(r'[a-zA-Z0-9_]{3,30}')

def is_valid_username(value) -> bool:
    return isinstance(value, str) and USERNAME_PATTERN.fullmatch(value) is not None

# Parser for untrusted input: no entity expansion and no network access
SAFE_PARSER = etree.XMLParser(resolve_entities=False, no_network=True)

# Load a schema to validate structure and expected values
with open('user_schema.xsd', 'rb') as f:
    schema = etree.XMLSchema(etree.parse(f))

@app.post('/upload')
async def upload_user(request: Request):
    body = await request.body()
    try:
        doc = etree.parse(BytesIO(body), SAFE_PARSER)
    except etree.XMLSyntaxError:
        raise HTTPException(status_code=400, detail='Malformed XML') from None
    # Validate against the XSD before reading any values
    if not schema.validate(doc):
        raise HTTPException(status_code=400, detail='Invalid XML schema')
    # Read fixed child paths of the <user> root (assumes no XML namespace);
    # no request data is ever placed inside an expression
    root = doc.getroot()
    username = root.findtext('username')
    email = root.findtext('email')
    if not is_valid_username(username) or not email:
        raise HTTPException(status_code=400, detail='Invalid username or email')
    table.put_item(Item={'username': username, 'email': email})
    return {'status': 'ok'}
```
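The XML example loads `user_schema.xsd` without showing it. The sketch below is one hypothetical version of that schema, embedded as a bytes literal so it runs on its own: a `<user>` root containing a `<username>` restricted to the same character allowlist, followed by an `<email>`. The element names and the absence of a target namespace are assumptions. XSD `pattern` facets are implicitly anchored to the whole value, so no `^` or `$` is needed.

```python
from lxml import etree

# Hypothetical contents for user_schema.xsd; the pattern facet applies to the
# entire <username> value, so quotes, spaces, and '=' are all rejected
USER_SCHEMA_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="user">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="username">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="[a-zA-Z0-9_]{3,30}"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(USER_SCHEMA_XSD))

good = etree.fromstring(b"<user><username>alice_1</username><email>a@example.com</email></user>")
bad = etree.fromstring(b"<user><username>' or 1=1 or '</username><email>x</email></user>")

print(schema.validate(good))  # True
print(schema.validate(bad))   # False
```

Schema validation and the Python-side `is_valid_username` check overlap on purpose in this sketch: if the schema is later loosened, the allowlist still guards what reaches DynamoDB.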
These practices align with the remediation guidance provided in middleBrick’s findings, which include prioritized steps and severity-aware recommendations. For ongoing protection, the Pro plan’s continuous monitoring and GitHub Action integration can enforce that no endpoint accepts unvalidated input before it reaches DynamoDB, while the CLI tool allows quick scans from the terminal using middlebrick scan <url>. Developers can also run scans directly from their IDE via the MCP Server to catch XPath and DynamoDB interaction issues during development.