Unicode Normalization in FastAPI with DynamoDB
Unicode Normalization in FastAPI with DynamoDB — how this specific combination creates or exposes the vulnerability
Unicode normalization inconsistencies between FastAPI request handling and DynamoDB storage can lead to authentication bypass, data exposure, and IDOR-like behavior. When a FastAPI service accepts user input (e.g., usernames, identifiers, or file paths) and stores or queries it against DynamoDB without normalizing to a canonical form, semantically equivalent strings that differ in binary representation are treated as distinct keys. For example, the character é can be represented as a single code point U+00E9 (LATIN SMALL LETTER E WITH ACUTE) or as a decomposed sequence U+0065 U+0301 (e followed by combining acute accent). If a client submits the decomposed form and the application stores it as-is in DynamoDB, queries using the precomposed form will not match, enabling account takeover or unauthorized access when combined with weak authentication checks.
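The mismatch between the precomposed and decomposed forms of é can be reproduced directly with Python's standard library, with no FastAPI or DynamoDB involved:

```python
import unicodedata

# The same visible string "café" in two canonically equivalent encodings:
precomposed = "caf\u00e9"    # é as a single code point, U+00E9
decomposed = "cafe\u0301"    # e (U+0065) followed by combining acute (U+0301)

# They render identically but compare unequal byte-for-byte
print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 4 5

# Normalizing both to NFC makes them compare equal
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                      # True
```

Any exact-match store keyed on the raw string, DynamoDB included, will treat these two inputs as different identities unless one canonical form is enforced first.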
In DynamoDB, primary keys are used for exact-match lookups. If partition or sort keys are compared without normalization, two logically identical identifiers may map to different items, bypassing intended access controls. This becomes critical in FastAPI endpoints that derive keys directly from user-supplied data (e.g., path parameters or headers) used in DynamoDB GetItem or Query operations. An attacker can exploit this by registering with a normalized username and then authenticating with a variant that maps to the same logical identity but bypasses equality checks. The issue is compounded when FastAPI route definitions or middleware do not enforce normalization before constructing DynamoDB keys, leaving the API surface open to subtle IDOR conditions even when endpoints appear to enforce ownership checks.
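The exact-match lookup behavior can be illustrated without a live table; below, a plain dict stands in for DynamoDB's partition-key semantics, and the usernames are illustrative:

```python
import unicodedata

# A plain dict stands in for DynamoDB's exact-match partition-key lookup
users = {}

def register(user_id: str) -> None:
    users[user_id] = {"user_id": user_id}

def lookup(user_id: str):
    return users.get(user_id)  # exact binary match, like GetItem

# Victim registers with the precomposed form of "josé"
register("jos\u00e9")

# A lookup with the decomposed form misses, even though the identifier is
# logically identical — the root of the IDOR-like behavior described above
print(lookup("jose\u0301"))   # None

# Normalizing the read path to the same form as the write path closes the gap
def lookup_normalized(user_id: str):
    return users.get(unicodedata.normalize("NFC", user_id))

print(lookup_normalized("jose\u0301"))
```

The fix must be symmetric: normalizing only reads (or only writes) leaves stale items reachable under the other form.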
The interaction with DynamoDB's schema-less design amplifies the risk: string attributes are stored as-is, and secondary indexes (such as Global Secondary Indexes, GSIs) rely on the stored values. If normalization is applied inconsistently across write and read paths, queries against GSIs can return incomplete or unexpected results, enabling data exposure through enumeration. For example, a FastAPI endpoint querying a GSI on email might miss records if the stored email uses a different normalization form than the query parameter. This can lead to information leakage or authentication failures that are difficult to trace. MiddleBrick's LLM/AI Security checks and input validation scans can detect normalization-related inconsistencies by analyzing endpoint behavior and spec-to-runtime mappings, highlighting mismatches that could lead to OWASP API Top 10 violations such as Broken Object Level Authorization.
DynamoDB-Specific Remediation in FastAPI — concrete code fixes
Remediation centers on normalizing Unicode input before any DynamoDB operation in FastAPI, using a single consistent canonical form (NFC is the common choice; NFD also works as long as it is applied everywhere). Apply normalization at the boundary where user input enters the application (e.g., in path parameters, headers, or body fields) and ensure the same normalization is used for all DynamoDB key construction and queries. Below is a concrete example of a FastAPI service that stores and retrieves user profiles in DynamoDB with proper normalization.
```python
from fastapi import FastAPI, HTTPException, Header
import unicodedata

import boto3
from boto3.dynamodb.conditions import Key
from pydantic import BaseModel

app = FastAPI()
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('users')

class UserProfile(BaseModel):
    user_id: str  # normalized key
    email: str    # should also be normalized if used as a key or indexed

def normalize_unicode(value: str) -> str:
    """Canonicalize to NFC so equivalent strings share one binary form."""
    return unicodedata.normalize('NFC', value)

@app.post('/profiles/')
def create_profile(profile: UserProfile, authorization: str = Header(None)):
    normalized_user_id = normalize_unicode(profile.user_id)
    normalized_email = normalize_unicode(profile.email)
    # Use normalized values for the DynamoDB key and indexed attributes
    table.put_item(Item={
        'user_id': normalized_user_id,
        'email': normalized_email,
        'data': profile.dict(),
    })
    return {'status': 'created', 'user_id': normalized_user_id}

@app.get('/profiles/{user_id}')
def get_profile(user_id: str, authorization: str = Header(None)):
    # Normalize the path parameter before building the key, so precomposed
    # and decomposed forms resolve to the same item
    normalized_user_id = normalize_unicode(user_id)
    response = table.get_item(Key={'user_id': normalized_user_id})
    item = response.get('Item')
    if not item:
        raise HTTPException(status_code=404, detail='Profile not found')
    return item

@app.get('/profiles/')
def list_profiles(email: str):
    normalized_email = normalize_unicode(email)
    # Queries against the GSI must use the same normalization as writes
    response = table.query(
        IndexName='email-index',
        KeyConditionExpression=Key('email').eq(normalized_email),
    )
    return response.get('Items', [])
```
For automated protection in FastAPI, add a request preprocessing middleware that normalizes relevant fields before they reach route handlers. This ensures consistent handling across all endpoints and reduces the risk of missing normalization in individual functions. Combine this with DynamoDB client-side validation to reject keys that contain disallowed patterns or excessive combining characters, aligning with input validation checks that MiddleBrick’s scans monitor. If you use the middleBrick CLI (middlebrick scan <url>) or GitHub Action, normalization-related input validation findings will be surfaced with severity and remediation guidance mapped to frameworks like OWASP API Top 10 and PCI-DSS. Continuous monitoring plans in the Pro tier can schedule regular scans to catch regressions when endpoints evolve.
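A minimal sketch of such preprocessing helpers, assuming NFC as the canonical form; the function names and the combining-character threshold are illustrative, not part of any FastAPI, boto3, or MiddleBrick API:

```python
import unicodedata

MAX_COMBINING_RUN = 3  # illustrative limit on consecutive combining marks

def normalize_payload(value):
    """Recursively NFC-normalize every string in a decoded JSON payload."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [normalize_payload(v) for v in value]
    if isinstance(value, dict):
        return {normalize_payload(k): normalize_payload(v)
                for k, v in value.items()}
    return value  # numbers, booleans, None pass through unchanged

def validate_key(value: str) -> None:
    """Reject keys containing suspiciously long runs of combining marks."""
    run = 0
    for ch in value:
        if unicodedata.combining(ch):
            run += 1
            if run > MAX_COMBINING_RUN:
                raise ValueError("excessive combining characters in key")
        else:
            run = 0
```

In FastAPI, `normalize_payload` could be invoked from a shared dependency or an ASGI middleware on the decoded body before route handlers construct DynamoDB keys, and `validate_key` implements the client-side rejection of hostile key shapes mentioned above.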
When integrating with CI/CD, the middleBrick GitHub Action can fail builds if normalization inconsistencies are detected during scans of staging APIs, preventing deployment of configurations that could lead to authorization flaws. For AI coding assistance, the MCP Server can surface normalization risks directly in the editor, helping developers maintain canonical forms during key construction. These integrations do not alter runtime behavior but provide feedback to reduce the likelihood of introducing Unicode-related authorization or data exposure issues.