Unicode Normalization in DynamoDB
How Unicode Normalization Attacks Manifest in DynamoDB
Unicode normalization attacks in DynamoDB occur when applications fail to normalize user input before querying or storing data. DynamoDB stores strings as UTF-8 but doesn't normalize them during operations, creating subtle vulnerabilities that attackers can exploit.
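The core issue is visible without DynamoDB at all: two strings that render identically can differ at the code-point level, and only explicit normalization (via `String.prototype.normalize`, built into JavaScript) makes them compare equal. A minimal illustration:

```javascript
// 'café' written two ways: precomposed é (U+00E9) vs. e + combining acute (U+0301)
const precomposed = 'caf\u00E9';
const decomposed = 'cafe\u0301';

// They render identically but are different code-point sequences...
console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 4 5

// ...until both sides are normalized to the same form (NFC here)
console.log(precomposed.normalize('NFC') === decomposed.normalize('NFC')); // true
```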
The most common attack pattern involves bypassing authorization checks. Consider an application that stores user permissions in DynamoDB:
// Vulnerable DynamoDB query
const params = {
TableName: 'UserPermissions',
KeyConditionExpression: 'UserId = :userId AND Permission = :permission',
ExpressionAttributeValues: {
':userId': 'user123',
':permission': 'admin'
}
};
const result = await dynamodb.query(params).promise();

An attacker can probe this check by submitting 'admin' in different Unicode representations. Note that 'admin' itself is pure ASCII, so its NFC and NFD forms are identical; the dangerous variants come from compatibility characters and combining marks, which only fold together under NFKC:

// Plain ASCII form
const asciiAdmin = 'admin';
// Fullwidth form (U+FF41 U+FF44 U+FF4D U+FF49 U+FF4E), which NFKC folds to 'admin'
const fullwidthAdmin = '\uFF41\uFF44\uFF4D\uFF49\uFF4E'; // 'ａｄｍｉｎ'
// With combining diaereses (U+0308 after each letter)
const combiningAdmin = 'a\u0308d\u0308m\u0308i\u0308n\u0308'; // 'äd̈m̈ïn̈'

Since DynamoDB performs byte-level comparisons on UTF-8 strings, these variants won't match a stored 'admin' value. The danger lies in inconsistency: if one layer compares raw bytes (a denylist, an audit check) while another layer (a client library, a downstream service, or later application code) normalizes with NFKC, the fullwidth form slips past the raw-byte check and is then folded back into 'admin'.
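To make that inconsistency concrete, here is a sketch (the isReservedName helper is hypothetical, not from the code above) of a raw-comparison denylist that a fullwidth variant slips past until NFKC folding is applied:

```javascript
// Hypothetical denylist that compares raw strings, byte-for-byte
function isReservedName(name) {
  return ['admin', 'root'].includes(name);
}

// Fullwidth 'admin' (U+FF41 U+FF44 U+FF4D U+FF49 U+FF4E)
const input = '\uFF41\uFF44\uFF4D\uFF49\uFF4E';

console.log(isReservedName(input));                   // false: the check is bypassed
console.log(isReservedName(input.normalize('NFKC'))); // true: NFKC folds it to 'admin'
```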
Another attack vector involves DynamoDB's contains and begins_with operations. These operations are also sensitive to Unicode representation:
// Vulnerable search operation
const searchParams = {
TableName: 'Products',
FilterExpression: 'contains(Title, :searchTerm)',
ExpressionAttributeValues: { ':searchTerm': 'café' }
};
const searchResult = await dynamodb.scan(searchParams).promise();

An attacker can search for 'café' (precomposed) while the database contains 'café' (decomposed), causing the search to fail and potentially bypassing content filtering or access controls.
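The same byte-level behavior is easy to reproduce in plain JavaScript, since String.prototype.includes mirrors what a raw contains() comparison sees:

```javascript
const stored = 'cafe\u0301'; // decomposed: 'e' + combining acute (U+0301)
const query  = 'caf\u00E9';  // precomposed é (U+00E9)

// A raw substring check misses the match entirely
console.log(stored.includes(query)); // false

// Normalizing both sides to NFC restores the expected behavior
console.log(stored.normalize('NFC').includes(query.normalize('NFC'))); // true
```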
Partition key fragmentation is another serious issue. DynamoDB hashes the raw bytes of the partition key to distribute items across partitions, so different Unicode representations of the same logical string are treated as entirely different keys:
// Two logically identical keys with different Unicode forms
const key1 = 'r\u00E9sum\u00E9';   // 'résumé' with precomposed é (U+00E9)
const key2 = 're\u0301sume\u0301'; // 'résumé' with combining acute accents (U+0301)
// These will be stored in different partitions
await dynamodb.put({
TableName: 'Documents',
Item: { UserId: key1, Content: '...' }
}).promise();
await dynamodb.put({
TableName: 'Documents',
Item: { UserId: key2, Content: '...' }
}).promise();

This can cause data fragmentation, inconsistent query results, and broken application logic that assumes unique user identifiers.
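The divergence shows up directly in the raw UTF-8 bytes, which is exactly what DynamoDB hashes to pick a partition. A quick Node.js check:

```javascript
const precomposed = 'caf\u00E9'; // é as a single code point (U+00E9)
const decomposed = 'cafe\u0301'; // e + combining acute (U+0301)

// Different UTF-8 byte sequences, so they hash to different partitions
console.log(Buffer.from(precomposed, 'utf8').length); // 5 (é encodes as 0xC3 0xA9)
console.log(Buffer.from(decomposed, 'utf8').length);  // 6 (U+0301 encodes as 0xCC 0x81)
console.log(Buffer.from(precomposed).equals(Buffer.from(decomposed))); // false
```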
DynamoDB-Specific Detection
Detecting Unicode normalization issues in DynamoDB requires both static analysis and runtime scanning. The middleBrick API security scanner includes specific checks for these vulnerabilities by testing how your DynamoDB endpoints handle Unicode variations.
middleBrick's DynamoDB scanning process:
# Scan a DynamoDB-backed API endpoint
middlebrick scan https://api.example.com/user/permissions
# Scan with JSON output for integration
middlebrick scan --output json https://api.example.com/search
# Continuous monitoring with GitHub Action
- uses: middleBrick/middleBrick@v1
with:
url: https://api.example.com/api-gateway
fail-on-severity: high

middleBrick tests Unicode normalization by submitting requests with multiple Unicode representations of the same logical string and comparing DynamoDB's responses. It specifically checks:
- Authentication bypass attempts using Unicode variations of role names
- Search functionality inconsistencies with precomposed vs decomposed characters
- Partition key collisions by testing different Unicode forms of user identifiers
- Authorization logic failures when comparing permission strings
For manual detection, implement comprehensive test cases that exercise your DynamoDB queries with Unicode variations:
const unicodeTestVectors = [
{ name: 'precomposed', value: 'caf\u00E9' },  // 'café' with precomposed é (U+00E9)
{ name: 'decomposed', value: 'cafe\u0301' },  // 'café' with combining acute (U+0301)
{ name: 'combining', value: 'a\u0308d\u0308m\u0308i\u0308n\u0308' }, // 'äd̈m̈ïn̈'
{ name: 'fullwidth', value: '\uFF41\uFF44\uFF4D\uFF49\uFF4E' }       // fullwidth 'admin'
];
async function testUnicodeNormalization() {
for (const test of unicodeTestVectors) {
const result = await dynamodb.query({
TableName: 'UserPermissions',
KeyConditionExpression: 'UserId = :userId AND Permission = :permission',
ExpressionAttributeValues: {
':userId': 'testuser',
':permission': test.value
}
}).promise();
console.log(`${test.name}: ${JSON.stringify(result.Items)}`);
}
}

Monitor DynamoDB logs for unusual query patterns that might indicate Unicode-based attacks. Look for repeated queries with slight Unicode variations, especially targeting authentication or authorization endpoints.
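One way to automate that review (a sketch; it assumes you can extract the queried string values from your logs, and findUnicodeVariants is a hypothetical helper) is to group logged values by their NFC form and flag any group that appeared with more than one raw encoding:

```javascript
// Group logged string values by NFC form; multiple raw encodings of the
// same normalized value is a signal worth investigating.
function findUnicodeVariants(loggedValues) {
  const groups = new Map();
  for (const raw of loggedValues) {
    const nfc = raw.normalize('NFC');
    if (!groups.has(nfc)) groups.set(nfc, new Set());
    groups.get(nfc).add(raw);
  }
  // Keep only normalized values that were seen with 2+ raw encodings
  return [...groups.entries()]
    .filter(([, rawForms]) => rawForms.size > 1)
    .map(([nfc, rawForms]) => ({ nfc, variants: [...rawForms] }));
}

// Example: two encodings of 'café' plus an unrelated value
const suspicious = findUnicodeVariants(['caf\u00E9', 'cafe\u0301', 'admin']);
console.log(suspicious); // one entry: the two 'café' encodings
```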
DynamoDB-Specific Remediation
Remediating Unicode normalization issues in DynamoDB requires a defense-in-depth approach. The primary strategy is to normalize all user input before it reaches DynamoDB operations.
In Node.js, normalization is built into the language via String.prototype.normalize, so no extra library is needed:

// Normalize to NFC (Canonical Decomposition, followed by Canonical Composition)
function normalizeInput(input) {
return input.normalize('NFC');
}
// Apply normalization to all DynamoDB operations
async function safeDynamoQuery(userId, permission) {
const normalizedUserId = normalizeInput(userId);
const normalizedPermission = normalizeInput(permission);
const params = {
TableName: 'UserPermissions',
KeyConditionExpression: 'UserId = :userId AND Permission = :permission',
ExpressionAttributeValues: {
':userId': normalizedUserId,
':permission': normalizedPermission
}
};
return await dynamodb.query(params).promise();
}
// For batch operations, normalize all items before writing
async function safeBatchWrite(items) {
const normalizedItems = items.map(item => ({
...item,
UserId: normalizeInput(item.UserId),
Permission: normalizeInput(item.Permission)
}));
const params = {
RequestItems: {
'UserPermissions': normalizedItems.map(item => ({
PutRequest: { Item: item }
}))
}
};
return await dynamodb.batchWrite(params).promise();
}

For Python applications using boto3:
import boto3
import unicodedata

dynamodb = boto3.client('dynamodb')

def normalize_input(input_str):
    return unicodedata.normalize('NFC', input_str)

def safe_dynamo_query(user_id, permission):
    # Normalize both values before they reach DynamoDB; the low-level
    # client expects typed attribute values ({'S': ...} for strings)
    return dynamodb.query(
        TableName='UserPermissions',
        KeyConditionExpression='UserId = :userId AND Permission = :permission',
        ExpressionAttributeValues={
            ':userId': {'S': normalize_input(user_id)},
            ':permission': {'S': normalize_input(permission)}
        }
    )

Implement DynamoDB Global Secondary Indexes (GSIs) with normalized attributes for search operations:
// Create GSI with a normalized attribute for search
// (createTable requires the low-level DynamoDB client, not the DocumentClient)
await dynamodb.createTable({
TableName: 'Products',
BillingMode: 'PAY_PER_REQUEST',
AttributeDefinitions: [
{ AttributeName: 'ProductId', AttributeType: 'S' },
{ AttributeName: 'NormalizedTitle', AttributeType: 'S' }
],
KeySchema: [
{ AttributeName: 'ProductId', KeyType: 'HASH' }
],
GlobalSecondaryIndexes: [{
IndexName: 'TitleSearchIndex',
KeySchema: [{ AttributeName: 'NormalizedTitle', KeyType: 'HASH' }],
Projection: { ProjectionType: 'ALL' }
}]
}).promise();
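With the index in place, searches go through the GSI using the normalized term. A sketch (buildTitleSearch is a hypothetical helper; it assumes the TitleSearchIndex GSI defined above):

```javascript
// Build query parameters against the TitleSearchIndex GSI; normalizing the
// search term makes it match the stored NormalizedTitle attribute.
function buildTitleSearch(searchTerm) {
  return {
    TableName: 'Products',
    IndexName: 'TitleSearchIndex',
    KeyConditionExpression: 'NormalizedTitle = :title',
    ExpressionAttributeValues: {
      ':title': searchTerm.normalize('NFC')
    }
  };
}

// Both encodings of 'café' now produce identical query parameters
const a = buildTitleSearch('caf\u00E9');
const b = buildTitleSearch('cafe\u0301');
console.log(a.ExpressionAttributeValues[':title'] === b.ExpressionAttributeValues[':title']); // true

// Usage: const result = await dynamodb.query(buildTitleSearch(userInput)).promise();
```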
// Store normalized version for search
async function createProduct(product) {
const normalizedTitle = normalizeInput(product.Title);
await dynamodb.put({
TableName: 'Products',
Item: {
ProductId: product.ProductId,
Title: product.Title,
NormalizedTitle: normalizedTitle
}
}).promise();
}

Add validation middleware to reject suspicious Unicode patterns:
function validateUnicode(input) {
// Reject inputs with combining characters if not expected
const hasCombiningChars = /[\u0300-\u036F]/u.test(input); // combining diacritical marks block
if (hasCombiningChars) {
throw new Error('Invalid character encoding');
}
return true;
}

Finally, implement comprehensive logging and monitoring to detect when Unicode normalization attacks are attempted, allowing you to respond to emerging threats.
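A lightweight way to feed that monitoring (a sketch; checkAndLogNormalization is a hypothetical helper, and the logger argument stands in for whatever structured logging you use) is to flag any request value that changes under normalization, since legitimate clients almost always send NFC:

```javascript
// Flag inputs whose NFC form differs from the raw value, log the mismatch,
// and hand back the normalized form for downstream use.
function checkAndLogNormalization(fieldName, value, logger = console) {
  const normalized = value.normalize('NFC');
  if (normalized !== value) {
    logger.warn(
      `Non-NFC input detected in ${fieldName}: ` +
      `${value.length} code units raw vs ${normalized.length} normalized`
    );
    return { suspicious: true, normalized };
  }
  return { suspicious: false, normalized };
}

const result = checkAndLogNormalization('permission', 'cafe\u0301');
console.log(result.suspicious);                 // true
console.log(result.normalized === 'caf\u00E9'); // true
```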