Unicode Normalization in Express with DynamoDB
Unicode Normalization in Express with DynamoDB: how this combination creates or exposes the vulnerability
Unicode normalization inconsistencies between Express request handling and Amazon DynamoDB key comparisons can lead to authentication bypass or IDOR-style access control issues. Suppose an Express app takes user input, such as a username or API key, and queries DynamoDB with it without first converting it to a canonical form. In that case, visually identical strings with different binary representations can match different DynamoDB items, or can fail to match the intended item.
For example, the character é can be represented as a single code point, U+00E9 (LATIN SMALL LETTER E WITH ACUTE), or as the decomposed sequence e + U+0301 (COMBINING ACUTE ACCENT). If the client sends the decomposed form but the item was stored under the precomposed form (or vice versa), an exact-match lookup returns no item.

Inconsistent handling is the more dangerous case. Sometimes one code path, such as an authorization check, compares normalized values while another path uses raw values. That second path might be the DynamoDB lookup, a secondary index query, or a filter expression. When this happens, the check and the lookup can refer to different items, and one user can reach another user’s data. This maps to BOLA/IDOR: a user supplies an identifier that normalizes to the same value as another user’s identifier, and the application does not enforce a single canonical form before its access checks.
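You can compare the two spellings of é directly in Node.js with the built-in `String.prototype.normalize`. The following minimal sketch shows that they are different strings, with different code points and lengths, until both are normalized to the same form:

```javascript
// Two visually identical spellings of "é"
const precomposed = '\u00E9'; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = 'e\u0301'; // "e" followed by U+0301 COMBINING ACUTE ACCENT

// List the code points of a string as "U+XXXX" labels.
const codePoints = (s) =>
  [...s].map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(codePoints(precomposed));               // [ 'U+00E9' ]
console.log(codePoints(decomposed));                // [ 'U+0065', 'U+0301' ]
console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 1 2

// After normalizing both to NFC, the strings compare equal.
console.log(precomposed.normalize('NFC') === decomposed.normalize('NFC')); // true
// NFD goes the other way and splits the precomposed character apart.
console.log(precomposed.normalize('NFD') === decomposed); // true
```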
DynamoDB compares strings as binary values (their UTF-8 bytes) and performs no Unicode normalization. It is therefore the application’s responsibility to normalize inputs consistently across endpoints and persistence layers.

In Express, route parameters, query strings, or JSON payloads may be used to build DynamoDB keys, KeyConditionExpression values, or FilterExpression values. If they are used without normalization, the effective access boundary can diverge from the developer’s intent. Attack vectors include crafted payloads that exploit normalization differences to:

- reach other users’ resources,
- bypass authentication checks, or
- manipulate secondary index keys whose sort keys contain user-controlled strings.

These patterns fall under several of the 12 security checks middleBrick runs, particularly BOLA/IDOR, Input Validation, and Property Authorization.
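Because DynamoDB compares strings by their UTF-8 bytes, the two spellings are simply different keys. The sketch below prints those bytes. It then uses a JavaScript `Map` as a local stand-in for an exact-match key lookup. The `Map` is only an illustration of binary key matching and makes no DynamoDB calls.

```javascript
const nfc = 'r\u00E9sum\u00E9';   // "résumé" with precomposed é
const nfd = 're\u0301sume\u0301'; // "résumé" with e + combining acute

// UTF-8 bytes, the representation DynamoDB compares for string keys
const utf8Hex = (s) => Buffer.from(s, 'utf8').toString('hex');
console.log(utf8Hex(nfc)); // 72c3a973756dc3a9
console.log(utf8Hex(nfd)); // 7265cc8173756d65cc81

// Stand-in for a table keyed on username: lookups match the raw string exactly.
const table = new Map([[nfc, { username: nfc, owner: 'alice' }]]);

console.log(table.get(nfc)?.owner);                  // alice
console.log(table.get(nfd));                         // undefined (a false negative)
console.log(table.get(nfd.normalize('NFC'))?.owner); // alice
```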
Consider an Express endpoint that retrieves a user profile by userId from DynamoDB. If a client can supply the userId in more than one Unicode form, the value must be normalized before the DynamoDB request is built; otherwise, authorization decisions can be inconsistent. middleBrick’s Unicode normalization checks, part of its 12 parallel security checks, can surface such inconsistencies. They do this by correlating spec definitions (OpenAPI/Swagger with full $ref resolution) with runtime behavior and highlighting request handling that lacks canonicalization.
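The sketch below makes "inconsistent authorization decisions" concrete without any AWS calls. `registerRaw`, `getProfileRaw`, and `registerCanonical` are hypothetical functions. The two `Map`s stand in for a Users table with exact-match keys. Registration stores raw ids while the ownership check normalizes, so a look-alike account can read another user’s item:

```javascript
// Hypothetical in-memory stand-ins for the Users table (exact-match keys).
const rawTable = new Map();       // written by the vulnerable registration
const canonicalTable = new Map(); // written by the fixed registration

const aliceId = 'r\u00E9sum\u00E9';     // NFC spelling, registered first
const malloryId = 're\u0301sume\u0301'; // NFD spelling, looks identical

// Vulnerable registration: stores the raw id, so both spellings become items.
function registerRaw(userId, secret) {
  if (rawTable.has(userId)) return false;
  rawTable.set(userId, { userId, secret });
  return true;
}

// Vulnerable read: the ownership check compares normalized values,
// but the lookup uses the raw requested id.
function getProfileRaw(sessionUserId, requestedId) {
  if (sessionUserId.normalize('NFC') !== requestedId.normalize('NFC')) return null;
  return rawTable.get(requestedId) ?? null;
}

// Fixed registration: canonicalize first, so the look-alike id is already taken.
function registerCanonical(userId, secret) {
  const key = userId.normalize('NFC');
  if (canonicalTable.has(key)) return false;
  canonicalTable.set(key, { userId: key, secret });
  return true;
}

registerRaw(aliceId, 'alice-secret');
console.log(registerRaw(malloryId, 'mallory-secret'));      // true
// Mallory, logged in with the NFD id, requests the NFC id and passes the check.
console.log(getProfileRaw(malloryId, aliceId)?.secret);      // alice-secret

registerCanonical(aliceId, 'alice-secret');
console.log(registerCanonical(malloryId, 'mallory-secret')); // false
```

Normalizing only the read path would not fix this. Once two raw spellings exist as separate items, a normalized lookup would still send both users to the same item. Canonicalizing at write time keeps a single item per visible identifier.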
To illustrate the risk, consider an Express route that uses the path parameter directly in a DynamoDB lookup:

```javascript
// Assumes `app` is an Express app and `dynamodb` is an AWS SDK v2 DocumentClient,
// both created elsewhere.
app.get('/api/profile/:userId', async (req, res) => {
  const { userId } = req.params; // raw, unnormalized value from the URL
  const params = {
    TableName: 'Users',
    Key: {
      userId: userId
    }
  };
  const data = await dynamodb.get(params).promise();
  res.json(data.Item);
});
```
Suppose the client requests /api/profile/résumé with a decomposed é, while the key of the DynamoDB item uses the precomposed é. The get response then contains no Item. The app may treat the resource as missing. It may also fall back to another lookup path that, depending on downstream logic, returns a different item. middleBrick’s scans include checks aligned with the OWASP API Top 10 and can surface these classes of flaws by comparing spec-defined parameter constraints with observed behavior.
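To see what reaches `req.params.userId`, decode the two percent-encoded spellings of `résumé` with `decodeURIComponent`, which Express uses to decode route parameters. This minimal sketch shows that they arrive as different strings and only compare equal after normalization:

```javascript
// The same visible path segment, percent-encoded in two different forms
const composedSegment = 'r%C3%A9sum%C3%A9';     // é as U+00E9
const decomposedSegment = 're%CC%81sume%CC%81'; // é as e + U+0301

// Express decodes route parameters with decodeURIComponent before the handler runs.
const fromComposed = decodeURIComponent(composedSegment);
const fromDecomposed = decodeURIComponent(decomposedSegment);

console.log(fromComposed === fromDecomposed); // false
console.log(fromComposed.normalize('NFC') === fromDecomposed.normalize('NFC')); // true
```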
DynamoDB-Specific Remediation in Express: concrete code fixes
Remediation centers on normalizing all user-controlled strings to one canonical Unicode form before any DynamoDB interaction in Express. Use the same form everywhere: on client inputs, on the DynamoDB keys you write and read, and on any strings used in expressions. NFC is the most common choice for storage and lookup. Validate and normalize early, in middleware, so that downstream logic only ever sees canonical values.
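The following is a minimal sketch of the "validate and normalize early" step. It uses a hypothetical `canonicalizeId` helper that a route or middleware could call before building any DynamoDB key. The length limit and the control-character rule are illustrative assumptions, not DynamoDB requirements.

```javascript
// Hypothetical helper: validate and canonicalize a user-supplied identifier
// before it is used in a DynamoDB key or expression.
const MAX_ID_LENGTH = 128; // assumption: choose a limit that fits your schema

function canonicalizeId(value) {
  if (typeof value !== 'string') {
    throw new TypeError('identifier must be a string');
  }
  const canonical = value.normalize('NFC');
  if (canonical.length === 0 || canonical.length > MAX_ID_LENGTH) {
    throw new RangeError('identifier length out of range');
  }
  // Reject C0 control characters, DEL, and C1 control characters.
  if (/[\u0000-\u001F\u007F-\u009F]/.test(canonical)) {
    throw new RangeError('identifier contains control characters');
  }
  return canonical;
}

console.log(canonicalizeId('re\u0301sume\u0301') === 'r\u00E9sum\u00E9'); // true
```

A route could catch the thrown error and respond with 400 instead of passing the value on to DynamoDB.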
Below are concrete Express + DynamoDB code examples showing insecure patterns and their secure counterparts.
Insecure example (vulnerable to Unicode confusion)
```javascript
const express = require('express');
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const dynamodb = new AWS.DynamoDB.DocumentClient();
const app = express();

app.get('/api/user/:username', async (req, res) => {
  const { username } = req.params; // used as-is: no normalization
  const params = {
    TableName: 'Accounts',
    Key: {
      username: username
    }
  };
  const data = await dynamodb.get(params).promise();
  if (!data.Item) return res.status(404).send('Not found');
  res.json(data.Item);
});
```
In this example, the username is never normalized. A request that carries a decomposed username will not match a username stored in precomposed form, so it produces a false negative. With some index or filter setups, it can also lead to unintended access.
Secure remediation with normalization
```javascript
const express = require('express');
const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
const dynamodb = new AWS.DynamoDB.DocumentClient();
const app = express();

app.get('/api/user/:username', async (req, res) => {
  const { username } = req.params;
  // String.prototype.normalize is built into Node.js; no extra library is needed.
  const normalizedUsername = username.normalize('NFC');
  const params = {
    TableName: 'Accounts',
    Key: {
      username: normalizedUsername
    }
  };
  try {
    const data = await dynamodb.get(params).promise();
    if (!data.Item) return res.status(404).send('Not found');
    res.json(data.Item);
  } catch (err) {
    // Express 4 does not catch rejected promises from async handlers.
    res.status(500).send('Lookup failed');
  }
});
```
Normalizing the incoming username to NFC before building the DynamoDB Key makes matching consistent, provided the items were also written with NFC keys. Items stored in another form must be rewritten. Apply the same normalization to every string used in a KeyConditionExpression, a FilterExpression, or any other attribute comparison, and to values written with PutItem or UpdateItem.
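The same rule applies to `Query` inputs. This sketch builds parameters for a hypothetical `Orders` table with partition key `customerId`, sort key `sk`, and attribute `status`. The parameters are in the shape that the AWS SDK v2 `DocumentClient.query` method accepts. The sketch only constructs the object, so it runs without AWS access.

```javascript
// Hypothetical table "Orders": partition key customerId, sort key sk.
const toNFC = (s) => s.normalize('NFC');

function buildOrdersQuery(customerId, skPrefix, status) {
  return {
    TableName: 'Orders',
    KeyConditionExpression: 'customerId = :cid AND begins_with(sk, :prefix)',
    FilterExpression: '#status = :status',
    // Alias the attribute name to avoid reserved-word conflicts.
    ExpressionAttributeNames: { '#status': 'status' },
    // Every user-controlled value is normalized before it enters an expression.
    ExpressionAttributeValues: {
      ':cid': toNFC(customerId),
      ':prefix': toNFC(skPrefix),
      ':status': toNFC(status)
    }
  };
}

const params = buildOrdersQuery('re\u0301sume\u0301', 'ORDER#', 'shipped');
console.log(params.ExpressionAttributeValues[':cid'] === 'r\u00E9sum\u00E9'); // true
// With a DocumentClient: const data = await dynamodb.query(params).promise();
```

`begins_with` is also a byte-level match. As a result, a prefix such as `re` does not match an NFC-stored `résumé`, because its second character is the single code point U+00E9. Keep this in mind when you design prefix searches over user-visible text.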
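For broader protection, add normalization middleware that processes path parameters, query strings, and JSON body fields before route handlers run. Application-level middleware executes before any route has matched, so req.params is still empty at that point. Route parameters are therefore handled separately with app.param(). For example:

```javascript
// Parse JSON bodies first so req.body is populated for the middleware below.
app.use(express.json());

// Recursively normalize every string value in an object to NFC, in place.
const normalizeFields = (obj) => {
  if (obj && typeof obj === 'object') {
    for (const key of Object.keys(obj)) {
      if (typeof obj[key] === 'string') {
        obj[key] = obj[key].normalize('NFC');
      } else if (obj[key] && typeof obj[key] === 'object') {
        normalizeFields(obj[key]);
      }
    }
  }
};

// Query strings and JSON bodies. Assumes Express 4, where req.query is a plain
// mutable object (in Express 5 it is a getter, so in-place edits may not persist).
app.use((req, res, next) => {
  normalizeFields(req.query);
  normalizeFields(req.body);
  next();
});

// Route parameters: repeat for each parameter name used by routes on `app`.
app.param('username', (req, res, next, value) => {
  req.params.username = value.normalize('NFC');
  next();
});
```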
With this middleware in place, the string values it covers reach route handlers in NFC, which reduces the risk of inconsistent lookups across DynamoDB operations. Combine this practice with middleBrick’s Pro continuous monitoring and CI/CD integration (GitHub Action) to detect regressions in which new routes are missing normalization or bypass it.
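For local regression tests, a small helper can walk a payload and report any string value that is not already in NFC. `findNonNfcStrings` is a hypothetical helper, not part of Express or middleBrick. A unit test could run it on `req.query` and `req.body` after the normalization middleware has executed.

```javascript
// Hypothetical helper: return the paths of string values that are not in NFC.
function findNonNfcStrings(value, path = '$') {
  if (typeof value === 'string') {
    return value === value.normalize('NFC') ? [] : [path];
  }
  if (value && typeof value === 'object') {
    return Object.keys(value).flatMap((key) =>
      findNonNfcStrings(value[key], `${path}.${key}`)
    );
  }
  return [];
}

const body = {
  username: 're\u0301sume\u0301', // decomposed: not NFC
  profile: { city: 'Z\u00FCrich', tags: ['cafe\u0301', 'ok'] }
};
console.log(findNonNfcStrings(body)); // [ '$.username', '$.profile.tags.0' ]
```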
Additional guidance: design any secondary indexes, including Global Secondary Indexes (GSIs), whose keys contain user-controlled strings with normalization in mind, because DynamoDB compares index keys as binary values too. If you use middleBrick’s MCP Server in your IDE, you can scan API definitions and DynamoDB access patterns during development to surface missing canonicalization before deployment.
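Normalizing keys that are already stored is a data migration. If two existing items have keys that differ only in normalization form, those keys collapse into one. The following minimal pre-migration check groups key values by their NFC form. It assumes you have already exported the key values, for example from a `Scan`. `findNormalizationCollisions` is a hypothetical helper:

```javascript
// Hypothetical pre-migration check: group stored key values by NFC form and
// report every group that contains more than one distinct raw value.
function findNormalizationCollisions(keys) {
  const groups = new Map();
  for (const raw of keys) {
    const canonical = raw.normalize('NFC');
    if (!groups.has(canonical)) groups.set(canonical, new Set());
    groups.get(canonical).add(raw);
  }
  return [...groups.entries()]
    .filter(([, raws]) => raws.size > 1)
    .map(([canonical, raws]) => ({ canonical, raws: [...raws] }));
}

const storedUsernames = ['r\u00E9sum\u00E9', 're\u0301sume\u0301', 'bob', 'bob'];
const collisions = findNormalizationCollisions(storedUsernames);
console.log(collisions.length);         // 1
console.log(collisions[0].raws.length); // 2
```

Each reported group needs a deliberate decision, such as which account keeps the identifier, before base-table keys or GSI key attributes are rewritten in NFC.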