Unicode Normalization with Bearer Tokens
How Unicode Normalization Manifests in Bearer Tokens
Unicode normalization vulnerabilities in Bearer tokens occur when authentication systems fail to consistently handle Unicode characters across different normalization forms. When a server accepts a token but processes it in a different Unicode normalization form than the client submitted, attackers can exploit this inconsistency to bypass authentication or access unauthorized resources.
The most common attack pattern involves submitting a token with Unicode characters that have multiple valid representations. For example, the character 'é' can be represented as U+00E9 (Latin small letter e with acute) or as U+0065 U+0301 (e followed by combining acute accent). If an authentication system normalizes tokens inconsistently, an attacker might craft a token that validates as legitimate when normalized one way but fails when normalized another way.
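This equivalence can be checked directly with Python's standard unicodedata module; a minimal sketch:

```python
import unicodedata

# 'é' as a single precomposed code point (NFC form)
composed = "\u00e9"
# 'é' as 'e' followed by a combining acute accent (NFD form)
decomposed = "\u0065\u0301"

# Raw string comparison sees two different code point sequences
print(composed == decomposed)  # False
# After normalizing to a common form, the two compare equal
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A validator that normalizes only one side of a comparison inherits exactly this ambiguity.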
Consider this attack scenario: An API accepts a Bearer token containing Unicode characters. The client submits the token as-is, but the server normalizes it to NFC (Canonical Decomposition followed by Canonical Composition) before validation. An attacker who knows the original token can submit a version in NFD (Canonical Decomposition) form. If the server's comparison logic is case-insensitive or uses loose string matching, the NFD version might pass validation while being structurally different from the original token.
// Vulnerable comparison logic
function validateBearerToken(providedToken, storedToken) {
  // Normalizes the provided token but not the stored token
  const normalizedProvided = providedToken.normalize('NFC');
  return normalizedProvided === storedToken; // storedToken might be in a different form
}
// Attack: the stored token contains 'é' as U+00E9 (NFC form)
// The attacker submits the same token with 'é' as U+0065 U+0301 (NFD form)
// The one-sided normalization maps the attacker's variant onto the stored value,
// so two byte-distinct strings are accepted as the same token
Another manifestation occurs in token generation systems. If a token generation service produces tokens containing Unicode characters but does not specify a normalization form, different instances may generate tokens in different forms. A token that is valid on one server instance then fails on another, producing inconsistent authentication behavior across a distributed system.
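The cross-instance failure mode can be demonstrated with a small sketch; the token value and digest function here are illustrative stand-ins, not from any real system:

```python
import hashlib
import unicodedata


def token_digest(token: str) -> str:
    # Stand-in for a server-side token storage key (illustrative only)
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


token_nfc = "session-caf\u00e9-12345"                    # as one instance might mint it
token_nfd = unicodedata.normalize("NFD", token_nfc)      # as another instance might

# The digests differ even though the tokens are canonically equivalent
print(token_digest(token_nfc) == token_digest(token_nfd))  # False
# Normalizing to a single form before hashing restores consistency
print(token_digest(unicodedata.normalize("NFC", token_nfd))
      == token_digest(token_nfc))  # True
```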
Property-based authorization checks are particularly vulnerable. If a Bearer token contains Unicode characters representing user IDs or role identifiers, inconsistent normalization can cause authorization decisions to be bypassed. An attacker might craft a token where the normalized form grants elevated privileges while the original form appears legitimate.
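As a hedged illustration (the ROLES table and authorize function are hypothetical, not from any real framework), compatibility normalization such as NFKC can fold visually distinct characters onto a privileged identifier:

```python
import unicodedata

# Hypothetical role table for demonstration
ROLES = {"admin": {"delete_users"}, "viewer": set()}


def authorize(role_claim: str, permission: str) -> bool:
    # Vulnerable: compatibility normalization is applied after the token was
    # issued, so distinct claims can collapse onto privileged role names
    role = unicodedata.normalize("NFKC", role_claim)
    return permission in ROLES.get(role, set())


# Fullwidth "admin" (U+FF41..U+FF4E) looks different but NFKC-folds to "admin"
print(authorize("\uff41\uff44\uff4d\uff49\uff4e", "delete_users"))  # True
```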
Bearer Token-Specific Detection
Detecting Unicode normalization vulnerabilities in Bearer tokens requires systematic testing across different normalization forms. The most effective approach is to scan API endpoints that accept Bearer tokens with tokens modified to different Unicode normalization forms.
middleBrick's Bearer tokens security scanning specifically tests this vulnerability by submitting the same token in multiple normalization forms and observing how the server responds. The scanner sends tokens in NFC, NFD, NFKC, and NFKD forms, then analyzes whether the server treats them differently.
// middleBrick scan output example
{
  "unicode_normalization": {
    "status": "vulnerable",
    "description": "Server accepts Bearer tokens in multiple Unicode normalization forms without consistent validation",
    "severity": "high",
    "remediation": "Implement consistent Unicode normalization using NFC form before token validation",
    "evidence": [
      {
        "submitted_form": "NFC",
        "response_code": 200,
        "response_time": 123
      },
      {
        "submitted_form": "NFD",
        "response_code": 200,
        "response_time": 125
      }
    ]
  }
}
Manual detection involves creating test tokens with Unicode characters and submitting them in different normalization forms. Use tools like Python's unicodedata module or Node.js's String.prototype.normalize() to generate different forms of the same token.
# Manual testing script
import requests
import unicodedata

def test_unicode_normalization(base_url, token):
    forms = {
        'NFC': unicodedata.normalize('NFC', token),
        'NFD': unicodedata.normalize('NFD', token),
        'NFKC': unicodedata.normalize('NFKC', token),
        'NFKD': unicodedata.normalize('NFKD', token),
    }
    results = {}
    for form, token_form in forms.items():
        headers = {'Authorization': f'Bearer {token_form}'}
        response = requests.get(f'{base_url}/protected', headers=headers)
        results[form] = {
            'status_code': response.status_code,
            'content': response.text[:200],  # truncate for brevity
        }
    return results

# Test with a token containing Unicode characters
original_token = "eyJ0eXAiOiJKV1QiLC...\u00e9..."
results = test_unicode_normalization('https://api.example.com', original_token)
print(results)
Look for inconsistent behavior across normalization forms. If the server returns different responses (200 vs 401, or different error messages) for the same logical token in different forms, you have identified a normalization vulnerability. Pay special attention to timing differences: some implementations normalize on the fly, causing subtle timing variations that leak information about the token's structure.
Bearer Token-Specific Remediation
Remediating Unicode normalization vulnerabilities in Bearer tokens requires implementing consistent normalization across the entire authentication and authorization pipeline. The key principle is to normalize all tokens to a single, well-defined form before any validation or processing occurs.
The most secure approach is to normalize all incoming Bearer tokens to NFC (Canonical Decomposition followed by Canonical Composition) form immediately upon receipt. This ensures consistent processing regardless of how the client submitted the token.
// Secure Bearer token validation middleware
function bearerTokenMiddleware(req, res, next) {
  const authHeader = req.headers.authorization;
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'Missing Bearer token' });
  }
  // Extract and normalize token
  let token = authHeader.substring(7); // remove 'Bearer '
  token = normalizeBearerToken(token);
  // Validate normalized token
  if (!validateToken(token)) {
    return res.status(401).json({ error: 'Invalid token' });
  }
  req.token = token;
  next();
}

function normalizeBearerToken(token) {
  // Normalize to NFC form - the most compatible and widely accepted
  return token.normalize('NFC');
}

function validateToken(token) {
  // Your validation logic here; it always operates on the normalized token
  return tokenService.verify(token);
}
For token generation systems, ensure all tokens are generated in a specific normalization form. If your token format allows Unicode characters (rare for JWTs, but possible in custom token formats), generate them in NFC form and document this requirement for all client implementations.
// Token generation with consistent normalization
const crypto = require('crypto');
const base64url = require('base64url'); // e.g. the base64url npm package

function generateSecureBearerToken(payload) {
  const header = { alg: 'HS256', typ: 'JWT' };
  const encodedHeader = base64url.encode(JSON.stringify(header));
  // Ensure payload is normalized before encoding
  const normalizedPayload = normalizePayload(payload);
  const encodedPayload = base64url.encode(JSON.stringify(normalizedPayload));
  const signature = crypto
    .createHmac('sha256', process.env.JWT_SECRET)
    .update(`${encodedHeader}.${encodedPayload}`)
    .digest('base64url'); // JWT signatures use base64url, not plain base64
  return `${encodedHeader}.${encodedPayload}.${signature}`;
}

function normalizePayload(payload) {
  // Normalize all string properties in the payload
  const normalized = {};
  for (const [key, value] of Object.entries(payload)) {
    normalized[key] = typeof value === 'string'
      ? value.normalize('NFC')
      : value;
  }
  return normalized;
}
When storing tokens in databases or caches, normalize them before storage and always compare using constant-time comparison functions to prevent timing attacks. Many Unicode characters have visually similar representations that differ in byte length, making length-based timing attacks feasible if not properly mitigated.
// Secure token comparison
function constantTimeCompare(val1, val2) {
  // The early length check reveals only the length, not the contents.
  // In Node.js, crypto.timingSafeEqual on equal-length Buffers is a
  // hardened alternative to this manual loop.
  if (val1.length !== val2.length) return false;
  let result = 0;
  for (let i = 0; i < val1.length; i++) {
    result |= val1.charCodeAt(i) ^ val2.charCodeAt(i);
  }
  return result === 0;
}

// Usage in validation
const normalizedProvided = providedToken.normalize('NFC');
const normalizedStored = storedToken.normalize('NFC');
if (!constantTimeCompare(normalizedProvided, normalizedStored)) {
  return res.status(401).json({ error: 'Invalid token' });
}
Implement comprehensive testing with tokens containing various Unicode characters, including those from different scripts, combining characters, and characters with multiple valid representations. Test your system's behavior with tokens in all four Unicode normalization forms to ensure consistent handling.
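The four-form consistency test recommended above can be sketched in Python; validate here is a stand-in for a real token check that normalizes both sides to NFC:

```python
import unicodedata


def validate(provided: str, stored: str) -> bool:
    # Consistent check: both sides normalized to NFC before comparison
    return unicodedata.normalize("NFC", provided) == unicodedata.normalize("NFC", stored)


# Illustrative token containing characters with multiple representations
stored_token = "tok-caf\u00e9-r\u00e9sum\u00e9"
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    variant = unicodedata.normalize(form, stored_token)
    assert validate(variant, stored_token), f"{form} variant was rejected"
print("all four forms accepted")
```

A validator that passes this check treats every canonically equivalent submission identically, which is exactly the property the remediation requires.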