Unicode Normalization on Cloudflare
How Unicode Normalization Manifests in Cloudflare
Unicode normalization attacks exploit the fact that visually identical text can be encoded as different code-point sequences. In Cloudflare Workers and Pages environments, this can bypass validation that runs on one form of a string while the runtime or upstream origin interprets another. A common pattern is an NFC/NFD mismatch: an attacker registers a domain or constructs a URL with a decomposed character (e.g., LATIN SMALL LETTER A WITH ACUTE written as a base letter plus a combining mark) that looks identical in the UI but has a distinct code-point sequence. If a Worker reads request.url or header values and compares them as raw strings (e.g., in host-based routing or allowlists) without normalizing first, a mismatched representation can lead to host confusion or path-traversal-like behavior in routing logic.
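The mismatch is easy to reproduce in plain TypeScript. The following is a minimal sketch; the helper name codePoints and the sample string are illustrative and not part of any Cloudflare API.

```typescript
// Two spellings of "café" that render identically.
const precomposed: string = "caf\u00e9"; // é as one code point, U+00E9
const decomposed: string = "cafe\u0301"; // e (U+0065) + combining acute (U+0301)

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    "U+" + (ch.codePointAt(0) ?? 0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// A raw comparison sees two different strings...
const rawEqual: boolean = precomposed === decomposed;

// ...while comparing after normalizing both sides to NFC treats them as equal.
const nfcEqual: boolean =
  precomposed.normalize("NFC") === decomposed.normalize("NFC");

console.log(codePoints(precomposed).join(" "));
console.log(codePoints(decomposed).join(" "));
console.log({ rawEqual, nfcEqual });
```

Any check that relies on the raw comparison can be sidestepped by choosing the other spelling, which is why normalization has to happen before the comparison rather than after it.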
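Within Cloudflare’s own processing, normalization issues can surface in edge logic that handles internationalized domain names (IDNs) or in Workers that compare host strings against a hardcoded list. For example, a Worker that checks a raw host value (such as a header or a stored string) with if (host === 'éxample.com'), where the literal uses the precomposed U+00E9, will not match éxample.com written as e + combining acute (U+0065 U+0301), even though the two render identically, potentially allowing an attacker to reach an unintended route. Note that a parser following the WHATWG URL Standard converts hostnames to ASCII (Punycode) form and applies NFC along the way, so url.host from new URL(request.url) is usually already canonical; the mismatch is more likely in strings that never pass through that parser. Similarly, Workers KV lookups or Durable Object identifiers that incorporate user-controlled strings can diverge if normalization is inconsistent between write and read paths, enabling confusion that may be leveraged for privilege confusion or unauthorized access patterns analogous to BOLA/IDOR in edge logic.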
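To see where host comparisons are and are not at risk, it helps to compare raw strings with parsed hosts. This sketch assumes a runtime whose URL class follows the WHATWG URL Standard (Node.js does, and current Workers runtimes are expected to); the hostnames are placeholders.

```typescript
// The same hostname written with a precomposed and a decomposed "é".
const nfcHost: string = "caf\u00e9.example.com";
const nfdHost: string = "cafe\u0301.example.com";

// As raw strings (for example, values taken from a header or a config
// file), the two spellings are different.
const rawHostsDiffer: boolean = nfcHost !== nfdHost;

// After URL parsing, hosts are serialized in ASCII (Punycode) form, and the
// IDNA processing involved includes NFC normalization.
const parsedNfc: string = new URL(`https://${nfcHost}/`).hostname;
const parsedNfd: string = new URL(`https://${nfdHost}/`).hostname;

console.log({ rawHostsDiffer, parsedNfc, parsedNfd });
```

Under that assumption both parsed hostnames come out identical, so the safer habit is to compare parsed hostnames against an allowlist that is itself stored in ASCII form, and to treat any host string that bypasses the parser as unnormalized input.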
Attack patterns specific to Cloudflare include:
- IDN homograph abuse where a Worker’s routing logic does not normalize Unicode before matching against an allowlist of domains.
- Header values (e.g., custom authentication tokens or identifiers) that include combining characters, causing string comparisons in Workers to diverge from what Cloudflare’s dashboard or logs display.
- Path-based routing in Workers Sites or Pages that does not canonicalize Unicode, allowing a request for /caf%C3%A9 (precomposed) to be treated differently from /cafe%CC%81 (decomposed) by origin checks.
These issues do not imply a vulnerability in Cloudflare’s infrastructure per se, but rather highlight how Unicode inconsistencies in application-level edge code can be leveraged in a Cloudflare-hosted deployment. Because Workers run close to the edge, the impact can be immediate and observable without requiring complex infrastructure manipulation.
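The path-based pattern can be sketched with a small route table. This is an illustration under stated assumptions: routes and decodePath are hypothetical names, and a real Worker would also need to handle malformed percent-encoding, which makes decodeURIComponent throw.

```typescript
// Hypothetical route table keyed by decoded path, written in NFC.
const routes = new Map<string, string>([["/caf\u00e9", "menu-page"]]);

// Decode a percent-encoded pathname and optionally normalize it to NFC.
function decodePath(pathname: string, normalizeNfc: boolean): string {
  const decoded = decodeURIComponent(pathname);
  return normalizeNfc ? decoded.normalize("NFC") : decoded;
}

const precomposedPath: string = "/caf%C3%A9"; // é as U+00E9
const decomposedPath: string = "/cafe%CC%81"; // e + U+0301

// Without normalization, only the precomposed request finds the route.
const rawHit = routes.get(decodePath(precomposedPath, false));
const rawMiss = routes.get(decodePath(decomposedPath, false));

// With normalization, the decomposed request resolves to the same route.
const normalizedHit = routes.get(decodePath(decomposedPath, true));

console.log({ rawHit, rawMiss, normalizedHit });
```

Whether the miss is harmless or exploitable depends on what the fallback branch does: a default handler with weaker checks is where the divergence becomes a bypass.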
Cloudflare-Specific Detection
Detecting Unicode normalization issues in Cloudflare requires testing string comparisons and routing behavior rather than inspecting infrastructure. Use middleBrick to scan the edge endpoint (e.g., a Worker URL or Pages site) as an unauthenticated black-box target. The scanner’s input validation and IDOR/BOLA checks include normalization-sensitive tests that send equivalent Unicode representations of identifiers and observe whether routing or data access diverges. For example, a probe may submit a resource identifier using a decomposed form and a precomposed form, checking whether access control or data isolation differs.
To detect these issues manually while scanning with middleBrick, include test vectors in your suite:
- Send requests with IDN hostnames using both NFC and NFD forms and compare responses or observed redirects.
- Include combining characters in path segments or header values (e.g., authentication tokens or custom headers) and check whether string equality checks in Workers produce different outcomes.
- Verify that any normalization logic is applied consistently: if you normalize on write (e.g., when storing identifiers in Workers KV), ensure reads also normalize before comparison.
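One way to make the write/read consistency check hard to get wrong is to route every key through a single function. The sketch below uses an in-memory Map as a stand-in for a store such as Workers KV; NormalizedStore and its methods are hypothetical and do not mirror the KV API.

```typescript
// Hypothetical in-memory stand-in for a key-value store such as Workers KV.
class NormalizedStore {
  private readonly data = new Map<string, string>();

  // Single choke point: every key passes through here on write and on read.
  private static key(raw: string): string {
    return raw.normalize("NFC");
  }

  put(rawKey: string, value: string): void {
    this.data.set(NormalizedStore.key(rawKey), value);
  }

  get(rawKey: string): string | null {
    return this.data.get(NormalizedStore.key(rawKey)) ?? null;
  }

  get size(): number {
    return this.data.size;
  }
}

const store = new NormalizedStore();
store.put("user:ren\u00e9e", "profile-1"); // written with precomposed é
const readBack = store.get("user:rene\u0301e"); // read with decomposed é
store.put("user:rene\u0301e", "profile-2"); // overwrites, no second entry

console.log({ readBack, size: store.size });
```

Because both spellings map to one key, an attacker cannot create a look-alike record next to an existing one, and a read never misses a record that was written under the other form.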
Example curl-based checks you can perform alongside a middleBrick scan:
# Precomposed: Latin small letter e with acute (U+00E9)
# Decomposed: Latin small letter e (U+0065) + combining acute accent (U+0301)
# $'...' (ANSI-C quoting, bash 4.2+) expands \u escapes; plain double quotes do not
HOST_PRE=$'caf\u00e9.example.com'
HOST_DECOMP=$'cafe\u0301.example.com'
curl -H "Host: $HOST_PRE" https://your-worker-url.workers.dev/account
curl -H "Host: $HOST_DECOMP" https://your-worker-url.workers.dev/account
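The same idea applies to path segments, where the two forms must be percent-encoded before they are sent. The following sketch builds a matched pair of probe URLs; buildProbeUrls is a hypothetical helper, and the base URL is the placeholder used in the curl checks above.

```typescript
// Hypothetical helper: build NFC and NFD probe URLs for one path segment.
function buildProbeUrls(
  base: string,
  segment: string
): { nfc: string; nfd: string } {
  const make = (form: "NFC" | "NFD"): string => {
    const url = new URL(base);
    // Normalize first, then percent-encode the UTF-8 bytes of the segment.
    url.pathname = "/" + encodeURIComponent(segment.normalize(form));
    return url.toString();
  };
  return { nfc: make("NFC"), nfd: make("NFD") };
}

const probes = buildProbeUrls(
  "https://your-worker-url.workers.dev",
  "caf\u00e9"
);

console.log(probes.nfc);
console.log(probes.nfd);
```

Requesting both URLs with curl or fetch and comparing status codes and bodies shows whether the target treats the two forms as the same resource.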
When you use the middleBrick CLI, run a targeted scan to surface validation and access control findings that may indicate normalization gaps:
middlebrick scan https://your-worker-url.workers.dev
The dashboard and JSON report from middleBrick will highlight findings under Input Validation and BOLA/IDOR categories that are relevant to inconsistent handling of Unicode representations, helping you prioritize remediation specific to your Cloudflare edge logic.
Cloudflare-Specific Remediation
Remediation in Cloudflare environments centers on normalizing inputs before comparisons, routing, and storage. JavaScript's built-in String.prototype.normalize is available in Workers and needs no extra library. Apply normalization at the edge in Workers and in any origin logic that interacts with Cloudflare-stored identifiers, and decode percent-encoded values first, because normalizing an encoded string changes nothing.
Example Worker code that normalizes both the incoming host and stored identifiers before routing:
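addEventListener('fetch', event => {
  event.respondWith(handleRequest(event))
})

async function handleRequest(event) {
  const url = new URL(event.request.url)
  // The URL parser already serializes IDN hosts in ASCII (Punycode) form,
  // so normalizing here is a defensive step rather than the main control
  const host = url.host.normalize('NFC')
  const allowedHost = 'example.com'.normalize('NFC')
  if (host !== allowedHost) {
    return new Response('Forbidden', { status: 403 })
  }
  // url.pathname is percent-encoded: decode each segment, then normalize,
  // so both forms of a character map to the same KV key
  let path
  try {
    path = url.pathname
      .split('/')
      .map(segment => decodeURIComponent(segment).normalize('NFC'))
      .join('/')
  } catch (err) {
    // decodeURIComponent throws a URIError on malformed percent-encoding
    return new Response('Bad request', { status: 400 })
  }
  // MY_KV is the KV namespace binding configured for this Worker
  const value = await MY_KV.get(path)
  return new Response(value || 'Not found', { status: value ? 200 : 404 })
}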
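For Cloudflare Pages, normalize identifiers before reading from or writing to Durable Object storage or KV:
import { DurableObject } from 'cloudflare:workers'

export class UserObject {
  constructor(public id: string) {}

  static normalizeId(input: string): string {
    return input.normalize('NFC')
  }
}

export class UserDO extends DurableObject {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url)
    // url.pathname is percent-encoded, so decode it before normalizing
    // (decodeURIComponent throws on malformed input; handle that as needed)
    const rawId = decodeURIComponent(url.pathname.slice(1))
    const normalizedId = UserObject.normalizeId(rawId)
    // storage.get returns a Promise; use normalizedId for every read and write
    const value = await this.ctx.storage.get<string>(normalizedId)
    return new Response(value ?? 'Missing')
  }
}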
Additionally, enforce consistent normalization for header values used in authentication or session handling:
function getAuthToken(request: Request): string | null {
  const tokenHeader = request.headers.get('X-Api-Token')
  if (!tokenHeader) return null
  // If tokens may include non-ASCII characters, normalize to avoid bypass via combining marks
  return tokenHeader.normalize('NFC')
}
When integrating with middleBrick, use the CLI to validate that your remediation reduces findings:
middlebrick scan https://your-worker-url.workers.dev
Remediation guidance from the middleBrick report should map findings to controls such as Security Misconfiguration (A05:2021 in the OWASP Top 10, API8:2023 in the OWASP API Security Top 10) by ensuring canonical representations are used consistently. For Cloudflare environments, this means using normalize in Workers and Pages codebases and validating that no edge logic compares raw, unnormalized Unicode strings across trust boundaries.