Unicode Normalization in Firestore

How Unicode Normalization Manifests in Firestore

Firestore treats document IDs and field names as exact byte strings. When a user‑supplied value contains Unicode characters, the four normalization forms (NFC, NFD, NFKC, NFKD) can produce strings that look identical on screen but are stored as distinct keys. This creates a class of bypasses in which an attacker supplies an alternative normalization form to evade checks that compare the input against an expected value.
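To make the distinction concrete, the following minimal sketch (plain Node.js, no Firestore involved) compares the precomposed and decomposed spellings of “é”. The variable and helper names are illustrative only.

```javascript
// Two spellings of the same visible character.
const nfc = '\u00E9';   // precomposed "é" (NFC)
const nfd = 'e\u0301';  // "e" followed by a combining acute accent (NFD)

// Return the UTF-8 bytes of a string as a hex string.
function utf8Hex(s) {
  return Buffer.from(s, 'utf8').toString('hex');
}

console.log(nfc === nfd);   // false: the strings are not equal
console.log(utf8Hex(nfc));  // c3a9
console.log(utf8Hex(nfd));  // 65cc81

// After normalizing both to the same form, they compare equal.
console.log(nfc.normalize('NFC') === nfd.normalize('NFC')); // true
```

Any system that compares or indexes these strings byte for byte will see two different keys unless one form is chosen and applied first.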

Consider a simple ownership check: the application reads a user ID from a JWT, then queries Firestore for a document whose ID equals that user ID. If the comparison is performed on the raw string without normalization, an attacker who knows the user ID can register a document with the same visual ID but in NFD form (e.g., the letter ‘e’ followed by a combining acute accent instead of the pre‑composed ‘é’). Depending on how the application handles the mismatch, the lookup may fail, the application may fall back to a default document, or the attacker may later request the NFD‑formatted document directly and gain access to data they should not see.

The following Node.js snippet shows the vulnerable pattern:

const { getFirestore } = require('firebase-admin/firestore');

// Assumes initializeApp() from 'firebase-admin/app' has already been called.
async function getProfile(req) {
  const userId = req.auth.uid; // comes from JWT, assumed trusted
  // ❌ No normalization – userId is used directly
  const userRef = getFirestore().collection('profiles').doc(userId);
  const snap = await userRef.get();
  if (!snap.exists) { // `exists` is a property in the Admin SDK, not a method
    throw new Error('Profile not found');
  }
  return snap.data();
}

If the JWT contains the NFC form “é” (U+00E9) but an attacker has created a profile document whose ID is the NFD form “é” (U+0065 U+0301), the lookup will not find the attacker’s document, yet the attacker can later request the NFD version directly and bypass the intended ownership check.

Firestore-Specific Detection

middleBrick’s Input Validation check includes a Unicode normalization probe set. During a scan it sends the same parameter value in each of the four normalization forms (NFC, NFD, NFKC, NFKD) and observes whether the API’s responses differ in status code, returned data, or error messages. A discrepancy indicates that the backend treats the forms as distinct keys, which is exactly the condition that enables the Firestore bypass described above.

For example, when scanning an endpoint that accepts a documentId path parameter, middleBrick will issue requests such as:

  • GET /api/profiles/%C3%A9 (NFC “é”)
  • GET /api/profiles/e%CC%81 (NFD “é”)
  • GET /api/profiles/%C3%A9 (NFKC, identical to NFC in this case)
  • GET /api/profiles/e%CC%81 (NFKD, identical to NFD in this case)

If the API returns 200 for the NFC version but 404 or a different payload for the NFD version, middleBrick flags the finding with severity “Medium” and provides the exact payloads that triggered the mismatch.
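As an illustration of how probe paths of this kind can be derived, the sketch below builds the four percent-encoded variants of a value using standard JavaScript functions. It is not middleBrick's implementation; buildProbePaths is a hypothetical helper, and the /api/profiles/ prefix is taken from the example endpoint.

```javascript
const FORMS = ['NFC', 'NFD', 'NFKC', 'NFKD'];

// Build one request path per normalization form.
function buildProbePaths(prefix, value) {
  return FORMS.map((form) => ({
    form,
    // normalize() picks the form; encodeURIComponent() percent-encodes the UTF-8 bytes.
    path: prefix + encodeURIComponent(value.normalize(form)),
  }));
}

const probes = buildProbePaths('/api/profiles/', '\u00E9');
for (const { form, path } of probes) {
  console.log(form.padEnd(4), path);
}
```

For “é”, the composed forms encode as %C3%A9 and the decomposed forms as e%CC%81, because the base letter “e” is an unreserved ASCII character and is left as is.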

You can reproduce the check locally with the middleBrick CLI:

npx middlebrick scan https://example.com/api/profiles/é

The output will include a section like:

[Input Validation] Unicode normalization mismatch detected
  - NFC (é) → 200 OK, returned profile data
  - NFD (é) → 404 Not Found
  Remediation: normalize incoming identifiers to a single Unicode form before using them as Firestore keys.

This detection works without any agents or configuration; you only need to provide the public URL of the API endpoint.

Firestore-Specific Remediation

The fix is to canonicalize all user‑supplied strings that will be used as document IDs, field names, or values in Firestore queries to a single Unicode normalization form, most commonly NFC, before they reach the Firestore SDK. This ensures that visually identical strings map to the same Firestore key, eliminating the bypass.
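The choice of form matters because the compatibility forms (NFKC, NFKD) fold more characters together than the canonical forms (NFC, NFD) do. A small sketch, using the “ﬁ” ligature (U+FB01) as an assumed example input:

```javascript
const ligature = '\uFB01'; // "ﬁ" LATIN SMALL LIGATURE FI

const asNfc = ligature.normalize('NFC');   // canonical form: ligature is kept
const asNfkc = ligature.normalize('NFKC'); // compatibility form: becomes "fi"

console.log(asNfc === ligature); // true
console.log(asNfkc === 'fi');    // true
```

NFC preserves such distinctions, while NFKC would map both spellings to the same key. Whichever form is chosen, it has to be applied consistently on both writes and reads.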

Here is the corrected version of the earlier Node.js example:

const { getFirestore } = require('firebase-admin/firestore');

function normalizeId(raw) {
  // NFC is the recommended form for most applications
  return raw.normalize('NFC');
}

// Assumes initializeApp() from 'firebase-admin/app' has already been called.
async function getProfile(req) {
  const rawUserId = req.auth.uid;
  const userId = normalizeId(rawUserId); // ✅ Normalize before use
  const userRef = getFirestore().collection('profiles').doc(userId);
  const snap = await userRef.get();
  if (!snap.exists) { // `exists` is a property in the Admin SDK, not a method
    throw new Error('Profile not found');
  }
  return snap.data();
}

The same normalization should be applied wherever user input influences Firestore keys, including:

  • Path parameters for document IDs
  • Query parameters used in where filters on string fields
  • Values written to map fields that are later used as keys in sub‑collections
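One way to cover these entry points is a small helper applied to request parameters before any Firestore call. The sketch below is an assumption about how such a helper might look: normalizeParams and the sample parameter names are hypothetical, and it only handles flat objects of the kind most HTTP frameworks expose for path and query parameters.

```javascript
// Return a copy of `params` with every string value normalized to `form`.
// Non-string values (numbers, booleans, etc.) are passed through unchanged.
function normalizeParams(params, form = 'NFC') {
  const out = {};
  for (const [key, value] of Object.entries(params)) {
    out[key] = typeof value === 'string' ? value.normalize(form) : value;
  }
  return out;
}

// Example: a decomposed "é" in a path parameter becomes the precomposed form.
const params = normalizeParams({ documentId: 'e\u0301', limit: 10 });
console.log(params.documentId === '\u00E9'); // true
console.log(params.limit);                   // 10
```

The helper normalizes values only. If user input is also used as a map key or field name, the same normalize() call would need to be applied to that key before writing.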

Because Firestore security rules cannot perform Unicode normalization, the enforcement must happen in application code (or in a Cloud Function that validates writes). After applying the fix, re‑run middleBrick to confirm that the Input Validation check no longer reports a mismatch:

npx middlebrick scan https://example.com/api/profiles/é

The resulting report should show the Input Validation check passing, confirming that all Unicode forms now lead to the same Firestore key and that the ownership check is resilient to normalization‑based bypasses.

Frequently Asked Questions

Does Firestore automatically normalize Unicode values stored in documents?
No. Firestore stores strings exactly as they are provided; different Unicode normalization forms are treated as distinct values. Normalization must be performed by the client or backend before writing or querying.

How does middleBrick detect Unicode normalization issues in a Firestore‑backed API?
middleBrick’s scan sends the same parameter in NFC, NFD, NFKC, and NFKD forms and looks for differences in HTTP status codes, response bodies, or error messages. A difference indicates that the backend treats the forms as separate keys, which is reported as an Input Validation finding.