Severity: HIGH. Tags: unicode normalization, fastapi, firestore

Unicode Normalization in FastAPI with Firestore

Unicode Normalization in FastAPI with Firestore: how this specific combination creates or exposes the vulnerability

Unicode normalization inconsistencies between FastAPI request handling and Firestore string storage can lead to authentication bypass, data leakage, and IDOR-like access to resources whose identifiers look identical. When a FastAPI application accepts user input (e.g., usernames, identifiers, or paths) and uses it to construct Firestore document references or queries without normalizing it first, strings that are canonically equivalent but encoded differently may resolve to different documents or keys. This mismatch can allow an authenticated user to access another user’s data by supplying an identifier that is encoded differently but renders identically.

For example, a username containing the Latin small letter a (U+0061) followed by a combining acute accent (U+0301) normalizes under NFC to the single precomposed character á (U+00E1). If FastAPI uses the raw input to build a Firestore document ID or query filter, a lookup with the decomposed form will not match a document stored under the precomposed form. An attacker could exploit this wherever one layer normalizes and another does not: for instance, by registering a decomposed username that renders identically to an existing precomposed one, or by presenting an identifier that an authorization check compares in one form while the Firestore lookup uses another, bypassing expected access controls or enumeration protections.
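The equivalence described above can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the variable names are illustrative and not part of any FastAPI or Firestore API.

```python
import unicodedata

decomposed = "a\u0301"   # 'a' (U+0061) followed by COMBINING ACUTE ACCENT (U+0301)
precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE (U+00E1)

# A plain string comparison sees two different code point sequences.
raw_equal = decomposed == precomposed

# After NFC normalization the decomposed form collapses to the precomposed one.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed

print(raw_equal, nfc_equal, len(decomposed), len(precomposed))
```

Both strings render as the same glyph, which is why the mismatch is easy to miss in logs and admin consoles.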

Firestore compares document IDs and indexed string fields by the UTF-8 bytes of the stored value, so two canonically equivalent strings with different code point sequences are treated as different values. If your FastAPI service writes data in one normalization form and reads in another, queries may return incomplete or unexpected results, potentially exposing data that should be restricted. This matters most for user-controlled fields used in owner or tenant checks, where a missing normalization step can make a BOLA/IDOR check ineffective because the compared values are canonically equivalent but not byte-identical.
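The write/read mismatch can be reproduced without Firestore. In the sketch below a plain dictionary stands in for a collection keyed by document ID; this stand-in is an assumption made for illustration, since a dictionary also matches keys by exact string value.

```python
import unicodedata

PRECOMPOSED = "jos\u00e9"    # "josé" using U+00E9
DECOMPOSED = "jose\u0301"    # "josé" using 'e' + U+0301

# The two spellings encode to different UTF-8 byte sequences.
same_bytes = PRECOMPOSED.encode("utf-8") == DECOMPOSED.encode("utf-8")

# Hypothetical stand-in for a collection: the document was written in one form.
documents = {PRECOMPOSED: {"owner": "victim"}}

# Reading with the other form misses the document...
raw_hit = documents.get(DECOMPOSED)

# ...while normalizing the lookup key first finds it.
nfc_hit = documents.get(unicodedata.normalize("NFC", DECOMPOSED))

print(same_bytes, raw_hit, nfc_hit)
```

The same reasoning applies to an ownership comparison: if the stored owner and the requesting identity are compared without normalizing both sides, equivalent names can compare as unequal.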

Inconsistent normalization also makes filter bypass more likely. An attacker could supply crafted combining characters so that a value passes a check in one form and reaches a query in another, bypassing intended filters or reaching sensitive collections. Since Firestore does not automatically normalize strings, the responsibility falls to the application layer. middleBrick’s checks for Input Validation and Property Authorization highlight these risks by correlating runtime behavior with the presence of normalization-sensitive endpoints and insecure data exposure patterns in the API surface.
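One application-layer option is to reject input that is not already in the expected form instead of silently rewriting it. The sketch below uses `unicodedata.is_normalized` (available in Python 3.8 and later); the helper name `require_nfc` is hypothetical.

```python
import unicodedata

def require_nfc(value: str) -> str:
    """Return value unchanged if it is already NFC; raise ValueError otherwise."""
    if not unicodedata.is_normalized("NFC", value):
        raise ValueError("input is not NFC-normalized")
    return value

accepted = require_nfc("\u00e1")   # precomposed form passes through

try:
    require_nfc("a\u0301")         # decomposed form is rejected
    rejected = False
except ValueError:
    rejected = True

print(accepted == "\u00e1", rejected)
```

In a route handler the `ValueError` would typically be translated into a 4xx response. Whether to reject or to normalize is a design choice: rejecting surfaces suspicious input, while normalizing is friendlier to clients that legitimately send decomposed text.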

Firestore-Specific Remediation in FastAPI: concrete code fixes

To mitigate Unicode normalization issues in FastAPI with Firestore, normalize all user-supplied strings before using them in Firestore operations. Apply a single normalization form, such as NFC, at the earliest point in your request handling. This includes path parameters, query arguments, and request body fields that affect document references or query predicates.
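For request bodies, strings can appear at any depth, so a recursive helper is one way to cover them. The following is a minimal sketch that assumes the body has already been parsed into plain dicts, lists, and scalars; `normalize_payload` is a hypothetical name, not a FastAPI or Firestore function.

```python
import unicodedata
from typing import Any

def normalize_payload(value: Any, form: str = "NFC") -> Any:
    """Recursively normalize every string (including dict keys) in parsed JSON."""
    if isinstance(value, str):
        return unicodedata.normalize(form, value)
    if isinstance(value, dict):
        return {
            (unicodedata.normalize(form, k) if isinstance(k, str) else k):
                normalize_payload(v, form)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Sequences are returned as lists, matching parsed JSON arrays.
        return [normalize_payload(v, form) for v in value]
    return value  # numbers, booleans, None pass through unchanged

body = {"name": "a\u0301", "tags": ["e\u0301", 1]}
clean = normalize_payload(body)
print(clean)
```

Calling such a helper once, before any Firestore reference or filter is built, keeps the normalization step in a single place.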

Example FastAPI routes with NFC normalization before Firestore access:

import unicodedata

from fastapi import FastAPI, HTTPException, status
from google.cloud import firestore

app = FastAPI()
db = firestore.Client()

def normalize_string(value: str) -> str:
    """Return the NFC form so equivalent inputs map to one document key."""
    return unicodedata.normalize("NFC", value)

@app.get("/users/{username}")
def get_user(username: str):
    safe_username = normalize_string(username)
    doc_ref = db.collection("users").document(safe_username)
    doc = doc_ref.get()
    if not doc.exists:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="User not found")
    return {"username": doc.id, "data": doc.to_dict()}

@app.post("/users/{username}/update")
def update_user(username: str, payload: dict):
    safe_username = normalize_string(username)
    doc_ref = db.collection("users").document(safe_username)
    # Ensure the document exists under the normalized key
    if not doc_ref.get().exists:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="User not found")
    doc_ref.update(payload)
    return {"status": "ok"}

For queries that filter on user-controlled string fields, normalize the filter values as well to ensure canonical alignment with stored data:

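from google.cloud.firestore_v1.base_query import FieldFilter

@app.get("/users")
def list_users(display_name: str):
    # Normalize the filter value so it matches the form used when writing.
    safe_display_name = normalize_string(display_name)
    # Keyword-style filter; recent client versions warn on positional where() arguments.
    query = db.collection("users").where(filter=FieldFilter("display_name", "==", safe_display_name))
    results = [{"id": doc.id, **doc.to_dict()} for doc in query.stream()]
    return {"results": results}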

If your application stores document IDs that must remain in a specific form, map incoming identifiers to their canonical equivalents using a lookup table or secondary index, rather than relying on raw input. Combine this practice with Property Authorization checks that verify ownership using normalized identifiers, and make sure every query constraint is built from normalized values to avoid bypass via combining sequences. Note that NFC only unifies canonically equivalent sequences; cross-script homoglyphs (for example, Cyrillic а versus Latin a) remain distinct after normalization and need a separate check.
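A secondary index of this kind can be sketched in memory as follows. The dictionary `id_index`, the helper names, and the sample identifiers are all hypothetical; in practice the index would live in its own collection or field.

```python
import unicodedata
from typing import Optional

def canonical_key(identifier: str) -> str:
    """NFC form used only as the lookup key, never as the stored document ID."""
    return unicodedata.normalize("NFC", identifier)

# Hypothetical legacy document ID that must keep its original (decomposed) form.
legacy_id = "jose\u0301-profile"
id_index = {canonical_key(legacy_id): legacy_id}

def resolve_document_id(identifier: str, index: dict) -> Optional[str]:
    """Map any equivalent spelling of an identifier to the stored document ID."""
    return index.get(canonical_key(identifier))

def is_owner(requesting_user: str, document_owner: str) -> bool:
    """Ownership check that compares both sides in the same normalized form."""
    return canonical_key(requesting_user) == canonical_key(document_owner)

from_precomposed = resolve_document_id("jos\u00e9-profile", id_index)
from_decomposed = resolve_document_id("jose\u0301-profile", id_index)

# NFC does not fold homoglyphs: Cyrillic 'а' (U+0430) stays distinct from Latin 'a'.
homoglyph_folded = canonical_key("paypal") == canonical_key("p\u0430ypal")

print(from_precomposed == legacy_id, from_decomposed == legacy_id, homoglyph_folded)
```

Keeping the canonical form as a key and the original as a value lets existing document IDs stay untouched while every lookup and ownership check goes through one normalized path.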

When integrating with the middleBrick CLI or Dashboard, review findings from the Input Validation and Property Authorization checks to identify endpoints where normalization gaps exist. Use the provided remediation guidance to adjust your FastAPI routes and Firestore access patterns, ensuring that canonical equivalence is enforced before data is read or written.

Frequently Asked Questions

Why does normalizing user input prevent IDOR-like access in Firestore-backed FastAPI services?
Normalization ensures that canonically equivalent identifiers, which usually render identically, resolve to the same form before Firestore document references or query filters are constructed. Without normalization, attackers can supply decomposed or alternate representations that bypass equality checks, allowing access to documents that should be restricted.

Should I normalize data before storing it in Firestore, or only at query time in FastAPI?
Do both: apply normalization at the point of entry in FastAPI for document IDs and indexed string fields, and use the same form when writing and reading. This keeps storage and query logic consistent and prevents mismatches that could expose data or weaken Property Authorization checks.