Unicode Normalization in Axum with Firestore
Unicode Normalization in Axum with Firestore — how this specific combination creates or exposes the vulnerability
Unicode normalization inconsistencies between Axum request handling and Firestore document storage can lead to authentication bypass, data integrity issues, and information exposure. When an Axum application accepts user input (e.g., usernames, identifiers, or API keys) and persists it to Firestore without first normalizing it to a canonical form, canonically equivalent strings are stored as different values. For example, the character "é" can be represented as the single code point U+00E9 or as the decomposed sequence "e" + U+0301 (combining acute accent); the two render identically but differ byte for byte. If Axum passes such input straight to Firestore, two logically identical identifiers end up as separate documents or keys. The result is IDOR-like confusion: an attacker can register an identifier that looks identical to a victim’s, and any layer that later normalizes the value or compares it by appearance (a uniqueness check, an authorization check, a displayed name) may treat the two as the same principal and expose the other user’s data.
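The difference between the two representations of "é" can be seen with nothing but the Rust standard library. The following is a minimal sketch; the names `COMPOSED`, `DECOMPOSED`, and `utf8_hex` are illustrative and not part of Axum or any Firestore client.

```rust
/// "é" as one precomposed code point (the NFC form).
pub const COMPOSED: &str = "\u{00E9}";

/// "é" as "e" followed by U+0301 COMBINING ACUTE ACCENT (the NFD form).
pub const DECOMPOSED: &str = "e\u{0301}";

/// Render the UTF-8 bytes of `s` as space-separated lowercase hex.
pub fn utf8_hex(s: &str) -> String {
    s.bytes()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

`utf8_hex(COMPOSED)` yields `c3 a9` while `utf8_hex(DECOMPOSED)` yields `65 cc 81`, and `COMPOSED == DECOMPOSED` is `false`, because Rust's `str` equality compares bytes and does not normalize. A store that compares bytes sees two unrelated values.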
Firestore compares document paths and map keys as exact UTF-8 byte sequences and applies no normalization of its own. If Axum route or query parameters are used to build Firestore document IDs or collection names without normalization, an attacker can craft paths that are semantically equivalent but distinct at the byte level. The danger lies in layers that disagree: if Axum middleware runs an access-control check (for example a denylist or ownership comparison) against the raw input while a later step normalizes the value before the lookup, or the reverse, a decomposed NFD input can slip past a check written for the NFC form and still resolve to the NFC-keyed document. This becomes critical when identifiers are used as security-sensitive references such as tenant IDs, organization slugs, or resource handles.
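The exact-match behavior described above can be modeled in memory. The sketch below is not a Firestore client: `ExactStore` is a hypothetical stand-in built on `BTreeMap`, whose `String` keys are compared byte by byte, which mirrors the comparison semantics the text attributes to document paths.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a collection keyed by exact document path.
pub struct ExactStore {
    docs: BTreeMap<String, String>,
}

impl ExactStore {
    pub fn new() -> Self {
        Self { docs: BTreeMap::new() }
    }

    /// Store `value` under `path`; the path is used byte for byte, with no normalization.
    pub fn set(&mut self, path: &str, value: &str) {
        self.docs.insert(path.to_owned(), value.to_owned());
    }

    /// Look up `path`; only a byte-identical key matches.
    pub fn get(&self, path: &str) -> Option<&str> {
        self.docs.get(path).map(String::as_str)
    }

    /// Number of distinct paths currently stored.
    pub fn len(&self) -> usize {
        self.docs.len()
    }
}
```

Writing a document at `"tenants/caf\u{00E9}"` and then reading `"tenants/cafe\u{0301}"` returns `None`, and a second write under the decomposed path creates a second, independent entry. Any check that treats the two as equal while the store treats them as different (or the reverse) is where the bypass appears.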
The interaction also affects indexing and querying. Firestore queries are sensitive to the exact string values stored. If Axum normalizes input on read but Firestore contains non-normalized entries (or vice versa), queries may silently return incomplete or incorrect result sets. An attacker could exploit this by submitting carefully crafted inputs that match different normalization forms, leading to data leakage or inconsistent application state. The risk is compounded when Axum deserializes JSON payloads into Rust structs and maps fields directly to Firestore document fields without canonical normalization, creating a subtle cross-layer inconsistency that standard validation does not catch.
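When existing data may already contain mixed forms, a first audit pass can flag stored strings that are likely decomposed. The following is a rough heuristic sketch using only the standard library; `is_combining_diacritic` and `looks_decomposed` are hypothetical helper names. It checks only the Combining Diacritical Marks block (U+0300 to U+036F), so it misses combining characters from other blocks and also flags strings that are valid NFC because no precomposed form exists. A real check should use a full normalization test, such as the `is_nfc` function that the unicode-normalization crate is understood to provide.

```rust
/// True if `c` lies in the Combining Diacritical Marks block (U+0300..=U+036F).
pub fn is_combining_diacritic(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Heuristic: flag strings that contain at least one combining diacritic.
/// This is a triage aid for audits, not a substitute for a real NFC check.
pub fn looks_decomposed(s: &str) -> bool {
    s.chars().any(is_combining_diacritic)
}
```

Running `looks_decomposed` over exported field values gives a candidate list of documents to re-normalize and re-key before normalized queries are relied upon.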
Additionally, metadata such as created_by or updated_by fields derived from user-supplied strings may be stored in Firestore without normalization, complicating audit trails and enabling privilege escalation if an attacker can manipulate their identity representation across normalization boundaries. Because Firestore does not warn or error on normalization variance, the application must enforce consistency at the Axum layer before any persistence occurs.
Firestore-Specific Remediation in Axum — concrete code fixes
To mitigate normalization issues, normalize all user-controlled strings in Axum before any Firestore interaction. The examples here use the unicode-normalization crate to apply NFC (or another form if your policy calls for it) consistently across request handling, so that document IDs, map keys, and query fields are canonical before they are sent to Firestore.
use axum::extract::State;
use firestore::FirestoreDb;
use unicode_normalization::UnicodeNormalization;

// Axum's `State` extractor requires the state type to be `Clone`.
#[derive(Clone)]
struct AppState {
    db: FirestoreDb,
}

// NOTE: the `FirestoreDb` calls in these examples are schematic; adapt them
// to the API of the Firestore client crate and version in use.
async fn get_user_document(
    State(state): State<AppState>,
    user_id: String,
) -> Result<Option<firestore::Document>, (axum::http::StatusCode, String)> {
    // Normalize to NFC before using the value as a Firestore document ID.
    let normalized_id: String = user_id.nfc().collect();
    let doc = state
        .db
        .doc(&format!("users/{}", normalized_id))
        .get()
        .await
        .map_err(|e| (axum::http::StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
    Ok(doc)
}
When constructing Firestore document paths or map keys from multiple input segments, normalize each segment individually and then combine them to avoid mixed normalization forms:
async fn store_user_preferences(
    State(state): State<AppState>,
    user_id: String,
    preference_key: String,
    preference_value: String,
) -> Result<(), (axum::http::StatusCode, String)> {
    // Normalize every segment on its own before joining them into a path.
    let uid: String = user_id.nfc().collect();
    let key: String = preference_key.nfc().collect();
    let value: String = preference_value.nfc().collect();
    let doc_path = format!("users/{}/preferences/{}", uid, key);
    state
        .db
        .set(&doc_path, &serde_json::json!({ "value": value }))
        .await
        // A failed write is a server-side error, not a malformed request.
        .map_err(|e| (axum::http::StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
    Ok(())
}
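Because the path above is assembled with `format!`, each normalized segment should also be checked before it is interpolated, so that a segment cannot change the shape of the path. The sketch below uses a hypothetical helper, `validate_doc_id`, and encodes document ID constraints as they are commonly documented for Firestore (no `/`, not `.` or `..`, not of the form `__.*__`, at most 1,500 bytes). Treat the exact limits as an assumption to verify against current Firestore documentation.

```rust
/// Assumed maximum document ID size in bytes; verify against current Firestore limits.
const MAX_DOC_ID_BYTES: usize = 1500;

/// Check one already-normalized path segment before using it as a document ID.
/// Returns a short reason on rejection.
pub fn validate_doc_id(segment: &str) -> Result<(), &'static str> {
    if segment.is_empty() {
        return Err("segment is empty");
    }
    // `len()` counts UTF-8 bytes, which is the unit the limit is expressed in.
    if segment.len() > MAX_DOC_ID_BYTES {
        return Err("segment exceeds the byte limit");
    }
    // A slash would silently add another level to the document path.
    if segment.contains('/') {
        return Err("segment contains '/'");
    }
    if segment == "." || segment == ".." {
        return Err("segment is '.' or '..'");
    }
    // IDs that start and end with a double underscore are treated as reserved.
    if segment.len() >= 4 && segment.starts_with("__") && segment.ends_with("__") {
        return Err("segment uses the reserved __name__ form");
    }
    Ok(())
}
```

Validation is best run after normalization rather than before it, since normalizing can change the byte length of a string. In `store_user_preferences`, that would mean calling `validate_doc_id(&uid)` and `validate_doc_id(&key)` and returning a 400 response on `Err` before building `doc_path`.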
For query operations, normalize user inputs before building Firestore queries so that the query string matches the stored representation:
async fn query_user_by_email(
    State(state): State<AppState>,
    email: String,
) -> Result<Option<firestore::Document>, (axum::http::StatusCode, String)> {
    let normalized_email: String = email.nfc().collect();
    let results = state
        .db
        .query("users")
        .eq("email", &normalized_email)
        .fetch()
        .await
        .map_err(|e| (axum::http::StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
    Ok(results.into_iter().next())
}
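Apply the same normalization to any value that Firestore security rules compare against (e.g., tenant slugs or organization identifiers): rules match strings exactly, so the stored value and the value under test must be in the same form. Axum middleware should likewise normalize before it makes its own authorization decision, and it should do so once, at the request boundary, so that every later comparison sees the same bytes. Note that requests made with server credentials such as a service account are not subject to security rules, so in that setup the Axum-side check is the only one. This consistency prevents bypasses in which strings that are semantically identical but differ in binary representation are treated as distinct by Firestore yet as equal by an Axum layer that normalized only part of its input.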