Unicode Normalization in Chi with MongoDB

How this specific combination creates or exposes the vulnerability
Unicode normalization issues in Chi with MongoDB can expose an API to injection and comparison bypass when user-controlled strings are not consistently normalized before being stored or queried. Chi routes requests through pattern matching and parameter extraction; if a route parameter or body field is used directly in a MongoDB query without normalization, visually identical characters with different code point sequences (e.g., composed NFC vs. decomposed NFD forms) may fail to match stored data or may bypass allowlist checks.
For example, consider an endpoint that looks up a user by username. A legitimate user could register with the composed NFC form of a character, while an attacker supplies the same visual character in decomposed NFD form. Without normalization, the query may return an unexpected record or fail to enforce uniqueness correctly, leading to authentication bypass or account confusion. In security scans, this appears as an input validation and property authorization finding because the API does not guarantee a canonical representation before evaluation.
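The mismatch is easy to reproduce in any language with Unicode support. A minimal Python sketch of the composed-vs-decomposed comparison failure described above:

```python
import unicodedata

# "é" as one precomposed code point (NFC) vs. "e" + combining acute accent (NFD).
# The two strings render identically but compare unequal byte for byte.
composed = "caf\u00e9"     # 'café', 4 code points
decomposed = "cafe\u0301"  # 'café', 5 code points

print(composed == decomposed)                    # False: a raw lookup misses
print(unicodedata.normalize("NFC", composed) ==
      unicodedata.normalize("NFC", decomposed))  # True: canonical forms match
```

A database lookup or uniqueness check that compares the raw strings behaves like the first comparison; normalizing both sides first behaves like the second.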
When the API also exposes an unauthenticated endpoint that echoes user data, different normalization forms can produce reflected content containing combining marks or invisible variation selectors, which clients or logging systems may interpret differently. This amplifies information disclosure risks and complicates output validation. middleBrick’s input validation checks flag inconsistent normalization across requests, highlighting where canonicalization should be applied before any MongoDB operation.
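Inputs carrying combining marks or variation selectors can be detected before they reach storage or output. A sketch using a hypothetical `audit_input` helper (Python 3.8+ for `unicodedata.is_normalized`):

```python
import unicodedata

def audit_input(s: str) -> dict:
    """Report normalization status and suspicious code points in user input.
    Hypothetical helper for illustration; not part of any library."""
    return {
        "is_nfc": unicodedata.is_normalized("NFC", s),
        "combining_marks": [c for c in s if unicodedata.combining(c)],
        # Variation selectors are invisible and survive NFC normalization.
        "variation_selectors": [c for c in s if 0xFE00 <= ord(c) <= 0xFE0F],
    }

print(audit_input("cafe\u0301"))        # not NFC; carries one combining mark
print(audit_input("caf\u00e9\ufe0f"))   # already NFC, but hides a variation selector
```

Note the second case: normalization alone does not strip variation selectors, so echoing or logging such input still deserves explicit filtering.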
In the context of LLM security, if any user-controlled text is fed into prompts or stored for generation, inconsistent normalization can create prompt injection vectors where visually identical strings trigger different model behavior. middleBrick’s LLM/AI Security checks detect abnormal patterns in outputs and probe for injection paths that could be chained with malformed Unicode input.
middleBrick scans identify these issues by comparing runtime behavior against the OpenAPI specification, including $ref-resolved schemas. If a request body schema defines a string field without explicit normalization requirements, the scanner highlights the gap and provides remediation guidance to enforce NFC (or NFD) consistently in Chi handlers and MongoDB queries.
MongoDB-Specific Remediation in Chi — concrete code fixes
To remediate Unicode normalization issues in Chi with MongoDB, normalize all incoming strings before validation, storage, and querying. Apply a consistent form (typically NFC) in middleware or route handlers (on the JVM, java.text.Normalizer does this with no extra dependencies), and ensure MongoDB queries use the same normalized value. Below are concrete code examples demonstrating this approach.
Chi middleware for normalization
Define a small middleware that normalizes relevant parameters early in the pipeline:
(ns myapp.middleware
  (:import [java.text Normalizer Normalizer$Form]))

(defn nfc
  "Normalize a string to Unicode NFC; pass other values through unchanged."
  [s]
  (if (string? s)
    (Normalizer/normalize s Normalizer$Form/NFC)
    s))

(defn normalize-param
  "Middleware that NFC-normalizes the named request parameter before it
  reaches the handler (async Ring-style, matching the handlers below)."
  [param-name]
  (fn [handler]
    (fn [request respond raise]
      (if-let [v (get-in request [:params param-name])]
        (handler (assoc-in request [:params param-name] (nfc v)) respond raise)
        (handler request respond raise)))))
Normalization before MongoDB query
In your route handler, normalize the identifier used in the query and use it in a MongoDB filter:
(ns myapp.handlers
  (:require [monger.core :as mg]
            [monger.collection :as mc]
            [cheshire.core :as json]
            [myapp.middleware :refer [nfc]]))

(defn get-user-by-username [request respond raise]
  (let [username            (get-in request [:params :username])
        normalized-username (nfc username)           ; normalize to NFC before querying
        db                  (mg/get-db conn "myapp") ; `conn` from (mg/connect) at startup
        user                (mc/find-one-as-map db "users" {:username normalized-username})]
    (respond {:status 200 :body (json/generate-string (or user {}))})))
Insert with normalization and uniqueness enforcement
When creating a user, normalize before insert and rely on a unique index on the normalized field:
(ns myapp.handlers
  (:require [cheshire.core :as json]
            [monger.core :as mg]
            [monger.collection :as mc]
            [myapp.middleware :refer [nfc]]))

(defn create-user [request respond raise]
  (let [body                (json/parse-string (slurp (:body request)) true)
        normalized-username (nfc (:username body)) ; keys are keywordized by parse-string
        db                  (mg/get-db conn "myapp")]
    (try
      (mc/insert-and-return db "users" {:username normalized-username
                                        :email    (:email body)})
      (respond {:status 201 :body {:ok true}})
      ;; the exact duplicate-key exception class depends on the driver version
      (catch com.mongodb.DuplicateKeyException _
        (respond {:status 409 :body {:error "username already exists"}})))))
Ensuring consistent normalization in queries
For any query that involves user-supplied strings, always normalize before building the filter. This includes $regex patterns and case-insensitive collations; for regex queries, also escape metacharacters so the user's term is matched literally rather than interpreted as a pattern:
;; assumes the myapp.handlers ns above, plus
;;   [monger.operators :refer [$regex $options]] and (:import [java.util.regex Pattern])
(defn search-users [request respond raise]
  (let [term    (nfc (get-in request [:params :q])) ; normalize first
        escaped (Pattern/quote term)                ; then escape, so the term matches literally
        db      (mg/get-db conn "myapp")
        results (mc/find-maps db "users" {:username {$regex escaped $options "i"}})]
    (respond {:status 200 :body results})))
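The normalize-then-escape ordering matters in any language. A Python sketch with a hypothetical `safe_regex_term` helper showing why unescaped input is dangerous in a $regex filter:

```python
import re
import unicodedata

def safe_regex_term(raw: str) -> str:
    """Normalize first, then escape regex metacharacters, so a user-supplied
    term can only ever match literally. Hypothetical helper for illustration."""
    return re.escape(unicodedata.normalize("NFC", raw))

# Unescaped, ".*" would match every username in the collection;
# escaped, it only matches the literal two characters ".*".
print(safe_regex_term(".*"))

# The filter document uses the canonical NFC form of the term.
filter_doc = {"username": {"$regex": safe_regex_term("jose\u0301"),
                           "$options": "i"}}
```

Escaping after normalization (not before) guarantees the escaped string corresponds exactly to the canonical form that was stored.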
Indexing considerations
Create a unique index on the normalized field to prevent duplicates that differ only by normalization form:
(ns myapp.db.indexes
  (:require [monger.core :as mg]
            [monger.collection :as mc]))

(defn setup-indexes []
  (let [db (mg/get-db conn "myapp")] ; `conn` from (mg/connect) at startup
    ;; a unique index rejects inserts that collide once values are normalized
    (mc/ensure-index db "users" (array-map :username 1)
                     {:unique true :name "username_normalized_unique"})))
By normalizing at the edge (middleware and route handlers) and enforcing the same form in MongoDB operations, you eliminate comparison mismatches and reduce the attack surface for injection and bypass techniques tied to Unicode representation.