
Unicode Normalization in Chi

How Unicode Normalization Manifests in Chi

Unicode normalization vulnerabilities in Chi typically emerge through authentication bypass and authorization flaws. When Chi applications process usernames, email addresses, or API keys without proper normalization, attackers can exploit canonical equivalence to bypass security controls.

The most common attack pattern is account confusion between canonically equivalent usernames. Consider a Chi application where the user 'café' already exists with a precomposed é (U+0063 U+0061 U+0066 U+00E9). An attacker can register 'café' with a decomposed é (U+0063 U+0061 U+0066 U+0065 U+0301), which renders identically but is a different sequence of code points. If the application's registration and authentication code does not normalize both inputs to the same form, the system treats these as distinct users. That allows impersonation and, where some components normalize and others do not, potentially unauthorized access.

// Vulnerable Chi authentication middleware
func AuthMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        username := r.FormValue("username")
        password := r.FormValue("password")
        
        // DANGEROUS: No normalization
        user, err := db.GetUserByUsername(username)
        if err != nil || !CheckPassword(user.Password, password) {
            http.Error(w, "Unauthorized", 401)
            return
        }
        next.ServeHTTP(w, r)
    })
}
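To make the difference between the two spellings concrete, the following minimal sketch prints the code points of each username and compares them the way Go's `==` operator does. The helper name `describe` is illustrative and not part of Chi or any library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// describe returns the code points of s in U+XXXX notation, space separated.
func describe(s string) string {
	parts := make([]string, 0, len(s))
	for _, r := range s {
		parts = append(parts, fmt.Sprintf("U+%04X", r))
	}
	return strings.Join(parts, " ")
}

func main() {
	precomposed := "caf\u00e9" // é as a single code point
	decomposed := "cafe\u0301" // e followed by a combining acute accent

	fmt.Println(describe(precomposed), "runes:", utf8.RuneCountInString(precomposed))
	fmt.Println(describe(decomposed), "runes:", utf8.RuneCountInString(decomposed))

	// Go compares strings byte by byte, so this prints false.
	fmt.Println("equal:", precomposed == decomposed)
}
```

Any lookup keyed on the raw string, such as a map or an exact-match SQL comparison under a binary collation, inherits the same byte-wise behavior.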

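Another manifestation occurs in URL path matching. Chi's router compares paths as they arrive, without Unicode normalization, so it treats '/path∕' (ending in U+2215 DIVISION SLASH) and '/path/' (ending in U+002F SOLIDUS) as different routes. The risk appears when a later component normalizes or maps look-alike characters after routing and access checks have already run. For example, NFKC maps U+FF0F FULLWIDTH SOLIDUS to U+002F, so a value that looked like a single path segment to the router can become a multi-segment path downstream, which may enable path traversal.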

// Vulnerable Chi route handling
// Note: Chi declares path parameters as {id}, not :id.
router.Get("/api/v1/users/{id}", func(w http.ResponseWriter, r *http.Request) {
    // chi.URLParam returns the matched value without Unicode normalization.
    id := chi.URLParam(r, "id")

    // If id contains precomposed vs decomposed forms, database queries may fail
    // or return incorrect results
    user, err := db.GetUserByID(id)
    if err != nil {
        http.Error(w, "Not found", http.StatusNotFound)
        return
    }
    json.NewEncoder(w).Encode(user)
})
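As a hedged illustration of the slash look-alike problem described above, the sketch below flags parameter values that contain characters resembling '/'. The function name `hasSlashLookalike` and the list of code points are assumptions chosen for the example. The list is deliberately short and not exhaustive.

```go
package main

import (
	"fmt"
	"strings"
)

// slashLookalikes holds a few code points that render like '/' but are not
// U+002F: DIVISION SLASH, FRACTION SLASH, and FULLWIDTH SOLIDUS.
// Illustrative only; a real deny list would need to be broader.
const slashLookalikes = "\u2215\u2044\uff0f"

// hasSlashLookalike reports whether s contains any of the listed code points.
func hasSlashLookalike(s string) bool {
	return strings.ContainsAny(s, slashLookalikes)
}

func main() {
	fmt.Println(hasSlashLookalike("reports/2024"))      // false: plain U+002F
	fmt.Println(hasSlashLookalike("reports\u22152024")) // true: DIVISION SLASH
	fmt.Println(hasSlashLookalike("reports\uff0f2024")) // true: FULLWIDTH SOLIDUS
}
```

Rejecting such input outright is often simpler than trying to map it to a safe form, because the mapping itself is what turns a harmless-looking segment into a path separator.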

Object-level authorization checks are particularly vulnerable. When Chi applications validate object ownership by comparing non-normalized strings, attackers can manipulate Unicode representations so that the comparison gives a different result from the one the application intends, exposing resources they should not be permitted to view.

// Vulnerable authorization check
func GetDocument(w http.ResponseWriter, r *http.Request) {
    docID := chi.URLParam(r, "doc_id")
    userID := chi.URLParam(r, "user_id")
    
    doc, err := db.GetDocument(docID)
    if err != nil {
        http.Error(w, "Not found", 404)
        return
    }
    
    // DANGEROUS: Direct string comparison without normalization
    if doc.OwnerID != userID {
        http.Error(w, "Forbidden", 403)
        return
    }
    json.NewEncoder(w).Encode(doc)
}

Chi-Specific Detection

Detecting Unicode normalization issues in Chi applications requires both static analysis and runtime testing. middleBrick's black-box scanning approach is particularly effective for identifying these vulnerabilities without requiring source code access.

middleBrick tests for Unicode normalization by submitting inputs in multiple normalization forms and observing how the application responds. For authentication endpoints, it attempts login with precomposed and decomposed character variants to detect canonical equivalence bypass.

# Scan a Chi API endpoint with middleBrick
middlebrick scan https://api.example.com/auth/login \
  --test-authentication \
  --test-bola \
  --test-input-validation

The scanner specifically targets Chi's URL parameter handling and path routing. It submits requests with Unicode characters that have visually similar but distinct code points, monitoring for inconsistent behavior that indicates missing normalization.

For OpenAPI spec analysis, middleBrick resolves $ref references and identifies parameters that accept string inputs without validation. It flags schemas that don't specify normalization requirements or character encoding constraints.

Manual detection techniques include:

  • Testing authentication with precomposed vs decomposed characters
  • Submitting Unicode variants in URL parameters and observing routing behavior
  • Checking database queries for string comparison without normalization
  • Verifying that all user-facing string comparisons use consistent normalization

middleBrick's LLM/AI security module also checks for Unicode-related prompt injection vectors, as certain Unicode characters can break prompt formatting or create visual confusion in AI-generated responses.

Chi-Specific Remediation

Chi applications should implement Unicode normalization at the earliest possible point in request processing. The standard approach uses the golang.org/x/text/unicode/norm package (an extension module, not part of Go's standard library) to normalize all incoming strings to NFC (canonical decomposition followed by canonical composition).

import (
    "net/http"

    "github.com/go-chi/chi/v5"
    "golang.org/x/text/unicode/norm"
)

Create a normalization middleware that processes all request parameters:

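func NormalizationMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Normalize URL parameters. URLParams is a struct holding parallel
        // Keys and Values slices, so the values are rewritten in place.
        if rctx := chi.RouteContext(r.Context()); rctx != nil {
            for i, v := range rctx.URLParams.Values {
                rctx.URLParams.Values[i] = norm.NFC.String(v)
            }
        }

        // Normalize form values. r.FormValue reads from r.Form;
        // r.PostForm is a separate map and is not touched here.
        if err := r.ParseForm(); err != nil {
            http.Error(w, "Bad Request", http.StatusBadRequest)
            return
        }
        for _, vals := range r.Form {
            for i, val := range vals {
                vals[i] = norm.NFC.String(val)
            }
        }

        next.ServeHTTP(w, r)
    })
}

// Apply middleware to router. Order matters: normalize before authenticating.
// Middleware added with Use runs before Chi matches the route, so URL
// parameters are still empty at that point. To normalize them, attach the
// middleware per route with router.With(NormalizationMiddleware).
router.Use(NormalizationMiddleware)
router.Use(AuthMiddleware)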

For authentication, implement canonical equivalence checking:

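// userCtxKey is an unexported key type, which avoids collisions with
// context values set by other packages.
type userCtxKey struct{}

func AuthMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Usernames must also be stored in NFC for this lookup to match.
        username := norm.NFC.String(r.FormValue("username"))
        password := r.FormValue("password")

        user, err := db.GetUserByUsername(username)
        if err != nil || !CheckPassword(user.Password, password) {
            http.Error(w, "Unauthorized", http.StatusUnauthorized)
            return
        }

        ctx := context.WithValue(r.Context(), userCtxKey{}, user)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}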

When storing and comparing identifiers, always normalize before database operations:

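// Safe user lookup
// The User fields and column names below are assumed for illustration.
func GetUserByID(id string) (*User, error) {
    normalizedID := norm.NFC.String(id)
    var u User
    // Scan needs destination pointers; calling it with none returns only an error.
    err := db.QueryRow("SELECT id, username FROM users WHERE normalized_id = ?", normalizedID).
        Scan(&u.ID, &u.Username)
    if err != nil {
        return nil, err
    }
    return &u, nil
}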

// Safe authorization check
func CheckOwnership(doc *Document, userID string) bool {
    normalizedDocOwner := norm.NFC.String(doc.OwnerID)
    normalizedUserID := norm.NFC.String(userID)
    return normalizedDocOwner == normalizedUserID
}

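For API responses, route JSON output through a single helper so that every handler encodes data the same way. Note that json.Marshal does not normalize Unicode itself, so string fields must already be in NFC (normalized on input or before storage) when they reach this helper: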

func JSONResponse(w http.ResponseWriter, data interface{}) {
    jsonBytes, err := json.Marshal(data)
    if err != nil {
        http.Error(w, "Internal Server Error", 500)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    w.Write(jsonBytes)
}

Frequently Asked Questions

Why does Unicode normalization matter for Chi API security?
Unicode normalization is critical for Chi API security because it prevents canonical equivalence attacks where visually identical strings have different Unicode representations. Without normalization, attackers can bypass authentication, access unauthorized resources, or cause inconsistent application behavior by exploiting precomposed vs decomposed character forms.

How does middleBrick detect Unicode normalization vulnerabilities in Chi applications?
middleBrick detects Unicode normalization vulnerabilities by submitting inputs in multiple normalization forms and analyzing application responses. It tests authentication endpoints with precomposed and decomposed character variants, checks URL parameter handling for inconsistent routing, and analyzes OpenAPI specs for missing normalization requirements. The scanner identifies where Chi applications treat canonically equivalent strings as distinct values.