HIGH · unicode normalization · fastapi

Unicode Normalization in FastAPI

How Unicode Normalization Manifests in FastAPI

Unicode normalization attacks on FastAPI applications exploit the gap between how international characters look and how Python compares them: two strings that render identically can differ code point by code point. FastAPI, built on Starlette and Pydantic, parses incoming JSON, form data, and URL parameters and hands them to your endpoint without normalizing them, which leaves several places where a normalization-based bypass can slip through.

Consider a user authentication endpoint where usernames are compared case-insensitively:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class LoginRequest(BaseModel):
    username: str
    password: str

@app.post("/login")
async def login(request: LoginRequest):
    stored_user = "admin"
    if request.username.lower() == stored_user.lower():
        # Authentication bypass possible
        return {"message": "Welcome!"}
    raise HTTPException(status_code=403, detail="Invalid credentials")

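This code looks safe, but case mapping over non-ASCII input has surprises. Python's str.lower() follows the Unicode case tables and ignores the process locale, so Turkish-locale rules never apply: 'I'.lower() is always 'i', and the Turkish dotted capital 'İ' (U+0130) lowercases to 'i' followed by a combining dot above (U+0307). A username like "ADMİN" therefore does not equal "admin" after lower(), yet it collides as soon as another layer (a database collation, a search index, a step that strips combining marks) folds the two together. Other characters map straight onto ASCII: the Kelvin sign 'K' (U+212A) lowercases to 'k', and the dotless 'ı' (U+0131) uppercases to 'I', so "admın".upper() equals "ADMIN".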
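The case mappings described above can be checked directly. The snippet below is a minimal sketch using only built-in string methods; the characters chosen are illustrative, not an exhaustive list of problematic code points.

```python
# Python's case mappings follow the Unicode tables and ignore the process locale.
KELVIN = "\u212a"      # KELVIN SIGN, renders like "K"
DOTTED_I = "\u0130"    # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
LONG_S = "\u017f"      # LATIN SMALL LETTER LONG S

# A non-ASCII character that lowercases to plain ASCII "k"
kelvin_lower = KELVIN.lower()

# Dotted capital I lowercases to "i" plus COMBINING DOT ABOVE (two code points)
dotted_lower = DOTTED_I.lower()

# Dotless i and long s uppercase to plain ASCII "I" and "S"
dotless_upper = DOTLESS_I.upper()
long_s_upper = LONG_S.upper()

print(repr(kelvin_lower), [hex(ord(c)) for c in dotted_lower], dotless_upper, long_s_upper)
```

Which of these matter depends on whether the application compares with lower(), upper(), or casefold(), so it is worth testing the exact call your code uses.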

FastAPI's Pydantic models do not close this gap. Pydantic validates types and coerces values, but a str field accepts any valid Unicode string and is not normalized, so the raw code points reach your business logic:

class User(BaseModel):
    username: str
    email: str

@app.post("/create-user")
async def create_user(user: User):
    # Unicode variations of same character may pass validation
    if user.username == "admin":
        raise HTTPException(status_code=400, detail="Username taken")
    return {"message": "User created"}

An attacker could register "ａｄｍｉｎ" (full-width characters) or "аdmin" (Cyrillic 'а', U+0430), which look identical to "admin" but consist of different Unicode code points. FastAPI's validation passes them through because they are valid strings.
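When two usernames look the same on screen, listing their code points and Unicode names makes the difference visible. The following is a small sketch; describe is a hypothetical helper name, not part of FastAPI or Pydantic.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]

ascii_admin = "admin"
fullwidth_admin = "\uff41\uff44\uff4d\uff49\uff4e"   # full-width "admin"
cyrillic_admin = "\u0430dmin"                         # Cyrillic "a" lookalike + "dmin"

# Print whether each candidate equals the ASCII name, plus its first character
for candidate in (ascii_admin, fullwidth_admin, cyrillic_admin):
    print(candidate == ascii_admin, describe(candidate)[0])
```

Logging this kind of description for rejected or suspicious usernames can make homoglyph attempts much easier to spot during review.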

Path parameter matching in FastAPI is another vulnerable area. The framework uses Starlette's routing, which matches URL paths code point by code point and applies no Unicode normalization:

@app.get("/users/me")
async def get_current_user():
    return {"message": "This is your profile"}

@app.get("/users/{user_id}")
async def get_user(user_id: str):
    if user_id == "me":
        # Unicode "me" could bypass this check
        return await get_current_user()
    return {"message": f"Profile for {user_id}"}

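A request for a Unicode variation of "me" (for example full-width "ｍｅ") matches neither the static route nor the exact string comparison, so it falls through to the generic branch. If a later layer normalizes user_id back to "me", the attacker reaches profile logic that these checks were meant to guard.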

Query parameter handling suffers from similar issues. FastAPI passes decoded but otherwise unmodified query parameters to endpoint functions:

from fastapi import Query

@app.get("/search")
async def search(
    q: str = Query(..., description="Search query"),
    case_sensitive: bool = Query(default=False)
):
    if not case_sensitive:
        q = q.lower()  # Vulnerable to Unicode normalization
    # Search logic here
    return {"results": []}

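Lowercasing without Unicode normalization means canonically equivalent strings still compare unequal (precomposed "é" versus "e" plus a combining accent), so attackers can craft queries that slip past case-insensitive searches, filters, or blocklists.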
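One way to make such comparisons robust is to build a comparison key that both normalizes and casefolds. The sketch below follows the caseless-comparison recipe from the Python Unicode HOWTO; search_key is a hypothetical helper name.

```python
import unicodedata

def search_key(text: str) -> str:
    """Build a comparison key: NFD, casefold, then NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())

composed = "caf\u00e9"       # "é" as a single code point
decomposed = "cafe\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Plain lower() leaves the two spellings different
plain_match = composed.lower() == decomposed.upper().lower()

# The normalized key treats them as the same search term
key_match = search_key(composed) == search_key(decomposed.upper())

print(plain_match, key_match)
```

In the /search endpoint, applying the same key function to both the query and the indexed values would keep the two sides consistent.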

FastAPI-Specific Detection

Detecting Unicode normalization vulnerabilities in FastAPI applications requires examining both code patterns and runtime behavior. The framework's structure produces recognizable signatures that security scanners can identify.

Static analysis should look for these FastAPI-specific patterns:

# Vulnerable pattern: direct string comparison without normalization
@app.post("/sensitive")
async def sensitive_endpoint(username: str):
    if username == "admin":  # No normalization applied
        return {"access": "granted"}
    return {"access": "denied"}

# Vulnerable pattern: case conversion without Unicode awareness
@app.post("/login")
async def login(username: str, password: str):
    if username.lower() == "admin":  # lower() without normalization or casefold()
        return {"authenticated": True}
    return {"authenticated": False}

# Vulnerable pattern: Pydantic model comparisons
class UserModel(BaseModel):
    username: str
    
@app.post("/register")
async def register(user: UserModel):
    if user.username == "admin":  # Direct comparison
        raise HTTPException(400, "Username taken")
    return {"status": "ok"}

Dynamic detection during runtime scanning reveals how FastAPI actually processes requests. When middleBrick scans a FastAPI application, it tests Unicode edge cases by sending requests with:

  • Full-width Latin characters (e.g., "admin" vs "ａｄｍｉｎ")
  • Cyrillic homoglyphs (e.g., "аdmin" with Cyrillic 'а')
  • Turkish dotless/dotted 'i' variations
  • Combining character sequences (e.g., precomposed "é", U+00E9, vs "e" followed by U+0301)
  • Zero-width characters and control sequences

The scanner analyzes FastAPI's response patterns to identify normalization bypasses. For authentication endpoints, it attempts to log in with Unicode variations and checks whether any of them pass the security checks. For user registration, it tries to register Unicode versions of reserved usernames.
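Payloads of this kind are easy to generate for your own test suite. The following is a rough sketch, not middleBrick's implementation: unicode_variants and the small CYRILLIC_LOOKALIKES table are hypothetical and cover only a handful of cases.

```python
import unicodedata

# Hypothetical, deliberately small homoglyph table (Latin -> Cyrillic lookalike)
CYRILLIC_LOOKALIKES = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441", "p": "\u0440"}
ZERO_WIDTH_SPACE = "\u200b"

def unicode_variants(name: str) -> dict[str, str]:
    """Return labelled variants of an ASCII name for use as test payloads."""
    # Printable ASCII "!".."~" maps to full-width forms by adding 0xFEE0
    fullwidth = "".join(chr(ord(ch) + 0xFEE0) if "!" <= ch <= "~" else ch for ch in name)
    homoglyph = "".join(CYRILLIC_LOOKALIKES.get(ch, ch) for ch in name)
    zero_width = name[:1] + ZERO_WIDTH_SPACE + name[1:]
    dotted_i = name.upper().replace("I", "\u0130")
    return {
        "fullwidth": fullwidth,
        "cyrillic_homoglyph": homoglyph,
        "zero_width": zero_width,
        "turkish_dotted_i": dotted_i,
    }

variants = unicode_variants("admin")

# Report which variants collapse back to the reserved name under NFKC + casefold
for label, value in variants.items():
    folds = unicodedata.normalize("NFKC", value).casefold() == "admin"
    print(f"{label}: {value!r} folds to 'admin': {folds}")
```

Each variant can then be sent to registration and login endpoints to confirm that the application rejects it or treats it as the reserved name.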

middleBrick's FastAPI-specific scan output includes entries like the following (the /login payload contains the Turkish dotted capital İ, U+0130):

{
  "fastapi_detection": {
    "unicode_bypasses": [
      {
        "endpoint": "/login",
        "attack_vector": "case_insensitive_bypass",
        "payload": "ADMİN",
        "result": "authentication_bypass_detected"
      }
    ],
    "pydantic_vulnerabilities": [
      {
        "model": "UserModel",
        "field": "username",
        "issue": "unicode_homoglyph_acceptance",
        "severity": "high"
      }
    ]
  }
}

The scanner also examines FastAPI's generated OpenAPI schema, which can reveal weak parameter validation. String parameters documented without constraints such as a pattern or maximum length suggest that the endpoint accepts arbitrary Unicode.

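Rate limiting bypass detection is particularly relevant for FastAPI applications that use middleware. Unicode variations of the same path can produce different cache keys:

# Vulnerable rate limiting
from collections import defaultdict

request_counts = defaultdict(int)  # stand-in for a real cache

@app.middleware("http")
async def rate_limit_middleware(request, call_next):
    # request.url.path is not Unicode-normalized, so variant
    # spellings of the same path produce different keys
    key = request.client.host + request.url.path
    request_counts[key] += 1
    # ... rate limiting logic
    return await call_next(request)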

middleBrick tests whether Unicode variations allow attackers to bypass rate limits by creating distinct cache keys for visually identical requests.
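A possible mitigation is to normalize the path before it becomes part of the key. The sketch below assumes a hypothetical helper named rate_limit_key and uses NFKC plus casefold(); whether casefolding is appropriate depends on whether your routes are case-sensitive.

```python
import unicodedata

def rate_limit_key(client_host: str, path: str) -> str:
    """Build a cache key from a normalized, casefolded path (hypothetical helper)."""
    folded = unicodedata.normalize("NFKC", path).casefold()
    return f"{client_host}:{folded}"

# Three spellings of the same visible path: precomposed, decomposed, full-width
key_a = rate_limit_key("203.0.113.7", "/users/caf\u00e9")
key_b = rate_limit_key("203.0.113.7", "/users/cafe\u0301")
key_c = rate_limit_key("203.0.113.7", "/users/\uff43\uff41\uff46\u00e9")

print(key_a == key_b == key_c)
```

Keying on the matched route template rather than the raw path is another option worth considering, since it sidesteps path spelling entirely.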

FastAPI-Specific Remediation

FastAPI offers several native ways to address Unicode normalization vulnerabilities, building on Pydantic's validation system and FastAPI's dependency injection framework.

The most effective remediation is implementing Unicode normalization at the Pydantic model level using custom validators:

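import unicodedata
from fastapi import FastAPI, HTTPException
# Pydantic v1-style API; in Pydantic v2, field_validator replaces validator
# (pre=True becomes mode='before')
from pydantic import BaseModel, validator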

class NormalizedUser(BaseModel):
    username: str
    email: str
    
    @validator('username', 'email', pre=True)
    def normalize_unicode(cls, value):
        # NFC: Canonical Decomposition, followed by Canonical Composition
        return unicodedata.normalize('NFC', value)
    
    @validator('username', 'email')
    def check_homoglyphs(cls, value):
        # Rejects only basic Cyrillic letters (U+0430 to U+044F after lower());
        # other confusable scripts and characters need a broader check
        if any('а' <= char <= 'я' for char in value.lower()):
            raise ValueError('Cyrillic characters not allowed')
        return value

app = FastAPI()

@app.post("/register")
async def register(user: NormalizedUser):
    if user.username == "admin":
        raise HTTPException(400, "Username taken")
    return {"status": "ok"}

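This approach ensures all incoming data is normalized before it reaches your business logic. NFC composes characters where possible, so canonically equivalent inputs such as precomposed "é" and "e" plus a combining accent become identical. NFC does not fold compatibility characters, however: full-width "ａｄｍｉｎ" survives it unchanged. For identifiers such as usernames, consider NFKC, which also maps full-width forms and ligatures to their plain equivalents.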
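The difference between the two forms is easiest to see side by side. This is a minimal comparison using the standard unicodedata module; the sample strings are illustrative.

```python
import unicodedata

samples = {
    "decomposed_e_acute": "e\u0301",                       # "e" + COMBINING ACUTE ACCENT
    "fullwidth_admin": "\uff41\uff44\uff4d\uff49\uff4e",   # full-width "admin"
    "fi_ligature": "\ufb01le",                             # "fi" ligature + "le"
}

# NFC handles canonical equivalence only
nfc = {label: unicodedata.normalize("NFC", text) for label, text in samples.items()}

# NFKC additionally folds compatibility characters
nfkc = {label: unicodedata.normalize("NFKC", text) for label, text in samples.items()}

for label in samples:
    print(label, repr(nfc[label]), repr(nfkc[label]))
```

NFKC is lossy by design, so it suits identifiers and comparison keys better than free-form text that should be stored as the user typed it.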

For authentication endpoints, implement case-insensitive comparison with Unicode awareness:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import unicodedata

class LoginRequest(BaseModel):
    username: str
    password: str

app = FastAPI()

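def normalize_case_insensitive(value: str) -> str:
    # casefold() is more aggressive than lower() and handles Unicode better.
    # Normalize before and after: casefold() can return text that is no longer NFC
    normalized = unicodedata.normalize('NFC', value)
    return unicodedata.normalize('NFC', normalized.casefold())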

@app.post("/login")
async def login(request: LoginRequest):
    stored_user = "admin"
    if normalize_case_insensitive(request.username) == normalize_case_insensitive(stored_user):
        return {"message": "Welcome!"}
    raise HTTPException(status_code=403, detail="Invalid credentials")
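To illustrate why casefold() is preferred over lower() for caseless matching, here is a short sketch with two well-known cases. The helper names lower_equal and casefold_equal are hypothetical.

```python
def lower_equal(a: str, b: str) -> bool:
    """Compare using lower(), which applies lowercase mappings only."""
    return a.lower() == b.lower()

def casefold_equal(a: str, b: str) -> bool:
    """Compare using casefold(), which applies Unicode case folding."""
    return a.casefold() == b.casefold()

pairs = [
    ("Stra\u00dfe", "STRASSE"),   # German sharp s versus "SS"
    ("\u03c3", "\u03c2"),         # Greek sigma versus final sigma
]

for left, right in pairs:
    print(repr(left), repr(right), lower_equal(left, right), casefold_equal(left, right))
```

One consequence is that usernames differing only by "ß" and "ss" collide under casefold(), which is usually what you want for uniqueness checks but should be a deliberate choice.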

FastAPI's dependency injection system allows creating reusable normalization utilities:

import unicodedata

from fastapi import Depends, HTTPException, Query
from fastapi.security import OAuth2PasswordBearer

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

def get_normalized_username(username: str = Query(...)) -> str:
    normalized = unicodedata.normalize('NFC', username.casefold())
    # Reject input whose length changes (decomposed sequences, 'ß' -> 'ss', ...)
    if len(normalized) != len(username):
        raise HTTPException(400, "Unicode normalization changed string length")
    return normalized

@app.post("/secure-endpoint")
async def secure_endpoint(
    password: str,  # parameters without defaults must come first
    username: str = Depends(get_normalized_username),
):
    # username is now normalized and safe for comparison
    pass

For fields that do not need international characters (identifiers, slugs, machine-readable codes), the simplest defense against homograph attacks is to restrict input to printable ASCII:

import re
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, validator

class SafeStringModel(BaseModel):
    safe_text: str
    
    @validator('safe_text')
    def validate_safe_characters(cls, value):
        # Allow only printable ASCII (space through tilde); fullmatch also
        # rejects a trailing newline, which '$' with re.match would let through
        if not re.fullmatch(r'[ -~]+', value):
            raise ValueError('Only ASCII characters allowed')
        return value

app = FastAPI()

@app.post("/safe-input")
async def safe_input(data: SafeStringModel):
    return {"processed": data.safe_text}
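Where non-ASCII input has to be accepted, one possible alternative is to reject strings that mix scripts. The sketch below is a heuristic only: it approximates the script from the first word of each character's Unicode name, since the standard library does not expose the Unicode Script property. scripts_used and is_single_script are hypothetical helper names.

```python
import unicodedata

def scripts_used(text: str) -> set[str]:
    """Approximate the scripts in text from the first word of each letter's Unicode name."""
    folded = unicodedata.normalize("NFKC", text)  # fold full-width forms first
    return {
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in folded
        if ch.isalpha()
    }

def is_single_script(text: str) -> bool:
    """Heuristic: accept text whose letters all come from one script."""
    return len(scripts_used(text)) <= 1

print(is_single_script("admin"))          # Latin letters only
print(is_single_script("\u0430dmin"))     # Cyrillic "a" lookalike mixed with Latin
```

This heuristic would wrongly reject legitimate mixed-script text, such as Japanese combining kanji and hiragana, so a production check would likely need real script data and an allowlist of acceptable combinations.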

FastAPI's middleware system can normalize query parameters globally for all requests:

import unicodedata
from urllib.parse import urlencode

from fastapi import Request

@app.middleware("http")
async def normalize_request_middleware(request: Request, call_next):
    # Normalize query parameter values
    normalized_params = [
        (k, unicodedata.normalize('NFC', v))
        for k, v in request.query_params.multi_items()
    ]

    # Write the result back into the ASGI scope, which downstream
    # handlers read their query parameters from
    request.scope["query_string"] = urlencode(normalized_params).encode("ascii")

    # JSON bodies are not rewritten here: replacing the body stream in HTTP
    # middleware is fragile, so normalize body fields in Pydantic validators
    response = await call_next(request)
    return response

For production deployments, combine these approaches with middleBrick's continuous monitoring to ensure Unicode normalization vulnerabilities don't reappear as your FastAPI application evolves.

Frequently Asked Questions

Why doesn't FastAPI automatically handle Unicode normalization?
FastAPI does not normalize Unicode automatically because different applications have different requirements. Some APIs need to accept international characters from legitimate users, while others should restrict input to prevent homograph attacks. FastAPI provides the tools and flexibility for developers to implement the security measures appropriate to their use case, rather than making assumptions about character handling.

Can Unicode normalization vulnerabilities affect FastAPI's OpenAPI documentation?
Yes, Unicode normalization issues can show up in the OpenAPI schemas that FastAPI generates. If your endpoint accepts parameters like "username" without proper validation, the OpenAPI spec documents it as accepting any string. An attacker could use this documentation to discover which endpoints are vulnerable to Unicode-based attacks. Additionally, if your FastAPI application uses path parameters with Unicode characters, the generated OpenAPI paths might not represent all possible variations, potentially hiding attack vectors from security reviewers.