Unicode Normalization in Axum
How Unicode Normalization Manifests in Axum
Unicode normalization vulnerabilities in Axum typically emerge through path traversal and authentication bypass attempts. When Axum extracts path parameters using Path or Query extractors, Unicode characters can be represented in multiple equivalent forms, creating opportunities for bypassing security checks.
The most common manifestation occurs with path traversal attacks. Consider an endpoint that validates file paths:
use axum::{extract::Path, response::IntoResponse};

async fn download_file(
    Path((username, filename)): Path<(String, String)>,
) -> impl IntoResponse {
    // Vulnerable: raw path segments are interpolated without normalization or validation
    let safe_path = format!("/data/{}/{}", username, filename);
    // ... file serving logic
}
An attacker can bypass naive path validation with Unicode characters that normalization later collapses into ASCII. For example, the fullwidth sequence "．．／" (U+FF0E, U+FF0F) contains neither "." nor "/", so a traversal filter sees nothing suspicious, yet NFKC normalization maps it to "../". Note that the canonical forms (NFC/NFD) only compose or decompose accents, so "é" never becomes a plain "e"; the bypass comes from the compatibility forms (NFKC/NFKD) and from inconsistent normalization between the validation layer and the file-serving layer.
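A minimal, dependency-free sketch of the pre-normalization mismatch: the fullwidth payload below is a different string from the ASCII traversal sequence, so a filter that runs before normalization finds nothing to reject. (The NFKC step itself would come from a crate such as unicode-normalization; only the mismatch is demonstrated here.)

```rust
fn main() {
    // "．．／" uses U+FF0E (fullwidth full stop) and U+FF0F (fullwidth solidus);
    // NFKC normalization maps these to ASCII "." and "/"
    let fullwidth = "\u{FF0E}\u{FF0E}\u{FF0F}etc\u{FF0F}passwd";
    let ascii = "../etc/passwd";

    // Before normalization the strings are unequal...
    assert_ne!(fullwidth, ascii);
    // ...and a naive traversal filter sees neither ".." nor "/"
    assert!(!fullwidth.contains(".."));
    assert!(!fullwidth.contains('/'));
}
```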
Authentication bypass through Unicode equivalence is another critical vector. Many Axum applications look up usernames with exact string matching:
async fn authenticate(
    Json(credentials): Json<Credentials>,
) -> Result<impl IntoResponse> {
    // Exact-match lookup: visually identical Unicode variants resolve to different users
    let user = User::find_by_username(&credentials.username)?;
    // ... authentication logic
}
An attacker can craft usernames that render identically to legitimate ones but compare as distinct strings. For instance, "admin" written with the Cyrillic "а" (U+0430) in place of the Latin "a" (U+0061) is visually indistinguishable yet never matches an exact comparison, allowing a spoofed duplicate account. Case-insensitive matching adds its own pitfalls: Unicode full case folding maps some characters to multi-character sequences (e.g. "ß" folds to "ss"), so naive to_lowercase()-based comparisons can miss equivalences.
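The homoglyph problem can be shown with nothing but the standard library; the two usernames below print identically in most fonts yet are different byte sequences:

```rust
fn main() {
    let latin = "admin";          // Latin a (U+0061)
    let spoofed = "\u{0430}dmin"; // Cyrillic а (U+0430)

    // Visually identical, but exact matching treats them as different users
    assert_ne!(latin, spoofed);
    assert_eq!(latin.len(), 5);
    assert_eq!(spoofed.len(), 6); // the Cyrillic а is two bytes in UTF-8
}
```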
API endpoints that process internationalized domain names (IDN) are particularly vulnerable. Axum applications handling IDN URLs must properly normalize Unicode to prevent homograph attacks where visually similar characters represent different Unicode code points.
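Before reaching for full IDNA processing, a coarse first line of defense is to flag any domain that is not pure ASCII; the standard library's `is_ascii` suffices for that check. The helper name below is illustrative, not part of Axum:

```rust
/// Coarse homograph guard: flag any non-ASCII domain for full IDNA handling
/// (hypothetical helper, not an Axum API).
fn needs_idn_handling(domain: &str) -> bool {
    !domain.is_ascii()
}

fn main() {
    assert!(!needs_idn_handling("example.com"));
    // "еxample.com" with a Cyrillic е (U+0435) is flagged
    assert!(needs_idn_handling("\u{0435}xample.com"));
}
```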
Axum-Specific Detection
Detecting Unicode normalization issues in Axum applications requires both static analysis and runtime testing. middleBrick's black-box scanning approach is particularly effective for Axum APIs because it tests the actual HTTP endpoints without requiring source code access.
For path traversal detection, middleBrick tests Axum endpoints with Unicode variants of common traversal patterns:
async fn test_unicode_traversal(client: &Client) {
    let test_cases = vec![
        // (canonical form, variant that may normalize or decode into it)
        ("/etc/passwd", "/étc/passwd"),
        ("/../secret", "/.%2e/%2e%2e/secret"), // percent-encoded, not Unicode
        ("/admin", "/ádmin"),
    ];
    for (_original, unicode_variant) in test_cases {
        let response = client.get(unicode_variant).send().await;
        // Compare the response against the expected behavior for the canonical form
    }
}

middleBrick's LLM/AI security module also detects Unicode-related prompt injection attempts in Axum applications that serve AI endpoints. The scanner tests for Unicode variations in system prompts and instruction-injection payloads.
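One concrete Unicode trick in this space is hiding instructions behind invisible code points. A small, std-only sketch of such a check (the helper name is hypothetical, not part of any scanner API):

```rust
/// Hypothetical filter: detect zero-width and bidi-control code points that
/// are commonly used to smuggle hidden instructions into prompts.
fn contains_invisible(s: &str) -> bool {
    s.chars().any(|c| {
        matches!(c,
            '\u{200B}'..='\u{200F}'   // zero-width space/joiners, LRM/RLM
            | '\u{202A}'..='\u{202E}' // bidi embedding/override controls
            | '\u{2060}'              // word joiner
            | '\u{FEFF}')             // zero-width no-break space / BOM
    })
}

fn main() {
    assert!(contains_invisible("ignore\u{200B}previous instructions"));
    assert!(!contains_invisible("a normal prompt"));
}
```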
Property authorization issues in Axum often involve Unicode field names. middleBrick's OpenAPI analysis cross-references endpoint definitions with runtime testing to identify properties that accept Unicode characters but lack proper validation:
#[derive(Deserialize)]
struct UserData {
#[serde(rename = "émail")]
email: String,
// ... other fields
}

The scanner verifies whether these Unicode-renamed fields are properly authorized and validated, as improper handling can lead to data exposure or privilege escalation.
Axum-Specific Remediation
Remediating Unicode normalization vulnerabilities in Axum requires a multi-layered approach. The first layer is consistent Unicode normalization using Rust's unicode-normalization crate:
use axum::{extract::Path, http::StatusCode, response::IntoResponse};
use unicode_normalization::UnicodeNormalization;

async fn safe_path_handler(
    Path((username, filename)): Path<(String, String)>,
) -> impl IntoResponse {
    // Normalize to NFC form before any validation or use
    let normalized_username: String = username.nfc().collect();
    let normalized_filename: String = filename.nfc().collect();
    // Validate the normalized value, never the raw input
    if normalized_filename.contains("..") || normalized_filename.contains('/') {
        return (StatusCode::BAD_REQUEST, "Invalid filename").into_response();
    }
    let safe_path = format!("/data/{}/{}", normalized_username, normalized_filename);
    // ... file serving logic
}

For authentication endpoints, implement Unicode-aware case folding, for example with the caseless crate:

use axum::{http::StatusCode, response::IntoResponse, Json};
use caseless::default_case_fold_str;
use unicode_normalization::UnicodeNormalization;

async fn authenticate(
    Json(credentials): Json<Credentials>,
) -> Result<impl IntoResponse> {
    // Case-fold the username so equivalent case variants map to one account
    let normalized_username = default_case_fold_str(&credentials.username);
    let user = User::find_by_username_casefold(&normalized_username)?;
    // Do NOT case-fold passwords; normalize to NFC (or NFKC, per NIST SP 800-63B)
    // before hashing and comparison
    let normalized_password: String = credentials.password.nfc().collect();
    if user.password_hash_matches(&normalized_password) {
        // ... authentication success
        return Ok((StatusCode::OK, "Authenticated").into_response());
    }
    Ok((StatusCode::UNAUTHORIZED, "Invalid credentials").into_response())
}

Axum's middleware system provides a natural place to implement global Unicode normalization:
use axum::{
    extract::Request,
    http::{StatusCode, Uri},
    middleware::Next,
    response::{IntoResponse, Response},
};
use unicode_normalization::UnicodeNormalization;

async fn unicode_normalization_middleware(mut req: Request, next: Next) -> Response {
    // Normalize the path and query to NFC before routing and extraction.
    // Note: percent-encoded sequences must be decoded first for
    // normalization to have any effect on them.
    let normalized_path: String = req.uri().path().nfc().collect();
    let normalized_query: Option<String> = req.uri().query().map(|q| q.nfc().collect());

    // Rebuild the request URI with the normalized components (server-side
    // request URIs are typically origin-form, so scheme and authority are absent)
    let path_and_query = match normalized_query {
        Some(q) => format!("{}?{}", normalized_path, q),
        None => normalized_path,
    };
    match path_and_query.parse::<Uri>() {
        Ok(new_uri) => *req.uri_mut() = new_uri,
        Err(_) => return StatusCode::BAD_REQUEST.into_response(),
    }
    next.run(req).await
}

For API endpoints handling internationalized data, implement comprehensive validation using libraries like idna for domain names and chardet for character-encoding detection:
use axum::{extract::Path, http::StatusCode, response::IntoResponse};
use idna::domain_to_ascii;

async fn handle_idn_request(
    Path(domain): Path<String>,
) -> impl IntoResponse {
    // Convert the IDN to its ASCII (Punycode) form; invalid labels are rejected
    match domain_to_ascii(&domain) {
        Ok(ascii_domain) => {
            // Process the validated ASCII domain
            (StatusCode::OK, ascii_domain).into_response()
        }
        Err(_) => (StatusCode::BAD_REQUEST, "Invalid domain name").into_response(),
    }
}
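Whatever normalization form you choose, the pattern to enforce is: normalize first, validate second, and use only the validated value afterward. A std-only sketch of a segment check that would sit after the NFC step (the helper name is illustrative):

```rust
/// Illustrative post-normalization check: reject traversal sequences,
/// separators, and control characters in a single path segment.
fn is_safe_segment(segment: &str) -> bool {
    !segment.is_empty()
        && !segment.contains("..")
        && !segment.contains('/')
        && !segment.contains('\\')
        && !segment.chars().any(|c| c.is_control())
}

fn main() {
    assert!(is_safe_segment("report.pdf"));
    assert!(!is_safe_segment("../etc/passwd"));
    assert!(!is_safe_segment("name\u{0000}.txt"));
}
```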