
Unicode Normalization in Chi with JWT Tokens

How This Specific Combination Creates or Exposes the Vulnerability

Chi is an HTTP client for Elixir that is commonly used to call external APIs, including OAuth2 introspection and JWT validation endpoints. When Chi fetches or verifies JSON Web Tokens, subtle Unicode encoding differences in identifiers can lead to inconsistent parsing and unexpected behavior. Normalization matters whenever a JWT carries claims or headers with non-ASCII characters: email addresses with accents, usernames from the Latin-1 Supplement, or directory names in Asian scripts. The letter é, for example, can be encoded as the single code point U+00E9 (NFC) or as "e" followed by the combining acute accent U+0301 (NFD); the two forms render identically but are different byte sequences. If the application compares such a claim against a precomputed value without first applying a canonical form, the comparison may succeed or fail depending on which representation Chi or the underlying library happens to produce. This inconsistency becomes an authentication bypass or token-confusion risk when business logic treats the normalized and non-normalized forms as equivalent but cryptographic verification does not.
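The NFC/NFD mismatch described above can be reproduced directly in an IEx session; this sketch uses only Elixir's built-in String.normalize/2:

```elixir
# "é" in NFC is the single code point U+00E9; in NFD it is "e" plus
# U+0301 COMBINING ACUTE ACCENT. Both render identically on screen.
nfc = "\u00E9"
nfd = "e\u0301"

nfc == nfd
# => false: raw binary comparison treats the two renderings as different

String.normalize(nfd, :nfc) == nfc
# => true: canonical normalization makes the comparison stable
```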

JWT parsing libraries in Elixir, such as Joken or hand-rolled JOSE decoders, operate on binaries. If Chi retrieves a JWK Set or introspects a token over HTTPS and hands the JSON body straight to a parser, any Unicode normalization applied only on the client side creates a mismatch between expected and actual claims. A token issued in NFC, for instance, might be compared against a database key stored in NFD, producing false negatives during claim validation. Attackers can also register or manipulate identities containing homoglyphs, characters that look alike but have different code points, to slip past allowlists that rely on exact string matching. Because Chi is often used in backend microservices that trust the JWT after verification, inconsistent normalization across the stack weakens the effective security boundary, especially when input validation and identity checks do not normalize before comparing.
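Note that normalization alone does not defeat homoglyphs: NFKC folds compatibility variants such as fullwidth Latin letters, but visually confusable characters from other scripts remain distinct code points. A minimal illustration (assuming Elixir 1.11+ for the :nfkc form):

```elixir
# NFKC folds compatibility variants of "a" but not cross-script lookalikes.
fullwidth = "\uFF41dmin"   # "ａdmin": fullwidth Latin small a (U+FF41)
cyrillic  = "\u0430dmin"   # "аdmin": Cyrillic small a (U+0430), a true homoglyph

String.normalize(fullwidth, :nfkc) == "admin"
# => true: compatibility folding maps U+FF41 to ASCII "a"

String.normalize(cyrillic, :nfkc) == "admin"
# => false: the Cyrillic homoglyph survives normalization, so allowlists
#    still need script checks or confusable detection on top of it
```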

Additionally, header fields such as kid (Key ID) may carry non-ASCII metadata in custom implementations, and if Chi forwards or logs these values without normalization, audit trails become inconsistent and encoding mismatches can enable injection. Although the JWT specification treats the token as an opaque string for signature verification, the claims set is processed programmatically. If normalization is applied selectively, say to email and username claims but not to roles or scopes, an attacker can craft a token in which a critical claim evades authorization checks through a normalization discrepancy. MiddleBrick's Input Validation and Property Authorization checks are designed to surface such inconsistencies by correlating runtime behavior with schema expectations, helping teams confirm that normalization is handled uniformly across every API surface that interacts with JWTs.

JWT-Specific Remediation in Chi

To remediate Unicode normalization issues when using Chi with JWTs, enforce canonical normalization before any comparison, storage, or logging. Elixir ships with String.normalize/2, and Erlang/OTP provides :unicode.characters_to_nfc_binary/1 and related functions, so no extra dependency is required; normalize strings to a single form, typically NFC (or NFKC if compatibility folding is acceptable), before validation. Apply normalization consistently across every claim that feeds access control, including email, username, roles, scopes, and any custom identifiers. Do not rely on exact binary equality for user-controlled values that may contain international characters.

Code example: Normalizing claims in Chi-based JWT validation

# String.normalize/2 ships with Elixir's standard library; no extra dependency needed.
# Normalize incoming JWT claims before comparison.
defmodule MyApp.JwtValidator do
  def validate_claims(token, expected_claims) do
    with {:ok, claims} <- extract_claims(token),
         normalized_claims = normalize_claims(claims),
         true <- compare_claims(normalized_claims, expected_claims) do
      {:ok, normalized_claims}
    else
      _ -> {:error, :invalid_claims}
    end
  end

  defp extract_claims(token) do
    # NOTE: this decodes the payload without verifying the signature.
    # In production, verify first with Joken or JOSE and use the verified claims.
    with [_header, payload, _signature] <- String.split(token, "."),
         {:ok, json} <- Base.url_decode64(payload, padding: false),
         {:ok, claims} when is_map(claims) <- Jason.decode(json) do
      {:ok, claims}
    else
      _ -> {:error, :invalid_token}
    end
  end

  defp normalize_claims(claims) do
    Map.new(claims, fn {k, v} -> {k, normalize_value(v)} end)
  end

  defp normalize_value(value) when is_binary(value), do: String.normalize(value, :nfc)
  defp normalize_value(value) when is_list(value), do: Enum.map(value, &normalize_value/1)
  defp normalize_value(value) when is_map(value), do: normalize_claims(value)
  defp normalize_value(value), do: value

  defp compare_claims(actual, expected) do
    # Normalize the expected side too, so both operands share a canonical form.
    Map.equal?(actual, normalize_claims(expected))
  end
end

When using Chi to fetch JWK Sets or introspect tokens, ensure that any string values extracted from the JSON response are normalized before being used in authorization decisions. For example, if you retrieve a JWK with a kid that contains non-ASCII metadata, normalize it before lookup:

# Fetching the JWK Set (shown here with Tesla as the HTTP transport)
# and normalizing kid values before key lookup
jwks_url = "https://auth.example.com/.well-known/jwks.json"
{:ok, response} = Tesla.get(jwks_url)

jwks =
  case Jason.decode(response.body) do
    {:ok, %{"keys" => keys}} ->
      Enum.map(keys, fn key ->
        normalized_kid = String.normalize(key["kid"] || "", :nfc)
        Map.put(key, "kid", normalized_kid)
      end)

    _ ->
      []
  end

In your validation pipeline, integrate normalization as an early step so that tokens with homoglyphs are treated consistently. Combine this with schema-based checks for the claims set and enforce allowlists using normalized values. Complement these code-level fixes with runtime verification using tools like MiddleBrick’s scans, which can detect inconsistencies between your OpenAPI contract and actual behavior, including how non-ASCII inputs are handled in authentication flows.
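As a sketch of that early normalization step, the hypothetical module below normalizes the scopes claim before checking it against an allowlist (the module name, claim key, and scope values are illustrative, not part of any library):

```elixir
defmodule MyApp.ScopeCheck do
  # Illustrative allowlist; in practice this would come from configuration.
  @allowed_scopes MapSet.new(["admin", "billing:read"])

  # Normalize each scope to NFC before the membership test, so the result
  # does not depend on how the token issuer happened to encode the string.
  def authorized?(claims) when is_map(claims) do
    claims
    |> Map.get("scopes", [])
    |> Enum.map(&String.normalize(&1, :nfc))
    |> Enum.any?(&MapSet.member?(@allowed_scopes, &1))
  end
end
```

For example, MyApp.ScopeCheck.authorized?(%{"scopes" => ["admin"]}) grants access, while a claims map whose scopes contain only unknown values does not.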

Frequently Asked Questions

Can Unicode normalization issues in Chi JWT flows lead to authentication bypass?
Yes. If normalization is applied inconsistently across claims, an attacker can supply a token with a homoglyph that passes one check but fails another, potentially bypassing authorization or identity validation.
Does MiddleBrick detect Unicode normalization risks in JWT handling?
MiddleBrick’s Input Validation and Property Authorization checks can surface inconsistencies by correlating schema definitions with runtime behavior, helping identify where normalization may be uneven across claims.