MEDIUM | unicode normalization | chi | dynamodb

Unicode Normalization in Chi with DynamoDB

Unicode Normalization in Chi with DynamoDB: how this specific combination creates or exposes the vulnerability

Chi is an HTTP router that encourages precise route matching. When a route pattern includes a dynamic segment such as :id, the value captured from the request path is passed to your handler. If that value is later used as a key in an Amazon DynamoDB query without first being normalized, an attacker can exploit canonicalization differences to bypass intended access controls or cause inconsistent data retrieval.

DynamoDB stores string attribute values as UTF-8 encoded Unicode and compares them by their underlying bytes; it does not normalize strings before comparison or indexing. Unicode normalization matters because visually identical text can have more than one binary representation. For example, the Latin small letter e with acute (é) can be encoded as the single code point U+00E9 or as the decomposed sequence U+0065 followed by U+0301. Without normalization, a query for one representation does not match items stored with the other, leading to missing results or, in an authorization context, to access to unintended resources through mismatched key lookups.
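
The difference between the two encodings can be checked directly. The following is a minimal sketch, assuming Erlang/OTP 20 or later (where unicode:characters_to_nfc_binary/1 is available); the module name nfc_demo is hypothetical and used only for illustration.

```erlang
-module(nfc_demo).
-export([demo/0]).

%% Compares the precomposed and decomposed encodings of "é",
%% first as raw bytes and then after NFC normalization.
demo() ->
    Precomposed = <<16#00E9/utf8>>,                %% é as one code point
    Decomposed  = <<16#0065/utf8, 16#0301/utf8>>,  %% e + combining acute accent
    #{raw_equal         => Precomposed =:= Decomposed,
      precomposed_bytes => byte_size(Precomposed),
      decomposed_bytes  => byte_size(Decomposed),
      nfc_equal         => unicode:characters_to_nfc_binary(Precomposed)
                           =:= unicode:characters_to_nfc_binary(Decomposed)}.
```

The raw binaries differ (2 bytes versus 3 bytes), so a byte-wise key comparison treats them as different keys, while their NFC forms are identical.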

In Chi, a route like GET /users/:id might capture an ID that contains a user handle. If the client sends the precomposed form of café (ending in U+00E9) while the DynamoDB item stores the decomposed form (ending in U+0065 U+0301), the query fails to locate the item. The same mismatch can be leveraged in a BOLA/IDOR scenario: if the authorization check and the key lookup treat Unicode variants differently, an attacker can supply a crafted variant that passes the check but resolves to a different internal key, potentially retrieving or modifying another user’s data. Enforcing strict normalization before any authorization check closes this gap.
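
One way to keep the authorization check and the key lookup in agreement is to normalize both values once and reuse the result for the comparison and for the key. The sketch below assumes that access is granted only when the path segment names the caller's own handle; the module authz_sketch, the function lookup_key/2, and the error atoms are hypothetical names, not part of Chi or any AWS SDK.

```erlang
-module(authz_sketch).
-export([lookup_key/2]).

%% Returns {ok, Key} only when the NFC form of the path segment equals the
%% NFC form of the authenticated caller's handle. The returned Key is the
%% value to use for the DynamoDB lookup, so check and lookup cannot diverge.
lookup_key(CallerHandle, PathId) ->
    case {nfc(CallerHandle), nfc(PathId)} of
        {{ok, Same}, {ok, Same}} -> {ok, Same};          %% identical after NFC
        {{ok, _}, {ok, _}}       -> {error, forbidden};  %% valid but different
        _                        -> {error, bad_request} %% invalid input
    end.

%% Normalizes a UTF-8 binary to NFC; anything else is rejected.
nfc(Bin) when is_binary(Bin) ->
    case unicode:characters_to_nfc_binary(Bin) of
        Out when is_binary(Out) -> {ok, Out};
        _                       -> {error, invalid_utf8}
    end;
nfc(_) ->
    {error, invalid_utf8}.
```

Because the function returns the normalized key itself, callers have no reason to fall back to the raw path segment when building the query.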

The LLM/AI Security checks in middleBrick specifically test for output leakage that could include sensitive data or credentials. If a DynamoDB query returns unexpected items due to normalization issues, the resulting output might expose PII or secrets that would then be flagged by output scanning. Additionally, inconsistent normalization can affect inventory management and property authorization checks, where attribute values are compared against expected patterns or enumerated lists stored in DynamoDB.

To detect these risks, middleBrick scans the unauthenticated attack surface, including OpenAPI specifications where path parameters like {id} are defined. The tool cross-references spec definitions with runtime behavior, highlighting cases where string formats and normalization expectations are not explicitly constrained. This is valuable because the scanner identifies mismatches between documented input rules and actual endpoint behavior without requiring credentials.

DynamoDB-Specific Remediation in Chi: concrete code fixes

Remediation focuses on normalizing all user-controlled strings before they are used in DynamoDB operations. Apply normalization consistently for both read and write paths, using the same Unicode form (recommended: NFC) across your service. In Chi, this can be done in route handlers or in shared data access modules.

Example using Erlang’s unicode module to normalize incoming route parameters before building DynamoDB requests:

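%% In your Chi handler module.
%% Id is the captured :id path segment, expected to be a UTF-8 binary.
handle_get_user(_Request, Id) when is_binary(Id) ->
    %% characters_to_nfc_binary/1 returns an {error, _, _} tuple on invalid UTF-8
    case unicode:characters_to_nfc_binary(Id) of
        NormalizedId when is_binary(NormalizedId) ->
            Item = dynamodb_get_item("Users", #{"PK" => {s, NormalizedId}}),
            %% proceed with response ...
            {ok, Item};
        {error, _, _} ->
            %% reject path segments that are not valid UTF-8
            {error, bad_request}
    end.

dynamodb_get_item(Table, Key) ->
    %% Wrapper around the AWS SDK call. The call shape is illustrative;
    %% adapt it to the DynamoDB client your application uses.
    aws_dynamodb:get_item(#{table => Table, key => Key}).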

If you accept JSON payloads that contain string fields used as DynamoDB keys or filter values, normalize those fields as well:

%% Example with a JSON body containing a user handle.
%% Assumes Cowboy's cowboy_req:read_body/1 and the json module from OTP 27+.
handle_update(Request) ->
    {ok, Body, _Req} = cowboy_req:read_body(Request),
    #{<<"handle">> := Handle0} = json:decode(Body),
    Handle = unicode:characters_to_nfc_binary(Handle0),
    %% Ensure the update targets the correct normalized attribute
    aws_dynamodb:update_item(#{table => "Users",
                               key => #{"PK" => {s, Handle}},
                               updates => [...]}).

When generating values that will be stored in DynamoDB, normalize them at the point of creation to avoid storing multiple representations of the same logical value:

create_user(Email) when is_binary(Email) ->
    NormalizedEmail = unicode:characters_to_nfc_binary(Email),
    aws_dynamodb:put_item(#{table => "Users",
                            item => #{"PK" => {s, NormalizedEmail},
                                      "GSI1PK" => {s, <<"email#", NormalizedEmail/binary>>}}}).

For applications using the middleBrick Pro plan, continuous monitoring can be configured to alert when responses contain anomalies that may indicate normalization-related inconsistencies. The CLI can be integrated into scripts to validate that endpoints consistently handle Unicode inputs as expected, and the GitHub Action can enforce that submitted API definitions document normalization expectations for string parameters.
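
Items written before normalization was enforced may still hold keys in a non-NFC form, and those keys stop matching once every lookup is normalized. A minimal sketch of an audit helper follows, assuming the key values have already been read from the table as binaries; the module nfc_audit and its functions are hypothetical names used for illustration.

```erlang
-module(nfc_audit).
-export([is_nfc/1, non_nfc_keys/1]).

%% True when Bin is valid UTF-8 and already in NFC form.
%% Invalid UTF-8 makes characters_to_nfc_binary/1 return an error tuple,
%% which never equals Bin, so such values are reported as not NFC.
is_nfc(Bin) when is_binary(Bin) ->
    unicode:characters_to_nfc_binary(Bin) =:= Bin.

%% Returns the keys that would need to be rewritten in NFC form.
non_nfc_keys(Keys) ->
    [K || K <- Keys, not is_nfc(K)].
```

Running non_nfc_keys/1 over a list of stored key values gives the set of items to rewrite before read paths switch to normalized lookups.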

Frequently Asked Questions

Why does Unicode normalization matter for DynamoDB keys in Chi routes?
DynamoDB compares attribute values exactly as stored; it does not normalize strings. If your Chi route captures an ID or handle and uses it as a DynamoDB key, different Unicode representations of the same text will not match, which can lead to missing items or bypassed authorization checks.
How can I verify my Chi endpoints handle normalization correctly during scanning?
Using the middleBrick CLI, run middlebrick scan <your-url> and review the findings. The scan includes input validation checks that highlight inconsistencies in how string formats are handled and whether normalization is documented in the OpenAPI spec.