Severity: HIGH | Tags: unicode normalization, django, dynamodb

Unicode Normalization in Django with DynamoDB

Unicode Normalization in Django with DynamoDB — how this specific combination creates or exposes the vulnerability

Unicode Normalization becomes a security concern in Django when an application stores and compares user-controlled identifiers (such as usernames, API keys, or object keys) without normalizing input before it reaches DynamoDB. DynamoDB itself does not normalize strings; it stores and matches byte-for-byte based on the UTF-8 binary value you provide. If Django passes different Unicode representations of the same logical string to DynamoDB, the database may treat them as distinct items even when they should be equivalent. This mismatch can bypass access control checks, allow duplicate records, or enable enumeration attacks.

For example, consider a Django model that uses a Unicode primary key stored in DynamoDB. A user could supply a composed Unicode character (e.g., é as U+00E9) in one request and a decomposed form (e.g., e followed by combining acute accent U+0301) in another. Without normalization, these two keys map to different DynamoDB items, potentially allowing horizontal privilege escalation via BOLA/IDOR if access checks rely on key equality alone. Attackers can probe these variations to locate or manipulate records they should not access.
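The difference between the two forms is easy to see in plain Python. The sketch below uses only the standard library; the sample string "café" and the helper name nfc are illustrative choices, not part of any Django or DynamoDB API.

```python
import unicodedata

composed = "caf\u00e9"      # "é" as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by combining acute accent U+0301


def nfc(value: str) -> str:
    """Return the NFC (canonical composition) form of a string."""
    return unicodedata.normalize("NFC", value)


# The two strings render identically but are different code point sequences,
# so their UTF-8 encodings differ and an exact-match key lookup treats them
# as two separate keys.
same_before = composed == decomposed
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization both inputs collapse to the same string.
same_after = nfc(composed) == nfc(decomposed)

print(same_before, same_bytes, same_after)
```

Run as written, the first two comparisons are false and the third is true, which is exactly the gap an attacker probes when keys are compared without normalization.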

The risk is amplified when DynamoDB backs session or cache data in Django. If the write path and the read path normalize differently, a key stored in one form will not match the value supplied on a later request. The usual result is a failed lookup or a duplicate entry, and, depending on how the application recovers from the miss, session handling problems such as fixation or replay. Error messages that distinguish a missing record from a forbidden one can also leak whether a record exists.
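A write/read mismatch of this kind can be reproduced without AWS at all. The following is a minimal sketch in which an in-memory class stands in for an exact-match table; ExactMatchStore, cache_key, and the "profile:" prefix are hypothetical names invented for the illustration, not DynamoDB or Django APIs.

```python
import unicodedata


class ExactMatchStore:
    """Hypothetical in-memory stand-in for a table that matches keys exactly."""

    def __init__(self):
        self._items = {}

    def put(self, key: str, value: dict) -> None:
        # Key on the UTF-8 bytes to mimic exact, normalization-free matching.
        self._items[key.encode("utf-8")] = value

    def get(self, key: str):
        return self._items.get(key.encode("utf-8"))


def cache_key(username: str) -> str:
    # Hypothetical key scheme: a fixed prefix plus the NFC form of the username.
    return "profile:" + unicodedata.normalize("NFC", username)


store = ExactMatchStore()

# The write path stores the composed form, the read path receives the
# decomposed form, and the lookup misses.
store.put("profile:jos\u00e9", {"plan": "pro"})
miss = store.get("profile:jose\u0301")

# When both paths build the key through cache_key, the lookup succeeds.
store.put(cache_key("jos\u00e9"), {"plan": "pro"})
hit = store.get(cache_key("jose\u0301"))
```

The point of the sketch is only that the key-building function must be shared by every reader and writer; where that function lives in a real project is a design decision.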

Because middleBrick scans the unauthenticated attack surface and includes checks such as Input Validation and Property Authorization, it can surface inconsistencies between how Django prepares data and how DynamoDB persists it. Findings often highlight missing canonicalization and provide remediation guidance to enforce normalization at the application layer before any DynamoDB operation.

DynamoDB-Specific Remediation in Django — concrete code fixes

To mitigate Unicode Normalization issues when using DynamoDB with Django, normalize all user-supplied strings before they are used in DynamoDB key construction, queries, or conditional expressions. Use Python’s built-in unicodedata module to apply NFC or NFD consistently across your application. The preferred approach is to normalize at the boundary where data enters the DynamoDB workflow, ensuring both read and write paths use the same canonical form.
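One way to express "normalize at the boundary" is a small pure function that every DynamoDB call passes its item or key through. The sketch below assumes the low-level attribute format ({'S': '...'}) used by the boto3 client; the function name normalize_key_attributes and its parameters are hypothetical.

```python
import unicodedata
from typing import Iterable


def normalize_key_attributes(item: dict, key_names: Iterable[str]) -> dict:
    """Return a copy of item with the named string attributes in NFC form.

    Assumes the low-level DynamoDB format, e.g. {'user_id': {'S': 'value'}}.
    The input dict is left unmodified.
    """
    normalized = dict(item)
    for name in key_names:
        attr = normalized.get(name)
        if attr is None:
            continue  # attribute not present, nothing to normalize
        if not isinstance(attr, dict) or not isinstance(attr.get("S"), str):
            raise TypeError(f"{name!r} must be a DynamoDB string attribute")
        normalized[name] = {"S": unicodedata.normalize("NFC", attr["S"])}
    return normalized


raw_item = {"user_id": {"S": "cafe\u0301"}, "email": {"S": "user@example.com"}}
safe_item = normalize_key_attributes(raw_item, ["user_id", "sort_key"])
```

Returning a copy rather than mutating the caller's dict keeps the helper easy to test and makes it harder to normalize on one code path and forget another.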

Below are concrete code examples for a Django project using boto3-based DynamoDB access (e.g., with django-dynamodb-backend or a custom wrapper). The examples enforce NFC normalization for identifiers and demonstrate safe comparison and storage patterns.

Example 1: Normalizing a model identifier before saving to DynamoDB

import unicodedata
import boto3
from django.conf import settings

def normalize_unicode(value: str) -> str:
    return unicodedata.normalize('NFC', value)

def put_item(table_name: str, item: dict):
    client = boto3.client('dynamodb',
        endpoint_url=settings.AWS_DYNAMODB_ENDPOINT,
        region_name=settings.AWS_REGION,
        aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
        aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
    )
    # Items use the low-level format {'attr': {'S': 'value'}}, so normalize the
    # inner string of the partition key and, if present, the sort key.
    for key_name in ('user_id', 'sort_key'):
        if key_name in item:
            item[key_name] = {'S': normalize_unicode(item[key_name]['S'])}
    client.put_item(TableName=table_name, Item=item)

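# Usage in a Django view (the view name is illustrative)
def create_user(request):
    user_id = request.POST.get('user_id')  # may arrive composed or decomposed
    email = request.POST.get('email')
    if not user_id or not email:
        raise ValueError('user_id and email are required')
    item = {
        'user_id': {'S': user_id},
        'email': {'S': email},
    }
    put_item('MyTable', item)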

Example 2: Normalizing before query and retrieving from DynamoDB

def get_item_by_user_id(table_name: str, raw_user_id: str):
    normalized_user_id = normalize_unicode(raw_user_id)
    client = boto3.client('dynamodb',
        endpoint_url=settings.AWS_DYNAMODB_ENDPOINT,
        region_name=settings.AWS_REGION,
        aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
        aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
    )
    response = client.get_item(
        TableName=table_name,
        Key={
            'user_id': {'S': normalized_user_id},
        }
    )
    return response.get('Item')

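# In a view or manager (the function name is illustrative)
def lookup(request):
    raw_id = request.GET.get('id')
    if not raw_id:
        return None  # nothing to look up; avoids passing None to the normalizer
    # Normalizing first prevents false negatives caused by differing Unicode forms
    return get_item_by_user_id('MyTable', raw_id)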

Example 3: Enforcing normalization in a Django custom manager

import unicodedata
import boto3
from django.core.exceptions import ValidationError

class DynamoUserManager:
    def __init__(self, table_name='Users'):
        self.table_name = table_name
        self.client = boto3.client('dynamodb', region_name='us-east-1')

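    def normalize(self, value: str) -> str:
        # Reject missing or non-string input before it reaches DynamoDB
        if not isinstance(value, str) or not value:
            raise ValidationError('username must be a non-empty string')
        return unicodedata.normalize('NFC', value)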

    def create(self, username: str, email: str):
        username_n = self.normalize(username)
        self.client.put_item(TableName=self.table_name, Item={
            'username': {'S': username_n},
            'email': {'S': email},
        })
        return username_n

    def get(self, username: str):
        username_n = self.normalize(username)
        resp = self.client.get_item(TableName=self.table_name, Key={
            'username': {'S': username_n},
        })
        return resp.get('Item')

# Usage
manager = DynamoUserManager()
manager.create('caf\u00e9', 'user@example.com')  # composed input, stored as NFC
item = manager.get('cafe\u0301')  # decomposed input normalizes to the same key

These patterns ensure that the same logical string always maps to the same DynamoDB key, reducing the risk of BOLA/IDOR, duplicate entries, and enumeration. middleBrick can detect missing normalization by comparing findings across authentication and property authorization checks, emphasizing the need for consistent canonicalization before data reaches DynamoDB.
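Normalizing new writes does not repair keys that were stored before the fix. As a rough sketch of how existing data might be audited, the code below works on a plain list of key strings (for example, exported from a table scan, which is not shown here). The function names are hypothetical, and unicodedata.is_normalized requires Python 3.8 or later.

```python
import unicodedata
from collections import defaultdict


def find_non_nfc(keys):
    """Return the keys that are not already in NFC form."""
    return [k for k in keys if not unicodedata.is_normalized("NFC", k)]


def find_normalization_collisions(keys):
    """Group keys by NFC form and return groups with more than one variant."""
    groups = defaultdict(list)
    for key in keys:
        groups[unicodedata.normalize("NFC", key)].append(key)
    return {
        canonical: variants
        for canonical, variants in groups.items()
        if len(set(variants)) > 1
    }


# Hypothetical sample of keys already stored in a table.
existing_keys = ["caf\u00e9", "cafe\u0301", "alice"]

non_nfc = find_non_nfc(existing_keys)
collisions = find_normalization_collisions(existing_keys)
```

Keys reported by find_non_nfc need rewriting, and any group reported by find_normalization_collisions represents records that would merge under the new scheme and therefore need a manual decision before migration.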

Frequently Asked Questions

Does DynamoDB store strings in a normalized form?

No. DynamoDB stores UTF-8 binary values as provided and performs exact byte matching; it does not apply Unicode normalization. Applications must normalize before storage and comparison.

Which normalization form is recommended for DynamoDB keys in Django?

Use NFC (Normalization Form Canonical Composition) for consistency and compatibility. Apply unicodedata.normalize('NFC', value) before using strings in DynamoDB keys or queries.
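NFC only unifies canonically equivalent sequences. It does not fold compatibility characters such as ligatures or fullwidth letters, which NFKC does. The short comparison below uses sample strings chosen for illustration; whether identifiers should also be folded with NFKC is an application decision that this sketch does not settle.

```python
import unicodedata

ligature = "\ufb01le"      # starts with U+FB01 LATIN SMALL LIGATURE FI
fullwidth = "\uff21dmin"   # starts with U+FF21 FULLWIDTH LATIN CAPITAL LETTER A

# NFC leaves compatibility characters untouched.
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfc_fullwidth = unicodedata.normalize("NFC", fullwidth)

# NFKC replaces them with their compatibility equivalents.
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
nfkc_fullwidth = unicodedata.normalize("NFKC", fullwidth)

print(nfc_ligature == ligature, nfkc_ligature)
print(nfc_fullwidth == fullwidth, nfkc_fullwidth)
```

If lookalike identifiers such as a fullwidth "Admin" are a concern, NFC alone will not merge them with their ASCII counterparts. Whichever form is chosen, it has to be applied identically on every read and write path.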
