HIGH | unicode normalization | flask | bearer tokens

Unicode Normalization in Flask with Bearer Tokens

Unicode Normalization in Flask with Bearer Tokens — how this specific combination creates or exposes the vulnerability

Unicode normalization attacks exploit the fact that visually identical strings can be made of different code point sequences, and therefore different bytes. In Flask APIs that use Bearer tokens, the risk arises when one part of the system works on normalized values and another works on raw input. For example, an attacker might include a token that contains the precomposed Latin small letter a with acute (U+00E1) in a URL or header, while the server’s stored token holds the decomposed sequence U+0061 U+0301. A raw comparison treats the two as different strings, whereas any layer that normalizes first treats them as the same. When layers disagree in this way, the result can be token confusion or privilege escalation.
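As a minimal illustration using only Python's standard unicodedata module, the two spellings of á below render identically but differ in code points and UTF-8 bytes until both are normalized. The variable names are illustrative and not part of any Flask API.

```python
import unicodedata

precomposed = "\u00e1"   # á as a single code point
decomposed = "a\u0301"   # a followed by COMBINING ACUTE ACCENT

# A raw comparison sees two different strings
raw_equal = precomposed == decomposed

# The UTF-8 byte sequences differ as well
precomposed_bytes = precomposed.encode("utf-8")  # b'\xc3\xa1'
decomposed_bytes = decomposed.encode("utf-8")    # b'a\xcc\x81'

# After normalizing both sides to NFC, they compare equal
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal, nfc_equal)  # False True
```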

Flask itself does not normalize strings automatically, and developers often compare raw request data directly with stored values. This becomes especially dangerous when Bearer tokens are passed via Authorization headers, query parameters, or custom headers. An attacker could supply a token with combining diacritics or variant forms (e.g., using U+0063 U+0327 instead of U+00E7), and if the application normalizes only one side or relies on string equality, authentication may be incorrectly accepted or rejected in unsafe ways.
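The effect of normalizing only one side can be sketched as follows. The token value and the nfc helper are hypothetical; the point is only that a stored NFC value and a raw decomposed input disagree until both pass through the same normalization.

```python
import unicodedata

def nfc(value: str) -> str:
    """Normalize a string to NFC (hypothetical helper for this sketch)."""
    return unicodedata.normalize("NFC", value)

# Hypothetical token containing ç, stored in NFC form (U+00E7)
stored_token = nfc("fran\u00e7ais-token")

# The same token sent by a client as c + COMBINING CEDILLA (U+0063 U+0327)
incoming_token = "franc\u0327ais-token"

# Only the stored side was normalized: the legitimate token is rejected
one_sided_match = stored_token == incoming_token

# Both sides normalized to the same form: the comparison is reliable
two_sided_match = stored_token == nfc(incoming_token)

print(one_sided_match, two_sided_match)  # False True
```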

In practice, such weaknesses can lead to authentication bypass or token confusion. Consider an API that checks request.headers.get('Authorization') against a database value without normalization. If any layer in between, such as a database collation or a cache key function, treats canonically equivalent strings as equal, an attacker’s crafted token might pass the check even though it is not the intended token. These issues are related to broader classes like BOLA/IDOR when token handling intersects with object-level authorization, and they can be surfaced by scanners that test normalization variants alongside unauthenticated endpoints.
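One way to probe for this in your own tests is to generate the normalization variants of a known token and send each one to the endpoint. The helper names below are hypothetical, and this is only a sketch of the idea, not a description of how any particular scanner works.

```python
import unicodedata
from typing import Dict, Set

# The four normalization forms defined by the Unicode standard
FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalization_variants(token: str) -> Dict[str, str]:
    """Return the token rewritten in each normalization form."""
    return {form: unicodedata.normalize(form, token) for form in FORMS}

def distinct_variants(token: str) -> Set[str]:
    """Return the variant strings that differ from the original token."""
    return set(normalization_variants(token).values()) - {token}

# Hypothetical token with a precomposed é and the ligature U+FB01
sample = "caf\u00e9-\ufb01le"
variants = distinct_variants(sample)

# A pure ASCII token is unchanged by every normalization form
ascii_variants = distinct_variants("abc123")
```

If an endpoint accepts any of the distinct variants as if it were the original token, some layer is normalizing while another is not, which is worth investigating.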

middleBrick’s checks include input validation and authentication tests that exercise these edge cases, highlighting where normalization gaps exist in the unauthenticated attack surface. The scanner does not modify your code; it identifies risky patterns so you can apply consistent normalization and comparison strategies.

Bearer Tokens-Specific Remediation in Flask — concrete code fixes

To secure Bearer token handling in Flask, normalize all token inputs before storage and comparison. Use a canonical Unicode form such as NFC or NFD consistently. For example, apply normalization to incoming Authorization header values and to stored tokens before equality checks. Below are concrete code examples demonstrating safe handling.

import hmac
import unicodedata

from flask import Flask, request, jsonify

app = Flask(__name__)

def normalize_token(token: str) -> str:
    # Choose one canonical form and use it everywhere
    return unicodedata.normalize('NFC', token)

def constant_time_compare(a: str, b: str) -> bool:
    # Use hmac.compare_digest to avoid timing attacks.
    # It raises TypeError for str arguments containing non-ASCII
    # characters, so compare the UTF-8 encoded bytes instead.
    return hmac.compare_digest(a.encode('utf-8'), b.encode('utf-8'))

def fetch_stored_token_for_current_user() -> str:
    # Replace with your user/token lookup logic
    return 'example-token'

@app.route('/protected')
def protected():
    auth = request.headers.get('Authorization', '')
    if not auth.startswith('Bearer '):
        return jsonify({'error': 'missing_bearer'}), 401
    # Everything after the first space is the token
    token = auth.split(' ', 1)[1]
    normalized_token = normalize_token(token)
    # Compare against normalized stored token
    stored_token = normalize_token(fetch_stored_token_for_current_user())
    if not constant_time_compare(normalized_token, stored_token):
        return jsonify({'error': 'invalid_token'}), 401
    return jsonify({'status': 'ok'}), 200

If you store tokens in a database, normalize at write time as well, so stored values are already in the chosen canonical form. This eliminates mismatch risks caused by variant inputs. Additionally, always use hmac.compare_digest for comparison to prevent timing attacks. Note that it accepts str arguments only when they are pure ASCII, so encode both values to bytes first if tokens may contain other characters. Finally, reject tokens containing disallowed characters or overly long inputs to reduce injection surface.
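The checks above can be combined into one helper, sketched here under stated assumptions: tokens follow the ASCII-only b64token grammar from RFC 6750 section 2.1, and 4096 characters is an acceptable upper bound. The names canonicalize_token, tokens_match, and MAX_TOKEN_LENGTH are hypothetical. The allowlist is applied to the raw input rather than the normalized form, because normalization can map some non-ASCII characters onto ASCII ones (U+212A KELVIN SIGN becomes K under NFC).

```python
import hmac
import re
import unicodedata
from typing import Optional

# b64token grammar from RFC 6750, section 2.1 (ASCII only)
B64TOKEN_RE = re.compile(r"[A-Za-z0-9\-._~+/]+=*")
MAX_TOKEN_LENGTH = 4096  # assumed limit; tune for your token format

def canonicalize_token(raw: str) -> Optional[str]:
    """Return the NFC form of raw, or None if the token must be rejected."""
    if not raw or len(raw) > MAX_TOKEN_LENGTH:
        return None
    # Validate the raw input so that look-alike characters are rejected
    # instead of being folded into allowed ones by normalization
    if B64TOKEN_RE.fullmatch(raw) is None:
        return None
    # A no-op for ASCII input; kept so storage and comparison share one path
    return unicodedata.normalize("NFC", raw)

def tokens_match(presented: str, stored: str) -> bool:
    """Compare two tokens in constant time after canonicalizing both."""
    a = canonicalize_token(presented)
    b = canonicalize_token(stored)
    if a is None or b is None:
        return False
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))
```

Calling canonicalize_token both when a token is written to the database and when one arrives in a request keeps the two sides in the same form. If your tokens use a different alphabet, the allowlist pattern would need to change accordingly.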

For broader coverage, integrate these practices with your existing security posture. The middleBrick CLI can be used locally to scan endpoints and surface input validation and authentication findings; the GitHub Action can enforce thresholds in CI/CD; and the MCP Server allows you to run scans from within AI coding assistants. These integrations help catch normalization and token-handling issues before deployment.

Frequently Asked Questions

Why does Unicode normalization matter for Bearer tokens in Flask?
Because visually identical characters can have multiple code point and byte representations. If a Flask application compares raw user input to stored tokens without normalizing both to one canonical form (e.g., NFC), or normalizes in only one layer, equality checks become unreliable: legitimate tokens can be rejected, and crafted tokens can be treated as equivalent to real ones by whichever layer normalizes.

Should I normalize tokens on input, on comparison, or both?
Normalize both at input (when storing or indexing) and at comparison time. Consistent canonicalization across storage and validation ensures that equality checks are reliable and prevents token confusion attacks.