
Unicode Normalization in Express with CockroachDB

Unicode Normalization in Express with CockroachDB: how this specific combination creates or exposes the vulnerability

Inconsistent Unicode normalization between Express request handling and CockroachDB string comparison can create authentication bypass and data integrity risks. Neither layer normalizes by default: Express passes request strings through as received, and CockroachDB compares STRING values under the default collation by their encoded bytes. Visually identical strings with different code point sequences are therefore treated as distinct values unless the application normalizes them. For example, the character "é" can be represented as the single code point U+00E9 or as "e" (U+0065) followed by a combining acute accent (U+0301). If one code path normalizes input to NFC while another stores or compares the raw or NFD form (or vice versa), queries may match unintended records or fail to match intended ones.
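The "é" example can be checked directly in Node.js. The following is a minimal sketch that uses the built-in String.prototype.normalize rather than any particular library; the variable names are illustrative only.

```javascript
// Two encodings of the same visible text "é".
const composed = '\u00E9';    // single code point (NFC form)
const decomposed = 'e\u0301'; // "e" followed by a combining acute accent (NFD form)

// Plain equality compares code units, so the two strings differ.
const rawEqual = composed === decomposed;

// Once both sides are normalized to the same form, they match.
const nfcEqual = composed.normalize('NFC') === decomposed.normalize('NFC');

// The UTF-8 encodings differ in length, which is what a byte-wise comparison sees.
const composedBytes = Buffer.byteLength(composed, 'utf8');
const decomposedBytes = Buffer.byteLength(decomposed, 'utf8');

console.log({ rawEqual, nfcEqual, composedBytes, decomposedBytes });
```

Any comparison that runs before both values share one normalization form behaves like rawEqual here.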

This mismatch becomes critical in authentication flows. An attacker could register a username that is visually identical to an existing one but uses a different code point sequence; if one code path normalizes before lookup and another does not, the attacker's input may resolve to another user's account. In API endpoints that use user-supplied identifiers to query CockroachDB, inconsistent normalization enables IDOR-like conditions, where changing only the character composition of an identifier returns a different dataset even though the caller's authorization is unchanged. Search and filtering endpoints are affected too: paginated results become unpredictable because equivalent strings sort differently depending on their representation.
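The registration scenario can be illustrated without a database. In this sketch the UserStore class is hypothetical: its Map stands in for a unique index that compares strings exactly as received, and it is not a model of CockroachDB internals.

```javascript
// Hypothetical in-memory stand-in for a table with a unique username index.
// The key function decides which representation the "index" compares.
class UserStore {
  constructor(toKey) {
    this.toKey = toKey;
    this.users = new Map();
  }

  // Returns true if the username was accepted, false if it already exists.
  register(username) {
    const key = this.toKey(username);
    if (this.users.has(key)) return false;
    this.users.set(key, { username });
    return true;
  }
}

const victim = 'Ren\u00E9e';     // "Renée" with a composed é
const lookalike = 'Rene\u0301e'; // "Renée" with e + combining acute accent

// Without normalization, the look-alike is accepted as a second account.
const rawStore = new UserStore((s) => s);
const rawFirst = rawStore.register(victim);
const rawSecond = rawStore.register(lookalike);

// With NFC keys, the look-alike collides with the existing account.
const nfcStore = new UserStore((s) => s.normalize('NFC'));
const nfcFirst = nfcStore.register(victim);
const nfcSecond = nfcStore.register(lookalike);

console.log({ rawFirst, rawSecond, nfcFirst, nfcSecond });
```

The same reasoning applies to lookups: whichever form is used as the key at write time must also be used at read time.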

Input validation layers in Express may reject or alter certain normalization forms, but CockroachDB preserves the original byte sequence, creating a disconnect between validation and persistence. Security checks that rely on exact string matching, such as role names, permission flags, or resource identifiers, can be bypassed when a variant passes validation in one form but is stored or compared in another. Log analysis and audit trails also become unreliable because the same logical entity appears under multiple representations across requests and database entries.

Middleware that performs normalization before database interaction is essential. Unless values are normalized consistently at the Express layer before they reach CockroachDB, the database cannot be relied upon to enforce uniqueness constraints or accurate comparisons, because canonically equivalent strings reach it as different values. Developers must treat string handling as a cross-layer concern and apply the same normalization decision across HTTP parsing, business logic, and SQL generation. This is particularly important for internationalized applications, where diverse character sets are common.

CockroachDB-Specific Remediation in Express: concrete code fixes

Apply Unicode normalization in Express before constructing SQL queries for CockroachDB, using one consistent form such as NFC across all string handling. The following example demonstrates normalization at the middleware layer, so that user input is standardized before any database operation.

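const express = require('express');
const { Pool } = require('pg');
const normalization = require('unorm');

const app = express();
// Assumes DATABASE_URL holds the CockroachDB connection string.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

app.use(express.json());

// Recursively normalize every string value in an object or array to NFC, in place.
const normalizeFields = (obj) => {
  if (!obj || typeof obj !== 'object') return;
  Object.keys(obj).forEach((key) => {
    if (typeof obj[key] === 'string') {
      obj[key] = normalization.nfc(obj[key]);
    } else {
      normalizeFields(obj[key]); // also covers arrays and nested objects
    }
  });
};

// Normalize incoming string fields to NFC
app.use((req, res, next) => {
  normalizeFields(req.body);
  // In Express 5, req.query is a getter, so in-place changes may not persist;
  // the handler below normalizes again at the point of use.
  normalizeFields(req.query);
  // req.params is not populated in app-level middleware, so route parameters
  // must be normalized inside the route handler instead.
  next();
});

app.get('/user/profile', async (req, res) => {
  if (typeof req.query.id !== 'string') {
    return res.status(400).json({ error: 'Missing id' });
  }
  // The id query parameter carries the username to look up.
  const username = normalization.nfc(req.query.id);
  const result = await pool.query(
    'SELECT id, username, email FROM users WHERE username = $1',
    [username]
  );
  if (result.rows.length === 0) {
    return res.status(404).json({ error: 'User not found' });
  }
  res.json(result.rows[0]);
});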

For CockroachDB-specific handling, ensure that string columns involved in comparisons or unique constraints are consistently normalized at write time. Use application logic (or triggers, if your CockroachDB version supports them) to store a normalized version alongside the original input, allowing reliable equality checks while preserving the original representation for display purposes.

-- CockroachDB: store normalized version for reliable comparison
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  username_original TEXT NOT NULL,
  username_normalized TEXT NOT NULL UNIQUE,
  email_original TEXT NOT NULL,
  email_normalized TEXT NOT NULL UNIQUE
);

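// Express insert with dual storage
app.post('/register', async (req, res) => {
  // If the normalization middleware above runs first, req.body is already NFC;
  // capture the raw input before it when the original form must be preserved.
  const usernameOriginal = req.body.username;
  const emailOriginal = req.body.email;
  if (typeof usernameOriginal !== 'string' || typeof emailOriginal !== 'string') {
    return res.status(400).json({ error: 'username and email are required' });
  }
  const usernameNormalized = normalization.nfc(usernameOriginal);
  const emailNormalized = normalization.nfc(emailOriginal);

  try {
    const result = await pool.query(
      'INSERT INTO users (username_original, username_normalized, email_original, email_normalized) VALUES ($1, $2, $3, $4) RETURNING id',
      [usernameOriginal, usernameNormalized, emailOriginal, emailNormalized]
    );
    res.status(201).json({ id: result.rows[0].id });
  } catch (err) {
    // 23505 is the PostgreSQL-compatible unique_violation error code.
    if (err.code === '23505') {
      return res.status(409).json({ error: 'Username or email already in use' });
    }
    res.status(500).json({ error: 'Registration failed' });
  }
});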

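// Query using normalized value
app.post('/login', async (req, res) => {
  if (typeof req.body?.username !== 'string') {
    return res.status(401).json({ error: 'Invalid credentials' });
  }
  const usernameNormalized = normalization.nfc(req.body.username);
  const result = await pool.query(
    'SELECT id, username_original FROM users WHERE username_normalized = $1',
    [usernameNormalized]
  );
  if (result.rows.length === 0) {
    return res.status(401).json({ error: 'Invalid credentials' });
  }
  // Credential verification is omitted here; a real login handler must
  // verify a password or token before returning user data.
  res.json({ user: result.rows[0] });
});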

When using ORMs or query builders with CockroachDB, normalize values at the point of parameter binding rather than modifying schema definitions. This approach maintains compatibility with existing data while preventing comparison mismatches. For endpoints that accept search terms, normalize both the query parameter and the stored values before comparing them, so that behavior is predictable across all Unicode inputs.
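Normalizing at the point of parameter binding can be sketched as a thin wrapper around any client that exposes query(text, params), as a pg Pool does. The helper names normalizeParams and withNormalizedParams are hypothetical, and the stub client exists only so the sketch runs without a database.

```javascript
// Normalize every string parameter to NFC just before it is bound.
function normalizeParams(params) {
  return params.map((p) => (typeof p === 'string' ? p.normalize('NFC') : p));
}

// Wrap a client so that all bound string values pass through normalizeParams.
function withNormalizedParams(client) {
  return {
    query: (text, params = []) => client.query(text, normalizeParams(params)),
  };
}

// Hypothetical stub client that records what it receives instead of querying.
const recorded = [];
const stubClient = {
  query: async (text, params) => {
    recorded.push({ text, params });
    return { rows: [] };
  },
};

const db = withNormalizedParams(stubClient);

// The decomposed "José" (e + U+0301) reaches the client in composed form.
db.query('SELECT id FROM users WHERE username_normalized = $1', ['Jose\u0301', 42]);

console.log(recorded[0].params);
```

Only bound values are touched; the SQL text is passed through unchanged, so the wrapper does not alter identifiers or literals written in the query itself.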

Frequently Asked Questions

Does middleBrick detect Unicode normalization issues during API scanning?
middleBrick's input validation and property authorization checks can identify inconsistent string handling patterns that may indicate normalization vulnerabilities, though specific normalization testing requires application-level implementation.

Can the Express middleware approach conflict with OpenAPI spec validation?
Normalization middleware should be placed after JSON parsing but before validation to ensure consistent string representation. This maintains compatibility with OpenAPI schema expectations while preventing normalization-based bypasses.
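
The effect of middleware order can be shown with two plain functions that follow the Express (req, res, next) signature. This is a sketch under stated assumptions: validateUsername is a hypothetical stand-in for a schema rule such as a maximum length of 5, and runChain is a tiny illustrative runner, not part of Express. In an application the equivalent wiring would be app.use(express.json(), normalizeUsername, validateUsername).

```javascript
// Hypothetical middleware: normalize a known string field to NFC.
function normalizeUsername(req, res, next) {
  if (req.body && typeof req.body.username === 'string') {
    req.body.username = req.body.username.normalize('NFC');
  }
  next();
}

// Hypothetical middleware: stand-in for a schema rule like maxLength 5,
// counted in code points.
function validateUsername(req, res, next) {
  const name = req.body && req.body.username;
  if (typeof name !== 'string' || [...name].length > 5) {
    return res.status(400).json({ error: 'Invalid username' });
  }
  next();
}

// Illustrative runner that calls middleware in order, as Express would,
// and returns the final status code.
function runChain(middlewares, req) {
  const outcome = { status: 200 };
  const res = {
    status(code) { outcome.status = code; return this; },
    json(body) { outcome.body = body; return this; },
  };
  let i = 0;
  const next = () => {
    if (i < middlewares.length) middlewares[i++](req, res, next);
  };
  next();
  return outcome.status;
}

// "Renée" in decomposed form: 6 code points as sent, 5 after NFC.
const makeReq = () => ({ body: { username: 'Rene\u0301e' } });

const normalizedFirst = runChain([normalizeUsername, validateUsername], makeReq());
const validatedFirst = runChain([validateUsername, normalizeUsername], makeReq());

console.log({ normalizedFirst, validatedFirst });
```

With normalization first, the validator sees the same string that will later be stored and compared; with the order reversed, the same input is judged in a form that never reaches the database.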