
Unicode Normalization in Gin

How Unicode Normalization Manifests in Gin

Unicode normalization attacks in Gin applications typically exploit the framework's handling of HTTP request parameters and path variables. Gin's c.Param(), c.Query(), and c.PostForm() methods don't automatically normalize Unicode input, creating a window for attackers to bypass security controls.

A common manifestation is path traversal via Unicode normalization. An attacker might send a request to /api/v1/users/%C3%A9ric (precomposed e-acute, U+00E9) versus /api/v1/users/e%CC%81ric (the letter e followed by the combining acute accent, U+0301). These render as the same character but are different byte sequences. If your Gin handler uses these parameters for file access or database queries without normalizing them, the two requests can behave differently:

func getUser(c *gin.Context) {
    userID := c.Param("id")
    // Attacker can craft userID with different normalization forms
    // to bypass authorization checks or access unintended resources
    user, err := db.GetUserByID(userID)
    // ...
}
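
To make the byte-level difference concrete, here is a minimal sketch that uses only the standard library to percent-decode both path segments and compare the results. The helper decodeSegment is a hypothetical name for illustration and is not part of Gin; it only approximates what happens to a path segment before a handler reads it.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeSegment is a hypothetical helper that percent-decodes one path
// segment. It returns an empty string if the escape sequence is invalid.
func decodeSegment(escaped string) string {
	s, err := url.PathUnescape(escaped)
	if err != nil {
		return ""
	}
	return s
}

func main() {
	composed := decodeSegment("%C3%A9ric")    // U+00E9 followed by "ric"
	decomposed := decodeSegment("e%CC%81ric") // "e", U+0301, then "ric"

	// %+q escapes every non-ASCII rune, so the two forms are visibly distinct.
	fmt.Printf("%+q (%d bytes)\n", composed, len(composed))
	fmt.Printf("%+q (%d bytes)\n", decomposed, len(decomposed))
	fmt.Println("equal:", composed == decomposed)
}
```

The two decoded strings differ in length (5 versus 6 bytes) and compare unequal, so any map lookup, file path, or exact-match query built from them will treat them as different identifiers.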

Authentication bypasses are another critical vector. If your Gin application looks up accounts by a username parameter, an attacker can craft a username that differs in bytes from a valid account name but is treated as equal somewhere else in the stack. For example, admin and ädmin (a with diaeresis) are different strings to your Go code and will hash differently, yet they may match the same database row if the column uses an accent-insensitive collation.

Rate limiting and API key validation are also vulnerable. A Gin middleware that tracks requests by API key without normalization allows the same logical key in different forms to count as separate entities, potentially bypassing rate limits entirely.
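
The rate-limiting case can be illustrated with a small sketch. The naiveLimiter type below is hypothetical and far simpler than a real limiter; it only shows that a counter keyed by the raw string gives each normalization form its own budget. This matters when a later component (for example, the key store) treats the two forms as the same key.

```go
package main

import "fmt"

// naiveLimiter is a hypothetical counter keyed by the raw API key string.
type naiveLimiter struct {
	counts map[string]int
	limit  int
}

func newNaiveLimiter(limit int) *naiveLimiter {
	return &naiveLimiter{counts: make(map[string]int), limit: limit}
}

// allow records one request and reports whether the key is within its limit.
func (l *naiveLimiter) allow(key string) bool {
	l.counts[key]++
	return l.counts[key] <= l.limit
}

func main() {
	l := newNaiveLimiter(1)
	composed := "cl\u00e9"    // ends in precomposed e-acute
	decomposed := "cle\u0301" // ends in e + combining acute accent

	fmt.Println(l.allow(composed))   // first request for this form
	fmt.Println(l.allow(composed))   // second request, over the limit
	fmt.Println(l.allow(decomposed)) // counted under a separate map key
}
```

Normalizing the key before it is used as the map index would make both forms share one counter.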

Gin-Specific Detection

Detecting Unicode normalization issues in Gin requires both static analysis and runtime testing. Start by examining your handler functions for direct parameter usage without validation. Look for patterns like:

// Vulnerable: direct parameter usage without normalization
func getDocument(c *gin.Context) {
    docID := c.Param("id")
    // No normalization before database lookup
    doc, err := storage.GetDocument(docID)
    // ...
}

middleBrick's black-box scanning can identify these vulnerabilities by sending requests with different Unicode normalization forms to the same endpoint and checking for inconsistent responses. The scanner tests both NFC (composed) and NFD (decomposed) forms, as well as NFKC and NFKD variants, to detect if your API treats logically identical inputs differently.
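
The idea behind this kind of differential test can be sketched in a few lines. The code below is not middleBrick's implementation; it is an assumed, in-process illustration that drives an http.Handler with the same logical path in two encodings and compares status codes. The names statusFor and demoHandler are hypothetical, and against a live API you would send the requests with an HTTP client instead.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusFor sends a GET for an already percent-encoded path to the handler
// and returns the response status code.
func statusFor(h http.Handler, escapedPath string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, escapedPath, nil))
	return rec.Code
}

// demoHandler is a stand-in API that only knows the precomposed spelling.
func demoHandler() http.Handler {
	known := map[string]bool{"/users/\u00e9ric": true}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if known[r.URL.Path] {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusNotFound)
	})
}

func main() {
	h := demoHandler()
	nfc := statusFor(h, "/users/%C3%A9ric")  // precomposed form
	nfd := statusFor(h, "/users/e%CC%81ric") // decomposed form
	fmt.Printf("NFC: %d, NFD: %d, inconsistent: %v\n", nfc, nfd, nfc != nfd)
}
```

A difference between the two status codes is only a signal to investigate; whether it is exploitable depends on what the rest of the stack does with the value.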

For comprehensive detection, implement logging that captures the raw request bytes and normalized forms. Compare how different normalization forms affect your application's behavior:

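// Requires imports "fmt", "strings" and "golang.org/x/text/unicode/norm".
// The standard library's unicode package has no Normalize function.
func normalizeInput(input string) string {
    // Compose to NFC first, then lowercase for comparison.
    return strings.ToLower(norm.NFC.String(input))
}

// Test with different forms. Escape sequences are used because editors
// and copy-paste often recompose or fold these characters silently.
func testNormalization() {
    testCases := map[string]string{
        "NFC":  "\u00e9ric",  // precomposed e-acute
        "NFD":  "e\u0301ric", // e + combining acute accent
        "NFKC": "\uff21",     // Full-width A; only NFKC/NFKD fold it to "A"
        "NFKD": "A",          // Compatibility decomposition of full-width A
    }

    for form, input := range testCases {
        normalized := normalizeInput(input)
        // Check if equivalent forms produce the same normalized output
        // If not, you have a normalization vulnerability
        fmt.Printf("%-4s %+q -> %+q\n", form, input, normalized)
    }
}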
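
For the logging side, a small standard-library helper can render a value so that visually identical inputs are distinguishable in log output. This is a sketch; describeInput is a hypothetical name, and it covers only the raw form. Logging the normalized form next to it would additionally need a normalization package such as golang.org/x/text/unicode/norm.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describeInput formats s with every non-ASCII rune escaped, its raw bytes
// in hex, its rune count, and whether it is valid UTF-8.
func describeInput(s string) string {
	return fmt.Sprintf("quoted=%+q hex=% x runes=%d valid_utf8=%t",
		s, s, utf8.RuneCountInString(s), utf8.ValidString(s))
}

func main() {
	fmt.Println(describeInput("\u00e9"))  // precomposed e-acute
	fmt.Println(describeInput("e\u0301")) // e + combining acute accent
}
```

Calling describeInput on a parameter value inside a handler and writing the result to your logger makes it possible to see which form a client actually sent.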

middleBrick's API security scanning automatically tests these normalization vectors across all 12 security categories, including authentication bypasses and authorization flaws that might result from inconsistent Unicode handling.

Gin-Specific Remediation

The most effective remediation in Gin is to normalize all user input at the earliest possible point in your request handling pipeline. Create a middleware that normalizes query parameters, path variables, and request bodies before they reach your handlers:

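// Imports needed: bytes, context, encoding/json, io, net/http,
// and github.com/gin-gonic/gin.
func unicodeNormalizationMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        // Normalize path parameters (c.Params is a slice, so range yields indices)
        for i := range c.Params {
            c.Params[i].Value = normalizeString(c.Params[i].Value)
        }

        // Normalize query parameters in place, keeping every value of a repeated key
        query := c.Request.URL.Query()
        for _, values := range query {
            for i, val := range values {
                values[i] = normalizeString(val)
            }
        }
        c.Request.URL.RawQuery = query.Encode()

        // Normalize JSON body if present
        if c.ContentType() == "application/json" {
            // Read the raw bytes first so the body can be restored for
            // handlers even when it is not a JSON object
            raw, err := io.ReadAll(c.Request.Body)
            if err != nil {
                c.AbortWithStatus(http.StatusBadRequest)
                return
            }
            c.Request.Body = io.NopCloser(bytes.NewReader(raw))

            var body map[string]interface{}
            if err := json.Unmarshal(raw, &body); err == nil {
                normalizedBody := normalizeMap(body)
                // Re-encode normalized body
                if jsonBody, err := json.Marshal(normalizedBody); err == nil {
                    c.Request.Body = io.NopCloser(bytes.NewReader(jsonBody))
                    c.Request.ContentLength = int64(len(jsonBody))
                    c.Request = c.Request.WithContext(context.WithValue(
                        c.Request.Context(), "normalizedBody", normalizedBody))
                }
            }
        }

        c.Next()
    }
}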

// Requires import "golang.org/x/text/unicode/norm"; the standard
// library's unicode package does not provide normalization.
func normalizeString(s string) string {
    return norm.NFC.String(s)
}

func normalizeMap(m map[string]interface{}) map[string]interface{} {
    normalized := make(map[string]interface{})
    for k, v := range m {
        normalized[normalizeString(k)] = normalizeValue(v)
    }
    return normalized
}

func normalizeValue(v interface{}) interface{} {
    switch val := v.(type) {
    case string:
        return normalizeString(val)
    case map[string]interface{}:
        return normalizeMap(val)
    case []interface{}:
        for i, item := range val {
            val[i] = normalizeValue(item)
        }
        return val
    default:
        return v
    }
}

// Use in main.go
func main() {
    r := gin.New()
    r.Use(unicodeNormalizationMiddleware())
    // ... routes
}

For database queries, ensure your ORM or database client uses consistent collation. With GORM, you can set the collation at the connection level or use binary collation for exact byte matching when needed:

dsn := "user:password@/dbname?charset=utf8mb4&collation=utf8mb4_unicode_ci"
db, err := gorm.Open(mysql.Open(dsn), &gorm.Config{})

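// Or for exact byte matching when required. Set it in the DSN so every
// pooled connection uses it; a one-off SET NAMES affects a single connection.
dsn = "user:password@/dbname?charset=utf8mb4&collation=utf8mb4_bin"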

Always validate and sanitize input after normalization. Use libraries like github.com/go-playground/validator to enforce expected formats, and consider implementing a whitelist approach for known-good characters in critical parameters.
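
A known-good character check for a critical parameter can be as small as one anchored regular expression. The sketch below assumes a policy of 1 to 64 ASCII letters, digits, underscores, or hyphens; both the policy and the name isAllowedIdentifier are illustrative assumptions, so adjust them to what your identifiers actually look like.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierPattern encodes the assumed policy: 1 to 64 characters drawn
// from ASCII letters, digits, '_' and '-'.
var identifierPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)

// isAllowedIdentifier reports whether s consists only of allowed characters.
func isAllowedIdentifier(s string) bool {
	return identifierPattern.MatchString(s)
}

func main() {
	for _, candidate := range []string{"user_42", "\u00e9ric", "e\u0301ric", "\uff21dmin"} {
		fmt.Printf("%+q allowed: %v\n", candidate, isAllowedIdentifier(candidate))
	}
}
```

Because the pattern admits ASCII only, precomposed, decomposed, and full-width variants are all rejected outright, which sidesteps the normalization question for parameters that never need non-ASCII characters.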

Frequently Asked Questions

How does middleBrick detect Unicode normalization vulnerabilities in my Gin API?
middleBrick's black-box scanner sends requests with different Unicode normalization forms (NFC, NFD, NFKC, NFKD) to the same endpoint and analyzes whether the API treats logically identical inputs differently. The scanner checks for inconsistent responses, authentication bypasses, and authorization flaws that could result from improper Unicode handling. No credentials or setup are required: provide your API URL and middleBrick tests these vectors automatically.

Should I normalize to NFC or NFD in my Gin middleware?
NFC (composed) is generally the safer choice for web applications because it is the form the W3C recommends for web content and the form most systems produce by default. NFD (decomposed) text is more likely to be mishandled by components that assume one code point per character, such as naive length checks or truncation. The key is consistency: normalize to one form and stick with it throughout your application stack. middleBrick's scanning helps verify that your chosen normalization strategy is applied consistently across all API endpoints.