Severity: HIGH

Regex DoS in Django with Firestore

Regex DoS in Django with Firestore: how this specific combination creates or exposes the vulnerability

A Regular Expression Denial of Service (ReDoS) occurs when a backtracking regex engine, such as Python's built-in re module, needs super-linear (often exponential) time to evaluate a poorly constructed pattern against certain inputs. CPU usage spikes and the worker process handling the request becomes unresponsive. In Django applications that use Google Cloud Firestore as a backend, regexes that validate Firestore document IDs, collection names, or query parameters sit directly on the request path, so a vulnerable pattern can be triggered by any client that can reach the endpoint.

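Django often uses path converters and custom validators to capture and validate URL parameters before they reach views. If a developer validates Firestore document IDs, or string fields that are later used in queries, with a pattern that contains nested or overlapping quantifiers, an attacker can supply crafted input that forces catastrophic backtracking. The classic case is a nested quantifier such as (a+)+ applied to user-controlled data: given a long run of a characters followed by one character that makes the match fail, the engine tries every way of splitting the run between the inner and outer quantifiers before giving up. Missing anchors make this worse, because re.search retries the whole match at every starting position.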
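The sketch below reproduces the (a+)+ case with Python's standard re module so the growth can be observed directly. The pattern, the input shape and the helper name time_match are illustrative choices, not taken from any particular application. Absolute timings depend on the machine, but each extra character roughly doubles the time for the nested pattern, while the flat pattern stays near-instant.

```python
import re
import time

# Vulnerable: nested quantifiers let the engine split a run of 'a' in
# exponentially many ways before concluding that the match fails.
NESTED = re.compile(r'^(a+)+$')
# Accepts exactly the same strings ("one or more a"), but without nesting.
FLAT = re.compile(r'^a+$')

def time_match(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds spent trying to match text."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start

if __name__ == '__main__':
    for n in range(12, 21, 2):
        # A run of n 'a' characters followed by one character that forces failure.
        evil = 'a' * n + '!'
        print(f'n={n:2d}  nested={time_match(NESTED, evil):.4f}s  '
              f'flat={time_match(FLAT, evil):.6f}s')
```

Running the script prints one line per input length. The nested column grows roughly geometrically, which is the signature of catastrophic backtracking; keep n small when experimenting, since a few more characters can stall the process for minutes.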

Because Firestore queries in Django are commonly built from validated parameters, such as project IDs, collection names, or document key strings, regex-based validation becomes a chokepoint. An attacker can send many requests containing specially crafted strings that take exponential time for Django's regex check to evaluate. In the classic case these strings ultimately fail to match, so each request is rejected only after the CPU time has already been spent. Firestore itself never evaluates the regex: the cost is incurred entirely within the Django process before any Firestore call is made, which leads to resource exhaustion and degraded service for all users.

Validation patterns for Firestore identifiers are also prone to ambiguity. A pattern that allows optional segments or overlapping character classes (for example, a run of letters and digits followed by an optional hyphen or underscore, repeated) gives the engine many ways to match the same string, which raises the chance of exponential behavior. Developers might assume that because auto-generated Firestore IDs are short alphanumeric strings, simple regexes are safe. A single character class such as ^[a-zA-Z0-9_-]+$ is indeed linear on its own, but it can become dangerous once it is wrapped in a repeated group or placed next to another quantifier that can match the same characters inside a larger validation pattern.
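As an illustration of the optional-segment case, the hypothetical pattern AMBIGUOUS_ID below tries to accept IDs made of alphanumeric runs, each optionally followed by a hyphen or underscore. Because the separator is optional, the group degenerates into ([A-Za-z0-9]+)+ on input without separators, which is the nested-quantifier shape again. The rewrite UNAMBIGUOUS_ID accepts the same strings, but a separator must now be followed by a new alphanumeric run (apart from one optional trailing separator), so there is only one way to consume each character. The names and the 100-character limit are assumptions for this sketch.

```python
import re

# Hypothetical "readable" ID validator: alphanumeric runs, each optionally
# followed by '-' or '_'. The optional separator makes this ([A-Za-z0-9]+)+
# on separator-free input, so it backtracks catastrophically.
AMBIGUOUS_ID = re.compile(r'^([A-Za-z0-9]+[-_]?)+$')

# Same set of accepted strings without ambiguity: each separator starts a new
# alphanumeric run, except for one optional trailing separator.
UNAMBIGUOUS_ID = re.compile(r'^[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*[-_]?$')

MAX_ID_LENGTH = 100  # hypothetical application limit

def is_valid_readable_id(candidate: str) -> bool:
    # Cheap length bound first, so oversized input never reaches the regex.
    return len(candidate) <= MAX_ID_LENGTH and UNAMBIGUOUS_ID.fullmatch(candidate) is not None

if __name__ == '__main__':
    samples = ['abc', 'abc-def', 'abc_def-', 'a-b_c', '-abc', 'abc--def', 'ab c']
    for s in samples:
        same = bool(AMBIGUOUS_ID.fullmatch(s)) == bool(UNAMBIGUOUS_ID.fullmatch(s))
        print(f'{s!r:12} accepted={is_valid_readable_id(s)} same_result={same}')
    # Linear-time rejection even for a long adversarial input.
    print(UNAMBIGUOUS_ID.fullmatch('a' * 50_000 + '!'))  # None
```

Never run AMBIGUOUS_ID against the long adversarial input: it is kept here only for the side-by-side comparison on short samples.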

Another subtle risk arises when Django serializes Firestore query results and applies regex-based sanitization or masking to string fields before returning responses. If the regex runs over large text fields that may contain repetitive structures, an attacker-controlled payload stored in Firestore can trigger ReDoS during serialization, and it does so again on every read that applies the same regex. The vulnerability is therefore not limited to incoming request validation; it can also appear in outbound data processing, which makes defense-in-depth essential.

Firestore-Specific Remediation in Django: concrete code fixes

Mitigating ReDoS in Django with Firestore centers on writing precise regular expressions that cannot backtrack catastrophically, and on avoiding regex entirely where simpler string operations suffice. The examples below demonstrate secure patterns and safe integration with Firestore in Django.

1. Use strict regex patterns with anchors and bounded quantifiers

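Always anchor patterns (or use fullmatch) and avoid nested quantifiers. Firestore's auto-generated document IDs are 20-character alphanumeric strings, while custom IDs can contain almost any UTF-8 character, so decide which IDs your application accepts and encode that decision as an explicit character class with a length limit.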

import re

# Unsafe: vulnerable to ReDoS due to nested quantifier
# BAD_PATTERN = re.compile(r'^(a+)+$')

# Safe: anchored, no nested quantifiers, explicit allowed characters
SAFE_DOC_ID_PATTERN = re.compile(r'^[A-Za-z0-9_-]{1,100}$')  # length cap bounds per-request work and rejects oversized IDs

def is_valid_document_id(candidate: str) -> bool:
    return bool(SAFE_DOC_ID_PATTERN.fullmatch(candidate))
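Django path converters, mentioned earlier, can carry the same check at URL-resolution time. The sketch below follows Django's documented converter protocol (a class with a regex attribute plus to_python and to_url methods, registered with django.urls.register_converter); the class name FirestoreDocIdConverter, the 'docid' converter name and the view in the comment are hypothetical. Django embeds the converter's regex inside its own URL pattern, so the regex is written without ^ and $ anchors.

```python
import re

class FirestoreDocIdConverter:
    """Hypothetical path converter for Firestore document IDs.

    Django compiles `regex` into the URL pattern, so it must not contain
    ^ or $ anchors. The bounded quantifier caps how much input it inspects.
    """
    regex = r'[A-Za-z0-9_-]{1,100}'

    def to_python(self, value: str) -> str:
        # Called with the matched URL segment; the regex has already validated it.
        return value

    def to_url(self, value: str) -> str:
        # Re-check when reversing URLs so invalid IDs are never emitted.
        # Django (3.1+) treats ValueError here as "no match" during reverse().
        if not re.fullmatch(self.regex, value):
            raise ValueError(f'invalid document id: {value!r}')
        return value

# In urls.py (requires Django; the view name is hypothetical):
# from django.urls import path, register_converter
# register_converter(FirestoreDocIdConverter, 'docid')
# urlpatterns = [path('documents/<docid:doc_id>/', views.document_detail)]
```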

2. Prefer string methods over regex when possible

For simple prefix/suffix checks or exact matches, use Python string operations, which run in linear time and cannot backtrack.

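def is_valid_namespace(value: str) -> bool:
    # Application policy (not a Firestore rule): collection names use ASCII
    # letters, digits, and underscores, and do not start with a digit.
    # isascii() is required because isidentifier() also accepts non-ASCII letters.
    # Firestore reserves IDs of the form __name__, so reject a leading '__'.
    return value.isascii() and value.isidentifier() and not value.startswith('__')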
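The same idea extends to document IDs: the character-class check from the first example can be written without a regex at all, using a precomputed set of allowed characters and a length bound. This is a sketch that assumes your application restricts IDs to ASCII letters, digits, '_' and '-' (Firestore itself permits a much wider range); the function name is hypothetical. Each character is examined once, so the cost is linear in the already-bounded input length.

```python
import string

# Application policy (assumed): ASCII letters, digits, underscore, hyphen.
ALLOWED_ID_CHARS = frozenset(string.ascii_letters + string.digits + '_-')
MAX_ID_LENGTH = 100

def is_valid_document_id_no_regex(candidate: str) -> bool:
    # Length check first: oversized input is rejected without scanning it.
    if not 1 <= len(candidate) <= MAX_ID_LENGTH:
        return False
    # Single pass over the characters; no backtracking is possible.
    return all(ch in ALLOWED_ID_CHARS for ch in candidate)
```

It accepts exactly the strings that ^[A-Za-z0-9_-]{1,100}$ accepts under fullmatch, so either form can back the same validator.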

3. Validate Firestore query parameters safely

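When constructing queries from validated inputs, validate each parameter with a safe pattern and pass it to Firestore as a query value. Firestore's where() filters compare values directly and have no regex operator, so the remaining ReDoS exposure is any Python code that compiles a pattern from raw user input.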

from google.cloud import firestore
import re

SAFE_FILTER_PATTERN = re.compile(r'^[A-Za-z0-9 ]{1,200}$')

def get_user_documents(user_token: str):
    if not SAFE_FILTER_PATTERN.fullmatch(user_token):
        raise ValueError('Invalid token format')
    
    db = firestore.Client()
    # Safe: using parameterized queries with validated input
    docs = db.collection('users').where('auth_token', '==', user_token).limit(10).stream()
    return [doc.to_dict() for doc in docs]
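The place where user input most often ends up inside a regex is Python-side filtering of the documents a query returns. Compiling a user-supplied string as a pattern hands the attacker control of the pattern itself, including nested quantifiers. A minimal sketch, assuming the search term should be matched literally: re.escape neutralizes metacharacters, and a plain substring test avoids the regex engine entirely. The function names and the 'title' field are hypothetical, and the documents are plain dicts such as those produced by doc.to_dict().

```python
import re

def filter_by_title(docs: list, search_term: str) -> list:
    """Hypothetical Python-side filter over already-fetched Firestore documents."""
    # Reject empty or oversized terms before doing any work.
    if not 1 <= len(search_term) <= 100:
        raise ValueError('search term length out of range')
    # Unsafe alternative: re.compile(search_term) would let the caller supply '(a+)+$'.
    literal = re.compile(re.escape(search_term), re.IGNORECASE)
    return [d for d in docs if literal.search(str(d.get('title', '')))]

def filter_by_title_no_regex(docs: list, search_term: str) -> list:
    """Similar case-insensitive substring test without the regex engine."""
    if not 1 <= len(search_term) <= 100:
        raise ValueError('search term length out of range')
    needle = search_term.casefold()
    return [d for d in docs if needle in str(d.get('title', '')).casefold()]
```

Note that casefold() and re.IGNORECASE can treat a few non-ASCII characters differently, so pick one approach per field rather than mixing them.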

4. Use middleware to reject problematic paths early

In Django, add a lightweight middleware that validates path parameters using strict patterns before they reach view logic. This prevents malicious payloads from consuming CPU cycles in business logic.

import re
from django.http import HttpResponseBadRequest

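API_PREFIX = '/api/v1/projects/'
# Compiled once at import time; anchored, with bounded and non-nested quantifiers.
URL_PATH_PATTERN = re.compile(r'^/api/v1/projects/[A-Za-z0-9_-]{1,64}/datasets/[A-Za-z0-9_-]{1,64}/$')

class FirestorePathValidationMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Only validate the Firestore-backed API routes; other paths (admin,
        # static files, health checks) pass through unchanged.
        if request.path.startswith(API_PREFIX) and not URL_PATH_PATTERN.fullmatch(request.path):
            return HttpResponseBadRequest('Invalid path format')
        return self.get_response(request)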
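When the route shape is fixed, the path check can also be done with str.split instead of a regex, which keeps validation linear and makes the length limits explicit. The helper below is a hypothetical, framework-independent sketch that a middleware like the one above could call; it assumes the same /api/v1/projects/<id>/datasets/<id>/ layout and 64-character segment limit, plus an overall path cap chosen for illustration.

```python
import string
from typing import Optional, Tuple

SEGMENT_CHARS = frozenset(string.ascii_letters + string.digits + '_-')
MAX_SEGMENT_LENGTH = 64
MAX_PATH_LENGTH = 256  # hypothetical overall cap, checked before any parsing

def _is_valid_segment(segment: str) -> bool:
    # Same rule as [A-Za-z0-9_-]{1,64}, checked in a single pass.
    return 1 <= len(segment) <= MAX_SEGMENT_LENGTH and all(c in SEGMENT_CHARS for c in segment)

def parse_dataset_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (project_id, dataset_id) for /api/v1/projects/<id>/datasets/<id>/, else None."""
    if len(path) > MAX_PATH_LENGTH:
        return None
    # '/api/v1/projects/p1/datasets/d1/'.split('/') gives
    # ['', 'api', 'v1', 'projects', 'p1', 'datasets', 'd1', '']
    parts = path.split('/')
    if len(parts) != 8:
        return None
    head, api, version, projects, project_id, datasets, dataset_id, tail = parts
    if (head, api, version, projects, datasets, tail) != ('', 'api', 'v1', 'projects', 'datasets', ''):
        return None
    if not (_is_valid_segment(project_id) and _is_valid_segment(dataset_id)):
        return None
    return project_id, dataset_id
```

The longest valid path is 156 characters, so the 256-character cap never rejects a legitimate request; it only stops oversized paths before they are split.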

5. Limit regex use in serialization and masking

If you must sanitize string fields from Firestore, prefer simple replacements or length-limited patterns to avoid processing large repetitive content.

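import hashlib

def safe_mask(value: str) -> str:
    # Avoid complex regex on large text: truncate long values and append a short,
    # stable digest. The built-in hash() returns an int (not sliceable) and is
    # randomized per process for strings, so use hashlib instead.
    if len(value) > 100:
        digest = hashlib.sha256(value.encode('utf-8')).hexdigest()[:10]
        return value[:50] + '...' + digest
    return value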
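Where a field does need pattern-based masking, keep the pattern free of nested quantifiers and bound the input it runs on. The sketch below masks every digit that is followed by at least four more digits, so only the last four digits of a long number remain visible. The helper name and the 10,000-character limit are assumptions for illustration. The fixed-width lookahead means each position does a small, bounded amount of work.

```python
import re

# One digit, then a fixed-width lookahead: no nesting, bounded work per position.
DIGIT_BEFORE_LAST_FOUR = re.compile(r'\d(?=\d{4})')
MAX_MASKABLE_LENGTH = 10_000  # assumed limit for fields processed with regex

def mask_long_numbers(value: str) -> str:
    """Hypothetical masking helper for string fields read from Firestore."""
    if len(value) > MAX_MASKABLE_LENGTH:
        # Do not run any regex over oversized stored content.
        return '[redacted: field too long]'
    return DIGIT_BEFORE_LAST_FOUR.sub('*', value)

print(mask_long_numbers('card 4111111111111111, ref 12'))
# card ************1111, ref 12
```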

Related CWEs (input validation category). The CWE entry specific to ReDoS is CWE-1333, Inefficient Regular Expression Complexity.

CWE ID   Name                         Severity
CWE-20   Improper Input Validation    HIGH
CWE-22   Path Traversal               HIGH
CWE-74   Injection                    CRITICAL
CWE-77   Command Injection            CRITICAL
CWE-78   OS Command Injection         CRITICAL
CWE-79   Cross-site Scripting (XSS)   HIGH
CWE-89   SQL Injection                CRITICAL
CWE-90   LDAP Injection               HIGH
CWE-91   XML Injection                HIGH
CWE-94   Code Injection               CRITICAL

Frequently Asked Questions

Can Firestore document IDs themselves trigger ReDoS in Django?
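Firestore does not evaluate regexes on document IDs, so the IDs themselves cannot cause ReDoS inside Firestore. Auto-generated IDs are 20-character alphanumeric strings, but custom IDs may contain almost any UTF-8 character (a forward slash is not allowed) up to 1,500 bytes, so a client that can choose an ID can supply an adversarial string. If Django uses unsafe regex patterns to validate or parse these IDs, the CPU cost of backtracking is incurred in Django, not Firestore. Secure regex validation, with a length check before the regex runs, is therefore essential on the server side.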
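Does using Firestore's built-in security rules prevent ReDoS attacks?
No. Firestore Security Rules govern data access and validation for requests made through the mobile and web client SDKs; server client libraries such as google-cloud-firestore, which a Django backend typically uses, bypass them entirely. ReDoS is also an application-layer CPU exhaustion issue that occurs in Django code, either before a Firestore call (input validation) or after it (processing returned data). Defense must be implemented in Django's input validation, request handling, and response serialization.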