Regex DoS in Django with Firestore
Regex DoS in Django with Firestore: how this specific combination creates or exposes the vulnerability
A Regular Expression Denial of Service (ReDoS) occurs when a poorly crafted regular expression exhibits extreme backtracking on certain inputs, causing CPU time to spike and making the application unresponsive. In Django applications that use Google Cloud Firestore as a backend, the interaction between Django request handling and Firestore document IDs or query parameters can unintentionally amplify this risk.
Django often uses path converters and custom validators to capture and validate URL parameters before they reach views. If a developer uses a non-anchored, repetitive regex to validate Firestore document IDs or string fields that are later used in queries, an attacker can supply crafted input that forces catastrophic backtracking. This typically happens when a pattern includes nested quantifiers (for example, (a+)+) applied to user-controlled data that may match in many overlapping ways.
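The effect of a nested quantifier is easy to observe directly. The following is a minimal sketch that times Python's `re` module on the textbook pattern `^(a+)+$` against inputs that almost match. The helper name `time_failed_match` and the input sizes are illustrative choices, not part of any Django or Firestore API.

```python
import re
import time

# Textbook catastrophic pattern: the inner and outer '+' can split a run of
# 'a' characters in exponentially many ways.
NESTED = re.compile(r'^(a+)+$')
# Same language (one or more 'a'), but only one way to match it.
FLAT = re.compile(r'^a+$')


def time_failed_match(pattern: re.Pattern, n: int) -> float:
    """Return seconds spent rejecting 'a' * n followed by a non-matching '!'."""
    text = 'a' * n + '!'
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '!' makes every attempt fail
    return elapsed


if __name__ == '__main__':
    for n in (10, 14, 18, 20):
        print(f'n={n:2d}  nested={time_failed_match(NESTED, n):.4f}s  '
              f'flat={time_failed_match(FLAT, n):.6f}s')
```

On a backtracking engine the nested timings should grow roughly geometrically with each extra character while the flat timings stay negligible; exact figures depend on the machine and the Python version.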
Because Firestore queries in Django are commonly built from validated parameters (such as project IDs, collection names, or document key strings), regex-based validation becomes a chokepoint. An attacker can send many requests with specially designed strings that almost match the pattern and then fail near the end, which is the case that forces a backtracking engine to try an exponential number of alternatives before rejecting the input. Firestore itself does not evaluate these regular expressions, so the CPU cost is incurred entirely within Django before any Firestore call is made, leading to resource exhaustion and degraded service for all users.
Moreover, Firestore document IDs often mix letters, digits, hyphens, and underscores, so validation patterns tend to combine several character classes, and a regex that is permissive by design may unintentionally allow ambiguous matching paths. Optional segments and overlapping character classes increase the number of ways the engine can attempt to match, which raises the chance of exponential behavior. Developers might assume that because Firestore IDs are alphanumeric and structured, simple regexes are safe. A single character class with one quantifier, such as ^[a-zA-Z0-9_-]+$, does match in linear time on its own; the danger appears when it is wrapped in a repeated group or placed next to another quantifier whose character class overlaps it inside a broader validation pattern.
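To make the ambiguity concrete, the sketch below contrasts a segmented-ID pattern with an optional separator against a rewrite in which every repetition must start with a separator. Both patterns and the helper `is_segmented_id` are hypothetical examples for illustration, not patterns taken from Django or Firestore.

```python
import re

# Ambiguous: because the separator is optional, a run such as "abcd" can be
# consumed as one group iteration or split into several, so a failing input
# forces the engine to try every split.
AMBIGUOUS_ID = re.compile(r'^(?:[A-Za-z0-9]+[-_]?)+$')

# Unambiguous: every repetition of the group must begin with a separator, so
# there is exactly one way to divide the input into segments.
# Note: unlike the pattern above, this version rejects a trailing separator.
LINEAR_ID = re.compile(r'^[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*$')


def is_segmented_id(candidate: str, max_len: int = 100) -> bool:
    """Check the length first, then apply the unambiguous pattern."""
    return len(candidate) <= max_len and LINEAR_ID.fullmatch(candidate) is not None
```

The length check runs before the regex so that even an unexpected pattern weakness is exercised only on short inputs.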
Another subtle risk arises when Django serializes Firestore query results and applies regex-based sanitization or masking on string fields before returning responses. If the regex is applied to large text fields that may contain repetitive structures, an attacker-controlled payload stored in Firestore can trigger ReDoS during serialization. This means the vulnerability is not only present in incoming request validation but also in outbound data processing, making defense-in-depth essential.
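One way to limit the outbound risk is to cap the length of stored text before any pattern touches it and to keep every quantifier bounded. The sketch below assumes a hypothetical `mask_emails` helper and an arbitrary `MAX_SANITIZE_LEN`; the e-mail pattern is deliberately simplified and is not a full address validator.

```python
import re

# Hypothetical limit for this sketch; tune it to the largest field you expect.
MAX_SANITIZE_LEN = 10_000

# Every quantifier is bounded, so the work per starting position is capped.
EMAIL = re.compile(
    r'[A-Za-z0-9._%+-]{1,64}@[A-Za-z0-9-]{1,63}(?:\.[A-Za-z0-9-]{1,63}){1,8}'
)


def mask_emails(text: str, max_len: int = MAX_SANITIZE_LEN) -> str:
    """Truncate oversized stored text, then redact e-mail-like substrings."""
    if len(text) > max_len:
        text = text[:max_len]
    return EMAIL.sub('[redacted]', text)
```

A serializer could call `mask_emails(doc.get('bio', ''))` on each string field before building the response, where `bio` stands in for whatever field name the application actually uses.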
Firestore-Specific Remediation in Django: concrete code fixes
Mitigating Regex DoS in Django with Firestore centers on writing precise regular expressions that cannot backtrack catastrophically and on avoiding regex entirely where simpler string operations suffice. Below are concrete, realistic code examples that demonstrate secure patterns and safe integration with Firestore in Django.
1. Use strict regex patterns with anchors and bounded quantifiers
Always anchor patterns and avoid nested quantifiers. For Firestore document IDs, which are typically alphanumeric strings with hyphens, prefer explicit length or character class limits.
    import re

    # Unsafe: vulnerable to ReDoS due to nested quantifier
    # BAD_PATTERN = re.compile(r'^(a+)+$')

    # Safe: anchored, no nested quantifiers, explicit allowed characters
    SAFE_DOC_ID_PATTERN = re.compile(r'^[A-Za-z0-9_-]{1,100}$')  # max length prevents runaway matching

    def is_valid_document_id(candidate: str) -> bool:
        return bool(SAFE_DOC_ID_PATTERN.fullmatch(candidate))
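Since URL parameters usually reach views through path converters, the same bounded character class can be packaged as one. This is a hedged sketch: Django's converter protocol only needs a `regex` attribute plus `to_python` and `to_url` methods, so the class below is plain Python. The class name, the converter name `docid`, and the route shown in the comment are hypothetical.

```python
import re


class FirestoreDocIdConverter:
    """Path converter sketch: bounded, single character class, no nesting."""

    # Django embeds this fragment in the full URL regex, so no ^ or $ here.
    regex = r'[A-Za-z0-9_-]{1,100}'

    def to_python(self, value: str) -> str:
        return value

    def to_url(self, value: str) -> str:
        # Refuse to build a URL for an ID the route would never match.
        if re.fullmatch(self.regex, value) is None:
            raise ValueError(f'invalid document ID: {value!r}')
        return value


# In urls.py (requires Django; names below are illustrative):
#   from django.urls import path, register_converter
#   register_converter(FirestoreDocIdConverter, 'docid')
#   urlpatterns = [path('docs/<docid:doc_id>/', views.document_detail)]
```

With a converter like this, requests whose ID segment falls outside the character class never resolve to the view at all, so no separate validation call is needed there.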
2. Prefer string methods over regex when possible
For simple prefix/suffix checks or exact matches, use Python string operations which run in linear time and avoid backtracking entirely.
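    def is_valid_namespace(value: str) -> bool:
        # Firestore collection names should be simple alphanumeric with underscores.
        # isascii() is needed because isidentifier() alone also accepts non-ASCII letters.
        return value.isascii() and value.isidentifier() and not value.startswith('__')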
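The document ID check from step 1 can also be written without a regex. The sketch below is one possible equivalent, assuming the same allowed characters and the same 100-character limit; the names `ALLOWED_ID_CHARS` and `is_valid_document_id_no_regex` are made up for this example.

```python
import string

# Hypothetical allow-list mirroring the character class [A-Za-z0-9_-].
ALLOWED_ID_CHARS = frozenset(string.ascii_letters + string.digits + '_-')


def is_valid_document_id_no_regex(candidate: str, max_len: int = 100) -> bool:
    """Single pass over the string: length check, then per-character lookup."""
    if not 0 < len(candidate) <= max_len:
        return False
    return all(ch in ALLOWED_ID_CHARS for ch in candidate)
```

Because there is no pattern to backtrack over, the cost is at most one set lookup per character, and the length check rejects oversized input before the loop starts.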
3. Validate Firestore query parameters safely
When constructing queries from validated inputs, ensure that parameter validation uses safe patterns, and never build a regular expression by concatenating raw user input. Pass the validated value to Firestore as a query argument instead.
    from google.cloud import firestore
    import re

    SAFE_FILTER_PATTERN = re.compile(r'^[A-Za-z0-9 ]{1,200}$')

    def get_user_documents(user_token: str):
        if not SAFE_FILTER_PATTERN.fullmatch(user_token):
            raise ValueError('Invalid token format')
        db = firestore.Client()
        # Safe: using parameterized queries with validated input
        docs = db.collection('users').where('auth_token', '==', user_token).limit(10).stream()
        return [doc.to_dict() for doc in docs]
4. Use middleware to reject problematic paths early
In Django, add a lightweight middleware that validates path parameters using strict patterns before they reach view logic. This prevents malicious payloads from consuming CPU cycles in business logic.
    import re
    from django.http import HttpResponseBadRequest

    API_PREFIX = '/api/v1/projects/'
    url_path_pattern = re.compile(r'^/api/v1/projects/[A-Za-z0-9_-]{1,64}/datasets/[A-Za-z0-9_-]{1,64}/$')

    class FirestorePathValidationMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            # Only check the Firestore-backed API routes; without the prefix test,
            # every other URL in the project would be rejected as well.
            if request.path.startswith(API_PREFIX) and not url_path_pattern.fullmatch(request.path):
                return HttpResponseBadRequest('Invalid path format')
            return self.get_response(request)
5. Limit regex use in serialization and masking
If you must sanitize string fields from Firestore, prefer simple replacements or length-limited patterns to avoid processing large repetitive content.
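    import hashlib

    def safe_mask(value: str) -> str:
        # Avoid complex regex on large text; use simple truncation and hashing for display
        if len(value) > 100:
            # hash() returns an int, which cannot be sliced; use a hex digest instead
            digest = hashlib.sha256(value.encode('utf-8')).hexdigest()
            return value[:50] + '...' + digest[-10:]
        return value

Related CWEs: inputValidation
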
| CWE ID | Name | Severity |
|---|---|---|
| CWE-20 | Improper Input Validation | HIGH |
| CWE-22 | Path Traversal | HIGH |
| CWE-74 | Injection | CRITICAL |
| CWE-77 | Command Injection | CRITICAL |
| CWE-78 | OS Command Injection | CRITICAL |
| CWE-79 | Cross-site Scripting (XSS) | HIGH |
| CWE-89 | SQL Injection | CRITICAL |
| CWE-90 | LDAP Injection | HIGH |
| CWE-91 | XML Injection | HIGH |
| CWE-94 | Code Injection | CRITICAL |