Regex DoS in Django with DynamoDB
Regex DoS in Django with DynamoDB: how this specific combination creates or exposes the vulnerability
Regular expressions can become a vector for denial of service (ReDoS) when a pattern contains nested or overlapping quantifiers, such as (a+)+, that force a backtracking engine to explore an exponential number of paths on untrusted input. In Django applications that use Amazon DynamoDB as a backend, complex regex validation in Python can amplify this risk when request input is forwarded to DynamoDB queries, or when patterns and values are derived from database attributes.
Consider a Django view that validates a search parameter with a vulnerable pattern and then uses it to scan a DynamoDB table:
```python
import re

import boto3
from boto3.dynamodb.conditions import Attr
from django.http import JsonResponse


def search_view(request):
    term = request.GET.get('q', '')
    # Potentially dangerous regex applied to user-controlled input
    if not re.match(r'^(a+)+$', term):
        return JsonResponse({'error': 'invalid'}, status=400)
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('Items')
    response = table.scan(FilterExpression=Attr('name').contains(term))
    return JsonResponse({'results': response.get('Items', [])})
```
The pattern ^(a+)+$ nests one quantifier inside another, so on a string like "a" * 25 + "!" the engine backtracks through the ways of dividing the a's between the inner and outer + before concluding there is no match, and each extra "a" roughly doubles the work. The regex check runs before the DynamoDB request is issued, so the ReDoS cost lands entirely in the Django process, independent of DynamoDB's performance. A single crafted request can tie up a worker process, and a few concurrent ones can raise latency across the service or make it unavailable.
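The doubling can be made concrete with a little arithmetic. The sketch below (the helper name is hypothetical) counts the ways the inner and outer + of (a+)+ can divide a run of n a's into non-empty groups. A backtracking engine that fails at the trailing "!" ends up trying on the order of this many splits, and the count is exactly 2^(n-1). This is a counting argument, not a trace of CPython's matcher.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def groupings(n: int) -> int:
    """Number of ways (a+)+ can split a run of n 'a' characters into non-empty groups."""
    if n == 0:
        return 1  # nothing left to split: one (empty) way
    # The first group takes k characters (1..n); the remainder is split recursively.
    return sum(groupings(n - k) for k in range(1, n + 1))


for n in (5, 10, 25):
    print(n, groupings(n), 2 ** (n - 1))
# 5 16 16
# 10 512 512
# 25 16777216 16777216
```

At n = 25 that is about 16.8 million splits for a 26-character request, which is why the attack string in the example is so short.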
Another scenario arises when a DynamoDB attribute is used directly in regex operations. For example, if your table stores user-supplied patterns or free-text fields, retrieving those values into Django and applying additional regex parsing can compound the issue:
```python
import re

import boto3
from django.http import JsonResponse


def process_item(request, item_id):
    table = boto3.resource('dynamodb').Table('Config')
    resp = table.get_item(Key={'id': item_id})
    item = resp.get('Item', {})
    pattern = item.get('regex_pattern', '')
    user_input = request.GET.get('data', '')
    # Using a DynamoDB-stored pattern on untrusted input increases blast radius
    if re.fullmatch(pattern, user_input):
        return JsonResponse({'match': True})
    return JsonResponse({'match': False})
```
Here, the pattern itself is stored in DynamoDB and may have been written without regard for regex safety. A stored pattern with nested quantifiers is exploitable even if nobody tampers with it, because the attacker still controls the string it is matched against. If an attacker can also influence or poison the stored pattern, every subsequent request that evaluates it can stall the Django process. DynamoDB stores the pattern as an ordinary string and does not validate it, so the application alone is responsible for ensuring patterns are safe and for never using untrusted input as a regex source.
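Because DynamoDB treats the pattern as an ordinary string, a cheap screen at write time can catch the most common dangerous shape before it is saved. The following is a heuristic sketch, not a regex parser, and the helper name is hypothetical: it flags a parenthesized group that contains a quantifier and is itself quantified, as in (a+)+. It misses overlapping alternations such as (a|aa)+ and groups with nested parentheses, and it may flag escaped characters like \+, so a False result is not proof of safety.

```python
import re

# Hypothetical screening helper. Flags "(...quantifier...)" followed by +, * or {.
# A closing "}" inside the group stands in for a {m,n} quantifier.
# Apply it only to length-capped pattern strings.
_NESTED_QUANTIFIER = re.compile(r'\([^()]*[+*}][^()]*\)[+*{]')


def looks_redos_prone(pattern: str) -> bool:
    """Return True if the pattern contains a quantified group with an inner quantifier."""
    return _NESTED_QUANTIFIER.search(pattern) is not None


for p in (r'^(a+)+$', r'(\d{2,})*', r'a{1,100}', r'(ab)+'):
    print(p, looks_redos_prone(p))
# ^(a+)+$ True
# (\d{2,})* True
# a{1,100} False
# (ab)+ False
```

A check like this fits naturally just before a put_item call that saves a pattern, so unsafe shapes are rejected once rather than on every read.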
The interaction between Django’s request/response cycle and DynamoDB’s eventually consistent reads does not mitigate ReDoS; the expensive regex work still occurs synchronously in the Django process. Therefore, the primary attack surface is user input that reaches regex engines, regardless of whether DynamoDB is the persistent store. Mitigations must focus on regex construction, input constraints, and runtime protections to prevent pathological execution paths.
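Python's re functions accept no timeout argument, so one form of runtime protection is to evaluate risky matches in a child process that can be killed when it overruns. The following is a minimal sketch, assuming a POSIX host where the fork start method is available; the helper names and the 0.5 second budget are hypothetical. Forking per call costs milliseconds, and forking a multithreaded server process has caveats, so in practice this usually lives in a dedicated worker pool. A pattern that cannot backtrack exponentially remains the real fix.

```python
import multiprocessing as mp
import re


def _fullmatch_worker(pattern, text, conn):
    # Runs in the child process and sends True/False back over the pipe.
    conn.send(re.fullmatch(pattern, text) is not None)
    conn.close()


def fullmatch_with_timeout(pattern, text, timeout=0.5):
    """Evaluate re.fullmatch in a child process and kill it if it exceeds `timeout` seconds."""
    ctx = mp.get_context("fork")  # assumption: POSIX host where fork is available
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_fullmatch_worker, args=(pattern, text, send_end))
    proc.start()
    send_end.close()  # the parent only reads
    try:
        if recv_end.poll(timeout):
            return recv_end.recv()
        # No answer within the budget: stop the runaway match.
        proc.terminate()
        raise TimeoutError("regex evaluation exceeded its time budget")
    finally:
        proc.join()
        recv_end.close()
```

For example, fullmatch_with_timeout(r'(a+)+$', 'a' * 32 + '!') raises TimeoutError after about half a second instead of pinning the worker, while a benign call such as fullmatch_with_timeout(r'a+', 'aaa') returns True.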
DynamoDB-Specific Remediation in Django: concrete code fixes
To reduce ReDoS risk when using Django with DynamoDB, validate and constrain input before regex evaluation and avoid using untrusted data as regex patterns. Prefer bounded, safe patterns and perform length and structure checks.
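Replace vulnerable patterns like (a+)+$ with equivalent patterns that have no nested quantifiers. On Python 3.11 and later you can also use atomic groups (?>...) or possessive quantifiers such as a++, which discard backtracking positions inside what they have already matched. Because (a+)+ accepts exactly the same strings as a+, the earlier example can be rewritten with a single bounded quantifier: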
```python
import re

import boto3
from boto3.dynamodb.conditions import Attr
from django.http import JsonResponse


def safe_search_view(request):
    term = request.GET.get('q', '')
    # Safe: one bounded quantifier, no nesting; accepts what ^(a+)+$ did, up to 100 chars
    if not re.fullmatch(r'a{1,100}', term):
        return JsonResponse({'error': 'invalid'}, status=400)
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('Items')
    response = table.scan(FilterExpression=Attr('name').contains(term))
    return JsonResponse({'results': response.get('Items', [])})
```
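A quick way to confirm that a rewrite preserves behavior is to compare it against the vulnerable pattern on every short input. The following sketch brute-forces all strings over the alphabet "a!" up to length 8 (the helper name is hypothetical). That length is small enough that the vulnerable pattern stays fast. The sketch then runs the rewrites against the attack string. The atomic-group line runs only on Python 3.11+, where (?>...) was added to re.

```python
import itertools
import re
import sys

VULNERABLE = r'(a+)+'   # nested quantifier: exponential backtracking on failure
FLAT = r'a+'            # same language, a single quantifier
BOUNDED = r'a{1,100}'   # same strings as FLAT, capped at 100 characters


def same_language_on_small_inputs(p1, p2, alphabet='a!', max_len=8):
    """Brute-force check that two patterns accept the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = ''.join(chars)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True


attack = 'a' * 25 + '!'
print(same_language_on_small_inputs(VULNERABLE, FLAT))     # True
print(same_language_on_small_inputs(VULNERABLE, BOUNDED))  # True
print(re.fullmatch(BOUNDED, attack))                        # None, returned quickly

if sys.version_info >= (3, 11):
    # Atomic group: once a+ has matched, the engine cannot backtrack into it.
    print(re.fullmatch(r'(?>a+)+', attack))                 # None
```

A brute-force comparison like this only covers short inputs, but it is a cheap regression test to keep next to any hand-rewritten pattern.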
If you must use dynamic patterns sourced from DynamoDB, enforce strict allowlists and avoid user-controlled quantifiers:
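```python
import re

import boto3
from django.http import JsonResponse

MAX_PATTERN_LEN = 64  # illustrative cap on stored pattern length


def safe_pattern_match(pattern_str, user_input):
    # Allow only lowercase letters, digits, spaces, '|', '&' and parentheses:
    # no quantifiers (*, +, ?, {}), classes or escapes, so nothing can repeat.
    # Alternation can still branch, so the length cap bounds the worst case.
    if len(pattern_str) > MAX_PATTERN_LEN or not re.fullmatch(r'[a-z0-9|&() ]+', pattern_str):
        raise ValueError('unsafe pattern')
    try:
        compiled = re.compile(pattern_str, re.IGNORECASE)
    except re.error:
        raise ValueError('invalid regex') from None
    return bool(compiled.fullmatch(user_input))


def use_dynamodb_pattern(request):
    table = boto3.resource('dynamodb').Table('Config')
    resp = table.get_item(Key={'id': 'safe_pattern'})
    pattern = resp.get('Item', {}).get('regex_pattern', '')
    user_data = request.GET.get('data', '')
    try:
        matched = safe_pattern_match(pattern, user_data)
    except ValueError:
        # The stored pattern failed validation: a server-side configuration problem
        return JsonResponse({'error': 'pattern unavailable'}, status=500)
    return JsonResponse({'match': matched})
```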
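When the values stored in DynamoDB are really a list of literal keywords rather than patterns, a safer option is not to interpret them as regex syntax at all. The following is a minimal sketch, assuming the item holds a list of words; the helper name and limits are hypothetical. re.escape neutralizes every metacharacter, so the compiled pattern is a plain alternation of literals with no quantifiers.

```python
import re

MAX_WORDS = 50      # hypothetical limits on what a stored item may contain
MAX_WORD_LEN = 64


def literal_alternation(words):
    """Compile stored keywords into a pattern that matches any one of them literally."""
    if not words or len(words) > MAX_WORDS or any(len(w) > MAX_WORD_LEN for w in words):
        raise ValueError('stored keyword list out of bounds')
    # re.escape turns '+', '*', '(' and other metacharacters into literal characters.
    return re.compile('|'.join(re.escape(w) for w in words), re.IGNORECASE)


matcher = literal_alternation(['admin', 'a+b'])
print(bool(matcher.fullmatch('ADMIN')))  # True
print(bool(matcher.fullmatch('a+b')))    # True: '+' is a literal here
print(bool(matcher.fullmatch('aab')))    # False
```

If only exact, case-insensitive equality is needed, a set lookup such as user_input.casefold() in {w.casefold() for w in words} avoids the regex engine entirely.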
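Additionally, cap input length at the Django view layer. A length limit bounds the cost of patterns whose backtracking grows polynomially, but it does not neutralize exponential patterns such as (a+)+$, which become slow well below 200 characters, so it complements safe patterns rather than replacing them: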
```python
from django.http import JsonResponse


def length_limited_view(request):
    term = request.GET.get('q', '')
    if len(term) > 200:
        return JsonResponse({'error': 'too long'}, status=400)
    # Further validation as needed
    ...
```
DynamoDB filter and condition expressions have no regex operator. Their string functions, such as begins_with and contains, are plain prefix and substring tests. Prefer them for server-side filtering instead of re-running pattern logic in Python over scanned items, and use ProjectionExpression to limit returned attributes so per-request processing stays predictable. Combining input validation, pattern allowlists, and bounded repetition in Django substantially reduces ReDoS exposure while continuing to use DynamoDB for storage and retrieval.
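As a sketch of those expression features, the helper below builds keyword arguments for the resource-level Table.scan call as plain strings and Python values, relying on boto3's resource layer to serialize ExpressionAttributeValues. The helper name, the 200-character cap (mirroring the view above), and the "id" attribute are hypothetical. NAME is a DynamoDB reserved word, which is why the attribute goes through a placeholder.

```python
MAX_TERM_LEN = 200  # hypothetical cap, mirroring the view-level length check


def build_scan_params(term: str) -> dict:
    """Keyword arguments for Table.scan that use only DynamoDB's built-in string functions."""
    if not term or len(term) > MAX_TERM_LEN:
        raise ValueError('search term out of bounds')
    return {
        # contains() is a plain substring test evaluated by DynamoDB, not a regex.
        'FilterExpression': 'contains(#n, :t)',
        # Return only the attributes the view needs ("id" is a hypothetical key name).
        'ProjectionExpression': '#i, #n',
        # "name" is a DynamoDB reserved word, so it is referenced via a placeholder.
        'ExpressionAttributeNames': {'#n': 'name', '#i': 'id'},
        'ExpressionAttributeValues': {':t': term},
    }


# Usage (requires boto3, AWS credentials, and an existing table):
# table = boto3.resource('dynamodb').Table('Items')
# response = table.scan(**build_scan_params('widget'))
```

Keeping the parameters in a pure function makes the bounds check testable without touching AWS.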
Related CWEs: Input Validation
| CWE ID | Name | Severity |
|---|---|---|
| CWE-20 | Improper Input Validation | HIGH |
| CWE-22 | Path Traversal | HIGH |
| CWE-74 | Injection | CRITICAL |
| CWE-77 | Command Injection | CRITICAL |
| CWE-78 | OS Command Injection | CRITICAL |
| CWE-79 | Cross-site Scripting (XSS) | HIGH |
| CWE-89 | SQL Injection | CRITICAL |
| CWE-90 | LDAP Injection | HIGH |
| CWE-91 | XML Injection | HIGH |
| CWE-94 | Code Injection | CRITICAL |