Insecure Deserialization in Flask
How Insecure Deserialization Manifests in Flask
Insecure deserialization in Flask applications often occurs when developers use Python's pickle module or similar serialization libraries to reconstruct objects from user-controlled data without proper validation. A common pattern is accepting serialized data via HTTP request bodies, headers, or cookies and deserializing it directly. For example, a Flask endpoint might accept a pickle-encoded object in a custom header or JSON field to restore user session state or configuration.
Consider this vulnerable Flask route:
import base64
import pickle
from flask import Flask, request, Response
app = Flask(__name__)
@app.route('/process', methods=['POST'])
def process_data():
    # Header values are text, so a pickle sent via X-Data is assumed to be base64-encoded
    header_value = request.headers.get('X-Data')
    serialized_data = base64.b64decode(header_value) if header_value else request.data
    if not serialized_data:
        return Response('No data provided', status=400)
    try:
        # Vulnerable: deserializing untrusted input
        obj = pickle.loads(serialized_data)
        return Response(f'Processed: {obj}', mimetype='text/plain')
    except Exception as e:
        return Response(f'Error: {str(e)}', status=500)
if __name__ == '__main__':
    app.run()
An attacker can exploit this by sending a malicious pickle payload that executes arbitrary code during deserialization. For instance, using the __reduce__ method to invoke os.system:
import pickle
import os
class Exploit:
    def __reduce__(self):
        return (os.system, ('id > /tmp/exploit',))
payload = pickle.dumps(Exploit())
# Send payload in X-Data header or POST body to /process endpoint
This could lead to remote code execution (RCE), compromising the server. Similar risks exist with yaml.load (without Loader=yaml.SafeLoader) or jsonpickle if misconfigured. These flaws map to OWASP Top 10:2021 A08:2021 – Software and Data Integrity Failures, which absorbed the earlier Insecure Deserialization category, and in the OWASP API Security Top 10:2023 they sit closest to API8:2023 – Security Misconfiguration. Real-world parallels include CVE-2013-0156 in Ruby on Rails and CVE-2013-2165 in JBoss RichFaces, where insecure deserialization led to RCE.
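The mechanism behind the pickle payload can be observed without running a shell command. The following is a minimal sketch that swaps os.system for the harmless built-in len; the class name Harmless is made up for illustration. It shows that the unpickler calls whatever callable __reduce__ names, at load time, before the application sees any object.

```python
import pickle


class Harmless:
    """Stand-in for an attacker-controlled class (hypothetical name)."""

    def __reduce__(self):
        # pickle records "call len('abc')" instead of the object's state.
        return (len, ("abc",))


payload = pickle.dumps(Harmless())

# No Harmless instance comes back: the unpickler calls len("abc") itself
# and returns whatever that call produced.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

Because the call happens inside pickle.loads, any check the application performs on the returned value runs only after the attacker's callable has already executed.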
Flask-Specific Detection
Detecting insecure deserialization in Flask requires analyzing both code patterns and runtime behavior. Static analysis can identify dangerous functions like pickle.loads, yaml.load (unsafe), marshal.loads, or dill.loads used with user-controlled inputs from request.data, request.headers, request.cookies, or request.form. However, false positives are common if the data is validated or sanitized beforehand.
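As a rough illustration of the static side, the sketch below walks a module's syntax tree with Python's ast module and reports calls to the functions named above. It is an assumption-laden toy: the function name find_dangerous_calls and the sample handler are hypothetical, it matches only the plain `module.function(...)` form, and it does no data-flow tracking, so it cannot tell whether the argument is actually user-controlled.

```python
import ast

# Call names the text lists as dangerous. yaml.load is flagged regardless of
# its Loader argument, so a human still has to triage each hit.
DANGEROUS_CALLS = {"pickle.loads", "yaml.load", "marshal.loads", "dill.loads"}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, dotted call name) for each flagged call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match only the simple `module.function(...)` form.
        if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            name = f"{func.value.id}.{func.attr}"
            if name in DANGEROUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


sample = '''
import pickle, yaml
def handler(request):
    obj = pickle.loads(request.data)
    cfg = yaml.load(request.data)
    safe = yaml.safe_load(request.data)
'''

print(find_dangerous_calls(sample))
```

Aliased imports such as `from pickle import loads` slip past this check, which is one reason static findings need to be paired with runtime testing.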
middleBrick identifies these issues through black-box testing by sending serialized attack payloads to endpoints and monitoring for signs of successful exploitation. It does not require source code or agents — only the API URL. For Flask applications, middleBrick tests for:
- Pickle-based RCE via __reduce__ chains
- YAML deserialization leading to code execution (e.g., using !!python/object/apply:os.system)
- JSON pickle exploitation if jsonpickle is used without restrictions
For example, middleBrick might send a pickle payload that attempts to exfiltrate data via a DNS or HTTP callback to an external listener, an out-of-band technique that is not specific to Flask. If the server responds with unexpected behavior, such as delayed responses indicating command execution, error messages revealing internal state, or out-of-band interactions, middleBrick flags the endpoint as vulnerable.
Additionally, middleBrick cross-references OpenAPI specifications (if available) to identify endpoints accepting binary or structured data (e.g., application/octet-stream, application/x-python-pickle, or custom content types) where deserialization is likely. It prioritizes findings by severity: confirmed RCE attempts are marked critical, while potential data exposure via unsafe deserialization (e.g., object modification without code execution) may be medium or high.
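The idea of using a specification to shortlist endpoints can be sketched in a few lines. This is not middleBrick's implementation, which the text does not describe. It is a hypothetical helper (suspect_endpoints) that scans an already-parsed OpenAPI 3 document for request bodies declaring the content types mentioned above.

```python
# Content types that suggest a request body may be deserialized server-side.
# The set mirrors the examples in the text and is not exhaustive.
SUSPECT_TYPES = {"application/octet-stream", "application/x-python-pickle"}


def suspect_endpoints(spec: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, content type) for request bodies worth probing."""
    found = []
    for path, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            # Path items may also hold non-operation keys such as "parameters".
            if not isinstance(operation, dict):
                continue
            content = operation.get("requestBody", {}).get("content", {})
            for content_type in content:
                if content_type in SUSPECT_TYPES:
                    found.append((method.upper(), path, content_type))
    return found


# Hypothetical OpenAPI 3 fragment, already parsed into a dict.
spec = {
    "paths": {
        "/process": {
            "post": {
                "requestBody": {"content": {"application/octet-stream": {}}}
            }
        },
        "/users": {
            "post": {
                "requestBody": {"content": {"application/json": {}}}
            }
        },
    }
}

print(suspect_endpoints(spec))
```

A declared content type is only a hint; an endpoint that advertises application/json can still unpickle a field internally, so a shortlist like this narrows testing rather than replacing it.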
This approach aligns with middleBrick’s 5–15 second scan time and unauthenticated black-box methodology, providing actionable findings without internal access.
Flask-Specific Remediation
The primary defense against insecure deserialization in Flask is to avoid deserializing untrusted data altogether. When unavoidable, use strict validation, signing, or safe deserialization methods.
1. Avoid pickle for untrusted data
Replace pickle with JSON or MessagePack for data interchange. If the server must hand data to a client and later trust it when it comes back, use itsdangerous (maintained by Pallets, the project behind Flask) to sign and verify it:
from itsdangerous import TimedSerializer, BadSignature
from flask import Flask, request, Response
app = Flask(__name__)
app.config['SECRET_KEY'] = 'your-secret-key-here'
serializer = TimedSerializer(app.config['SECRET_KEY'])
@app.route('/process', methods=['POST'])
def process_data():
    token = request.headers.get('X-Data-Token')
    if not token:
        return Response('Missing token', status=400)
    try:
        # Safely deserialize and verify signature + expiration
        data = serializer.loads(token, max_age=3600)  # 1 hour expiry
        # Process data (expected to be simple types like dict, str, int)
        if not isinstance(data, dict) or 'user_id' not in data:
            return Response('Invalid data structure', status=400)
        return Response(f'Processed user {data["user_id"]}', mimetype='text/plain')
    except BadSignature:
        return Response('Invalid or expired token', status=400)
    except Exception as e:
        return Response(f'Error: {str(e)}', status=500)
if __name__ == '__main__':
    app.run()
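To make the sign-then-verify idea concrete, here is a standard-library sketch of what a signing serializer does conceptually. It is not the itsdangerous token format and it omits the timestamp and expiry that TimedSerializer adds; the names sign, verify, and SECRET_KEY are illustrative assumptions.

```python
import base64
import hashlib
import hmac
import json

# Assumption: a real application loads this key from configuration.
SECRET_KEY = b"demo-key-not-for-production"


def sign(data: dict) -> str:
    """Encode data as JSON and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify(token: str) -> dict:
    """Return the data only if the signature matches; raise ValueError otherwise."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("bad signature")
    # JSON parsing happens only after the signature check passes.
    return json.loads(base64.urlsafe_b64decode(payload))


token = sign({"user_id": 42})
print(verify(token))

# Flip the last signature character to simulate tampering.
tampered = token[:-1] + ("0" if token[-1] != "0" else "1")
try:
    verify(tampered)
except ValueError as exc:
    print("rejected:", exc)
```

The important ordering is that the payload is parsed only after the signature has been checked, and that the parser is JSON rather than pickle, so even a leaked key would not turn parsing into code execution.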
2. Use safe YAML loading
If YAML is required, always use yaml.safe_load:
import yaml
from flask import Flask, request
app = Flask(__name__)
@app.route('/config', methods=['POST'])
def update_config():
    yaml_data = request.data
    try:
        # Safe: only loads standard YAML types, no arbitrary objects
        config = yaml.safe_load(yaml_data)
        # Process config...
        return {'status': 'success'}
    except yaml.YAMLError as e:
        return {'error': str(e)}, 400
if __name__ == '__main__':
    app.run()
3. Implement input validation and allowlisting
For any deserialization, validate the resulting object against a strict schema. Use libraries like pydantic or marshmallow to ensure data conforms to expected types and structure.
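The allowlisting idea can be shown with the standard library alone. The sketch below is a hand-written stand-in for what pydantic or marshmallow would do declaratively; the ProcessRequest schema, its fields, and the ALLOWED_ACTIONS set are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRequest:
    """Hypothetical schema: the only fields the endpoint accepts."""
    user_id: int
    action: str


ALLOWED_ACTIONS = {"read", "update"}  # assumption: illustrative allowlist


def parse_request(raw: bytes) -> ProcessRequest:
    """Parse JSON and reject anything outside the expected shape."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"user_id", "action"}:
        raise ValueError("unexpected fields")
    user_id, action = data["user_id"], data["action"]
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise ValueError("user_id must be an integer")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("action not allowed")
    return ProcessRequest(user_id=user_id, action=action)


print(parse_request(b'{"user_id": 7, "action": "read"}'))

try:
    parse_request(b'{"user_id": 7, "action": "read", "is_admin": true}')
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting unknown keys outright, rather than ignoring them, keeps extra attacker-supplied fields from reaching later code that might trust them.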
4. Use middleware or decorators for centralized protection
Create a Flask decorator to verify signed tokens before deserialization:
from functools import wraps
from flask import Flask, request, Response
from itsdangerous import TimedSerializer, BadSignature
app = Flask(__name__)
serializer = TimedSerializer('your-secret-key')
def require_signed_data(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        token = request.headers.get('X-Data-Token') or request.args.get('token')
        if not token:
            return Response('Missing token', status=400)
        try:
            request.deserialized_data = serializer.loads(token, max_age=300)
        except BadSignature:
            return Response('Invalid token', status=400)
        return f(*args, **kwargs)
    return decorated
@app.route('/secure-process', methods=['POST'])
@require_signed_data
def secure_process():
    # request.deserialized_data is guaranteed to be verified
    user_id = request.deserialized_data.get('user_id')
    return f'User {user_id} processed'
if __name__ == '__main__':
    app.run()
These practices eliminate the root cause. middleBrick validates fixes by rescanning the endpoint; if the same payloads no longer trigger exploitative behavior, the vulnerability is marked as resolved in subsequent reports.
Frequently Asked Questions
Can middleBrick detect insecure deserialization in Flask apps that use custom serialization formats?
Is it ever safe to use pickle.loads with user input in a Flask app if I validate the input first?
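No. pickle.loads on untrusted data is inherently unsafe: a payload can run code during deserialization itself, so any check on the resulting object comes too late, and reliably vetting raw pickle bytes beforehand is impractical. Avoid pickle entirely for user-facing data and use signed, safe serialization methods like itsdangerous or JSON with strict schema validation.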