Denial of Service in Flask with CockroachDB
Denial of Service in Flask with CockroachDB: how this specific combination creates or exposes the vulnerability
A Denial of Service (DoS) risk in a Flask application using CockroachDB typically arises from how the application handles database connections, queries, and retries under load. CockroachDB is designed for high availability and resilience, but client-side interaction patterns in Flask can create bottlenecks that degrade availability.
When Flask routes perform long-running or unbounded SQL queries without timeouts, a single heavy query can occupy a worker thread or process and prevent other requests from being served. In a threaded or synchronous deployment, this can exhaust the connection pool or worker capacity, leading to request queuing and timeouts. Python drivers that speak the PostgreSQL wire protocol, such as psycopg2 and psycopg 3, work with CockroachDB. Combined with CockroachDB's `statement_timeout` session variable and client-side cancellation (for example, psycopg2's `connection.cancel()`), they let the application bound how long a query may run. Failing to use these controls can turn a slow query into a service-impacting event.
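As a minimal sketch, assuming psycopg2 as the driver and a CockroachDB version that honors session variables passed through the libpq `options` connection parameter, the helper below builds `psycopg2.connect()` arguments that bound both the connect phase and every statement on the session. The function name `cockroach_connect_kwargs` and its defaults are hypothetical, chosen to match the values used elsewhere on this page.

```python
def cockroach_connect_kwargs(host, dbname, user, password, port=26257,
                             connect_timeout_s=10, statement_timeout_ms=5000):
    """Hypothetical helper: keyword arguments for psycopg2.connect() that bound
    the TCP connect phase and every statement run on the session."""
    if statement_timeout_ms <= 0:
        raise ValueError("statement_timeout_ms must be positive")
    return {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
        # libpq stops trying to connect after this many seconds
        "connect_timeout": connect_timeout_s,
        # Sent to the server at session start as a session variable (assumes the
        # CockroachDB version honors `-c` here); the server then cancels any
        # statement that runs longer than this many milliseconds.
        "options": f"-c statement_timeout={statement_timeout_ms}",
    }
```

A route would then open its connection with `psycopg2.connect(**cockroach_connect_kwargs("cockroachdb-host", "mydb", "myuser", "secret"))`. Setting the timeout at connect time means no individual route can forget to issue `SET statement_timeout`.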
Another common pattern is unbounded retry logic on connection errors. If Flask retries failed database operations aggressively without backoff or circuit-breaking, a transient network glitch or a CockroachDB node restart can trigger a retry storm, amplifying load and worsening the outage. CockroachDB’s multi-node architecture means network partitions or node liveness events can cause temporary unavailability of certain ranges; without proper timeout and retry configuration in Flask, clients may block indefinitely waiting for a response.
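One way to keep retries from turning into a storm is to decide up front which failures are worth retrying and to randomize the wait between attempts. The sketch below is illustrative, not a CockroachDB-provided API: `should_retry` and `backoff_delay` are hypothetical names, and the set of retryable SQLSTATE codes is an assumption to check against your driver and CockroachDB version (40001 is the serialization failure CockroachDB uses to request a transaction retry; the 08xxx codes are PostgreSQL connection-exception codes). `None` stands in for psycopg2 errors raised client-side before any server response, which carry no `pgcode`. The delay uses "full jitter": a uniform draw between zero and a capped exponential value.

```python
import random

# Assumed set of SQLSTATE codes worth retrying for idempotent reads:
# 40001 = serialization_failure (CockroachDB's transaction retry error);
# 08001, 08003, 08006 = PostgreSQL connection-exception codes.
# 57014 (query_canceled, raised on statement_timeout) is deliberately absent:
# retrying a query that already timed out just repeats the expensive work.
RETRYABLE_SQLSTATES = frozenset({"40001", "08001", "08003", "08006"})


def should_retry(sqlstate, attempt, max_retries=3):
    """Decide whether to retry after `attempt` failed attempts (1-based).

    `sqlstate` is the error's SQLSTATE (psycopg2: `exc.pgcode`); None is
    treated as a client-side connection failure and considered retryable.
    """
    if attempt >= max_retries:
        return False
    return sqlstate is None or sqlstate in RETRYABLE_SQLSTATES


def backoff_delay(attempt, base_s=0.1, cap_s=2.0, rng=random):
    """Full-jitter exponential backoff for the wait after failed attempt
    `attempt` (1-based): uniform in [0, min(cap_s, base_s * 2**(attempt-1))]."""
    ceiling = min(cap_s, base_s * (2 ** (attempt - 1)))
    return rng.uniform(0, ceiling)
```

Inside an `except psycopg2.OperationalError as e:` block, a caller would check `should_retry(e.pgcode, attempt)` and then `time.sleep(backoff_delay(attempt))`. The jitter spreads out retries from many workers that failed at the same moment, so a node restart does not produce a synchronized wave of reconnects. For multi-statement transactions, a 40001 retry means rerunning the whole transaction, not only the last statement.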
Additionally, without request-level timeouts, HTTP clients can hold connections open, and keep Flask workers busy, for as long as the application waits on database results. If CockroachDB is under contention or experiencing GC pressure, queries take longer, and without a client-side timeout the available worker threads and database connections can all be tied up, causing new requests to fail. This is especially damaging when the connection pool has no sensible limits, because it can become saturated by in-flight queries that do not complete.
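A client-side timeout can be tied to the request itself. The sketch below assumes a fixed per-request budget and derives each statement's timeout from whatever budget remains, so no query is allowed to outlive the request that issued it. `RequestDeadline` is a hypothetical helper, not part of Flask or psycopg2; the injectable `clock` exists only to make it testable.

```python
import time


class RequestDeadline:
    """Hypothetical per-request time budget used to size statement timeouts."""

    def __init__(self, budget_s, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining_s(self):
        return max(0.0, self._expires_at - self._clock())

    def statement_timeout_ms(self, default_ms=5000, floor_ms=1):
        """Timeout for the next statement: the default, shrunk to fit the
        remaining budget. Raises TimeoutError once the budget is spent."""
        remaining_ms = int(self.remaining_s() * 1000)
        if remaining_ms < floor_ms:
            raise TimeoutError("request deadline exceeded")
        return min(default_ms, remaining_ms)
```

In a Flask app, a `before_request` hook could set `g.deadline = RequestDeadline(8.0)` (8 seconds is an arbitrary example), and each query could be preceded by `cur.execute("SET statement_timeout = %s", (f"{g.deadline.statement_timeout_ms()}ms",))`. psycopg2 interpolates parameters client-side, so this sends a statement such as `SET statement_timeout = '3000ms'`. A `TimeoutError` from the helper can be mapped to a 503 without touching the database at all.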
Middleware and instrumentation can also affect DoS exposure. For example, logging every query synchronously or attaching large request/response payloads to Flask’s g object can increase memory pressure and slow request processing. When combined with CockroachDB’s strong consistency guarantees, high contention on frequently updated rows can lead to increased latency, which propagates into higher request latency and potential timeouts in Flask.
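To keep instrumentation off the request path, query logging can be handed to a background thread with the standard library's `logging.handlers.QueueHandler` and `QueueListener`. In this sketch the request thread only enqueues a record; the queue is bounded and records are dropped when it is full, on the assumption that losing some log lines under overload is preferable to blocking requests. `DroppingQueueHandler` and `make_query_logger` are hypothetical names.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """QueueHandler that discards records when the queue is full instead of
    raising, so a logging backlog never slows down request threads."""

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            pass  # shed log load under pressure


def make_query_logger(target_handler, name="db.queries", max_queued=10_000):
    """Return (logger, listener). Request threads only enqueue records; a
    background QueueListener thread does the slow I/O in target_handler."""
    log_queue = queue.Queue(maxsize=max_queued)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(DroppingQueueHandler(log_queue))
    logger.propagate = False  # keep records away from slower root handlers
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener
```

The logger would be created once at application startup, for example with `make_query_logger(logging.FileHandler("queries.log"))`, and `listener.stop()` called at shutdown to flush pending records.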
CockroachDB-Specific Remediation in Flask: concrete code fixes
Apply explicit timeouts, context management, and controlled retries to prevent Flask from amplifying delays from CockroachDB. Below are concrete patterns for Flask routes that talk to CockroachDB through psycopg2, which uses the PostgreSQL wire protocol that CockroachDB implements.
Example 1: Safe query with timeout and cancellation
```python
import time

import psycopg2
from flask import Flask, jsonify

app = Flask(__name__)


def get_db():
    """Open a CockroachDB connection with a bounded connect time and TCP keepalives.

    In production, use a connection pool (e.g. psycopg2.pool) instead of
    opening a new connection per request.
    """
    return psycopg2.connect(
        host='cockroachdb-host',
        port=26257,
        dbname='mydb',
        user='myuser',
        password='secret',      # placeholder; load real credentials from config
        connect_timeout=10,     # give up if no node accepts within 10 s
        keepalives=1,           # detect dead peers instead of hanging forever
        keepalives_idle=30,
        keepalives_interval=10,
        keepalives_count=5,
    )


@app.route('/user/<user_id>')
def get_user(user_id):
    conn = None
    try:
        conn = get_db()
        conn.set_session(autocommit=True)
        with conn.cursor() as cur:
            # Server-side limit: CockroachDB cancels any statement on this session after 5 s
            cur.execute("SET statement_timeout = '5s'")
            # Measure DB time for latency monitoring (this does not enforce a timeout)
            start = time.monotonic()
            cur.execute("SELECT id, name, email FROM users WHERE id = %s", (user_id,))
            row = cur.fetchone()
            elapsed = time.monotonic() - start
        if row is None:
            return jsonify({'error': 'not found'}), 404
        return jsonify({'id': row[0], 'name': row[1], 'email': row[2],
                        'db_time_ms': round(elapsed * 1000, 2)})
    except psycopg2.OperationalError:
        # Connection failures and statement timeouts (QueryCanceledError is a
        # subclass of OperationalError): fail fast instead of retrying here
        app.logger.warning("database unavailable or query timed out", exc_info=True)
        return jsonify({'error': 'database unavailable'}), 503
    except psycopg2.Error:
        # Log details server-side; do not leak database error text to clients
        app.logger.exception("query failed")
        return jsonify({'error': 'query failed'}), 502
    finally:
        if conn is not None:
            conn.close()
```
Example 2: Controlled retries with exponential backoff
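```python
import random
import time

import psycopg2
from psycopg2.extensions import QueryCanceledError
from flask import jsonify

# Assumes `app` and `get_db()` from Example 1.

MAX_RETRIES = 3
INITIAL_BACKOFF = 0.1  # seconds
MAX_BACKOFF = 1.0      # cap so a single request never sleeps for long


def execute_with_retry(query, params):
    backoff = INITIAL_BACKOFF
    for attempt in range(1, MAX_RETRIES + 1):
        conn = None
        try:
            conn = get_db()
            conn.set_session(autocommit=True)
            with conn.cursor() as cur:
                cur.execute("SET statement_timeout = '5s'")
                cur.execute(query, params)
                return cur.fetchall()
        except QueryCanceledError:
            # Statement timeout hit: retrying the same slow query only adds load
            raise
        except psycopg2.OperationalError:
            # Possibly transient (node restart, dropped connection); retry a bounded number of times
            if attempt == MAX_RETRIES:
                raise
            # Exponential backoff with jitter so clients do not retry in lockstep
            time.sleep(random.uniform(0, backoff))
            backoff = min(backoff * 2, MAX_BACKOFF)
        finally:
            if conn is not None:
                conn.close()
    raise RuntimeError('unreachable')


@app.route('/orders')
def list_orders():
    try:
        rows = execute_with_retry(
            "SELECT id, total FROM orders ORDER BY created_at DESC LIMIT 100", ())
        return jsonify([{'id': r[0], 'total': r[1]} for r in rows])
    except psycopg2.Error:
        app.logger.exception("listing orders failed")
        return jsonify({'error': 'service unavailable'}), 503
```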
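Bounded retries limit the damage a single request can do, but every request still reaches the database while it is failing. A circuit breaker adds a shared guard: after a run of consecutive failures it rejects calls immediately for a cool-down period, giving CockroachDB nodes time to recover. The class below is a minimal, hypothetical sketch (not a library API); the thresholds are illustrative, and in a multi-process deployment each worker process keeps its own breaker state.

```python
import threading
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures and rejects calls
    until `reset_timeout_s` has passed. After that, calls go through again
    (half-open); the failure count is only cleared by a success, so one more
    failure re-opens the circuit immediately."""

    def __init__(self, failure_threshold=5, reset_timeout_s=30.0,
                 failure_types=(Exception,), clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout_s
        self._failure_types = failure_types
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self._lock = threading.Lock()

    def call(self, fn, *args, **kwargs):
        with self._lock:
            if self._opened_at is not None:
                if self._clock() - self._opened_at < self._reset_timeout:
                    raise CircuitOpenError("database circuit is open")
                self._opened_at = None  # cool-down over: half-open
        try:
            result = fn(*args, **kwargs)
        except self._failure_types:
            # Only the configured exception types count as failures
            with self._lock:
                self._failures += 1
                if self._failures >= self._threshold:
                    self._opened_at = self._clock()
            raise
        with self._lock:
            self._failures = 0
        return result
```

With Example 2's `execute_with_retry`, a route might create `db_breaker = CircuitBreaker(failure_types=(psycopg2.OperationalError,))` once and call `db_breaker.call(execute_with_retry, query, ())`, mapping `CircuitOpenError` to a 503. Counting only `OperationalError` keeps application bugs such as SQL syntax errors from tripping the breaker.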
Key remediation practices
- Set statement and query timeouts to bound execution time (e.g., `SET statement_timeout = '5s'`).
- Use connection pool settings that limit open connections and prevent resource exhaustion.
- Implement bounded retries with exponential backoff and jitter, and do not retry non-transient errors (such as syntax errors or constraint violations) or queries that hit the statement timeout.
- Use context managers (`with`) for cursors and ensure connections are closed in `finally` blocks.
- Monitor query latency and CockroachDB health to detect contention or node issues early.
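Pool limits can be complemented by an admission gate that caps concurrent database work per process and lets callers wait only briefly before failing fast. psycopg2's `ThreadedConnectionPool` raises `PoolError` as soon as all `maxconn` connections are in use rather than waiting; the semaphore-based `DbSlotLimiter` below is a hypothetical alternative that waits up to a short timeout and then raises `TimeoutError`, which a route can turn into a 503.

```python
import threading
from contextlib import contextmanager


class DbSlotLimiter:
    """Caps concurrent database work in this process (hypothetical helper)."""

    def __init__(self, max_in_flight, acquire_timeout_s):
        self._sem = threading.BoundedSemaphore(max_in_flight)
        self._timeout = acquire_timeout_s

    @contextmanager
    def slot(self):
        # Wait briefly for a free slot; give up instead of queuing indefinitely
        if not self._sem.acquire(timeout=self._timeout):
            raise TimeoutError("no database slot available")
        try:
            yield
        finally:
            self._sem.release()
```

A route would wrap its database call in `with db_limiter.slot():` and return a 503 on `TimeoutError`. Choosing `max_in_flight` no larger than the pool's maximum connection count keeps requests from piling up behind a saturated pool.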
Related CWEs (resource consumption)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-400 | Uncontrolled Resource Consumption | HIGH |
| CWE-770 | Allocation of Resources Without Limits or Throttling | MEDIUM |
| CWE-799 | Improper Control of Interaction Frequency | MEDIUM |
| CWE-835 | Loop with Unreachable Exit Condition ('Infinite Loop') | HIGH |
| CWE-1050 | Excessive Platform Resource Consumption within a Loop | MEDIUM |