HIGH prompt injectionflaskdynamodb

Prompt Injection in Flask with Dynamodb

Prompt Injection in Flask with Dynamodb — how this specific combination creates or exposes the vulnerability

Prompt injection becomes a practical concern when an LLM endpoint is built with a web framework such as Flask and relies on DynamoDB for data access. In this stack, Flask routes often construct and forward user-influenced input directly into LLM prompts, and DynamoDB is used to retrieve documents, configuration, or user context. If user input is concatenated into a system or user message without validation or isolation, an attacker can inject instructions that alter the LLM behavior. At the same time, DynamoDB patterns common in Flask services—such as building query expressions from request parameters—can expose metadata or error paths that aid prompt injection or information leakage.

Consider a Flask endpoint that retrieves a user’s stored instruction profile from DynamoDB and then asks an LLM to summarize content in the style of that profile. If the profile is fetched by a simple key (e.g., user_id from a token) and then embedded into the prompt as raw text, an authenticated attacker who can control the profile may store crafted instructions that cause the LLM to ignore safety guidelines or exfiltrate data. More broadly, prompt injection in this combination is not only about the LLM; it is about how Flask request handling, DynamoDB access patterns, and LLM input construction interact. For example, insufficient input validation on query parameters used to select DynamoDB keys can allow an attacker to force reads of arbitrary items, which may then be included in prompts. Error messages from DynamoDB—such as conditional check failures or provisioned throughput exceptions—may also be surfaced to the LLM or returned directly to the client, providing clues for injection attempts.

The LLM/AI Security checks in middleBrick specifically target these cross-layer risks by detecting system prompt leakage and performing active prompt injection testing with sequential probes (system prompt extraction, instruction override, DAN jailbreak, data exfiltration, cost exploitation). When scanning a Flask service that uses DynamoDB, the scanner checks whether unauthentinated endpoints expose LLM interfaces, whether user-influenced data reaches prompts without sanitization, and whether DynamoDB interactions can be abused to influence prompt context. Findings include severity-ranked guidance to separate data retrieval from prompt construction, enforce strict allowlists on dynamic values, and avoid including raw user data or internal error details in LLM inputs.

Dynamodb-Specific Remediation in Flask — concrete code fixes

Remediation focuses on strict separation between data retrieval and prompt assembly, controlled exposure of data to the LLM, and safe handling of DynamoDB responses in Flask. Below are concrete, safe patterns you can adopt.

1. Parameterized DynamoDB access with strict key validation

Always validate and sanitize inputs used to construct DynamoDB key expressions. Do not concatenate user input into key names or conditional expressions. Use bounded allowlists for identifiers and treat user input as data, not code.

import boto3
from flask import Flask, request, jsonify
import re

app = Flask(__name__)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('UserProfiles')

@app.route('/profile')
def get_profile():
    user_id = request.args.get('user_id')
    # Strict validation: alphanumeric and underscores, 3–64 chars
    if not re.match(r'^[A-Za-z0-9_]{3,64}$', user_id):
        return jsonify({'error': 'invalid user_id'}), 400
    response = table.get_item(Key={'user_id': user_id})
    item = response.get('Item')
    if not item:
        return jsonify({'error': 'not found'}), 404
    return jsonify({'profile': item.get('profile_text', '')})

2. Isolate data from prompts; do not embed raw user data

Retrieve data from DynamoDB for business logic or context, but do not directly embed potentially mutable fields into LLM prompts. Instead, pass only vetted, non-instructional fields and enforce a strict prompt template on the server side.

import boto3
from flask import Flask, jsonify

app = Flask(__name__)
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserPreferences')

@app.route('/summarize')
def summarize():
    user_id = request.args.get('user_id')
    if not re.match(r'^[A-Za-z0-9_]{3,64}$', user_id):
        return jsonify({'error': 'invalid user_id'}), 400
    pref = table.get_item(Key={'user_id': user_id}).get('Item', {})
    # Use only safe, non-instructional fields as context; do not embed raw profile_text
    safe_context = pref.get('preferred_language', 'English')
    # Server-side prompt template; user data is never part of the instruction
    prompt = f"Summarize the following text in {safe_context}. Text: {{user_text}}"
    # llm_response = call_llm(prompt, user_text=user_supplied_text)
    return jsonify({'prompt_template': prompt})

3. Handle DynamoDB errors safely and avoid leaking metadata to LLM

Ensure that DynamoDB exceptions do not become part of LLM input. Map errors to generic responses and log details securely for investigation. This prevents attackers from using error conditions to infer table structures or inject via error payloads.

import botocore
@app.route('/data')
def get_data():
    try:
        resp = table.get_item(Key={'id': request.args.get('id')})
    except botocore.exceptions.ClientError as e:
        # Log full exception internally; return generic message
        app.logger.error('dynamodb_error: %s', e.response['Error'])
        return jsonify({'error': 'internal error'}), 500
    item = resp.get('Item')
    if not item:
        return jsonify({'error': 'not found'}), 404
    return jsonify({'data': item})

4. Apply allowlists and schema checks on retrieved data

Even after retrieval, validate shapes and content of DynamoDB items before any optional use in prompts. Define a strict schema for items you expect to use, and reject items that deviate.

from jsonschema import validate

profile_schema = {
    'type': 'object',
    'properties': {
        'user_id': {'type': 'string', 'pattern': '^[A-Za-z0-9_]{3,64}$'},
        'profile_text': {'type': 'string', 'maxLength': 2000}
    },
    'required': ['user_id']
}

@app.route('/profile-safe')
def get_profile_safe():
    user_id = request.args.get('user_id')
    if not re.match(r'^[A-Za-z0-9_]{3,64}$', user_id):
        return jsonify({'error': 'invalid user_id'}), 400
    item = table.get_item(Key={'user_id': user_id}).get('Item')
    if not item:
        return jsonify({'error': 'not found'}), 404
    try:
        validate(instance=item, schema=profile_schema)
    except Exception:
        return jsonify({'error': 'invalid data'}), 500
    # Only pass explicitly allowed fields to LLM context
    return jsonify({'user_id': item['user_id']})

Related CWEs: llmSecurity

CWE IDNameSeverity
CWE-754Improper Check for Unusual or Exceptional Conditions MEDIUM

Frequently Asked Questions

Can prompt injection via Flask and DynamoDB expose system prompts stored in the LLM?
Yes, if user-influenced data is concatenated into prompts or if error paths from DynamoDB are reflected in LLM inputs, an attacker can attempt system prompt extraction. Use strict input validation, isolate data from prompts, and avoid including raw user or error data in LLM requests.
Does middleBrick test for prompt injection when scanning a Flask API that uses DynamoDB?
Yes. middleBrick runs active prompt injection testing and system prompt leakage detection as part of its LLM/AI Security checks. It scans unauthenticated attack surfaces and reports findings with severity and remediation guidance, helping you identify whether user-controlled data can influence LLM behavior.