Severity: HIGH

LLM Data Leakage in Flask with DynamoDB

LLM Data Leakage in Flask with DynamoDB — how this specific combination creates or exposes the vulnerability

LLM data leakage occurs when an application exposes sensitive data or system prompts through responses generated by an LLM integration. In a Flask application that interacts with Amazon DynamoDB, the risk increases when LLM endpoints are reachable without authentication and when data retrieved from DynamoDB is passed into prompts or returned alongside LLM outputs.

Flask routes that query DynamoDB based on user input can inadvertently create conditions for prompt injection or data leakage if the data shapes LLM prompts or responses. For example, using raw DynamoDB results directly in system or user messages may expose data patterns, identifiers, or sensitive context to an attacker who can manipulate the input. An unauthenticated LLM endpoint in the same application surface allows an attacker to probe the model and observe whether responses reflect underlying DynamoDB content, such as table names, item attributes, or business logic encoded in prompts.

The combination introduces specific risks:

  • Data from DynamoDB may be included in prompts, enabling prompt injection attacks that extract or infer sensitive information through crafted inputs.
  • LLM responses may inadvertently include sensitive data retrieved from DynamoDB, such as personally identifiable information (PII) or credential material, if output scanning is not applied.
  • Unauthenticated LLM endpoints can be targeted to test for system prompt leakage, where patterns in DynamoDB-driven prompts reveal internal instructions or operational details.

An illustrative, insecure pattern in Flask might look like this, where user input directly influences a prompt that includes raw DynamoDB items:

from flask import Flask, request, jsonify
import boto3
import json

app = Flask(__name__)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('UserProfiles')

@app.route('/chat', methods=['POST'])
def chat():
    user_id = request.json.get('user_id')
    response = table.get_item(Key={'user_id': user_id})
    item = response.get('Item', {})
    # Insecure: raw item injected into prompt
    prompt = f"You are a support agent. User details: {json.dumps(item)}. Answer the query."
    # Assume llm_response is obtained from an LLM endpoint
    llm_response = "Echo: " + prompt  # placeholder for actual LLM call
    return jsonify({'reply': llm_response})

if __name__ == '__main__':
    app.run()

In this scenario, an attacker who can control user_id may attempt to manipulate the prompt or observe whether the LLM echoes back sensitive fields from the DynamoDB item. If the LLM endpoint is unauthenticated, the attacker can also perform active prompt injection tests, such as attempting to extract the system prompt or exfiltrate data via crafted inputs that leverage the DynamoDB-supplied context.
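
As a minimal illustration (assuming the vulnerable route above is running locally on port 5000, and that the table stores attributes such as email, ssn, or api_key), an attacker-style probe might look like this:

import requests

# Send a guessed or enumerated user_id to the unauthenticated /chat route
# and inspect the reply for echoed DynamoDB attributes.
probe = requests.post(
    'http://localhost:5000/chat',
    json={'user_id': 'victim-123'},
    timeout=10,
)
reply = probe.json().get('reply', '')

# Attribute names whose presence would indicate raw item data leaking into the response.
for marker in ('email', 'ssn', 'api_key'):
    if marker in reply:
        print(f'Possible leakage of field: {marker}')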

middleBrick detects these risks under LLM/AI Security by checking for system prompt leakage patterns, active prompt injection probes, and output scanning for PII or secrets in LLM responses. When DynamoDB data flows into prompts or is reflected in outputs, the scanner highlights the exposure and provides remediation guidance to isolate data handling from LLM interactions.

DynamoDB-Specific Remediation in Flask — concrete code fixes

Remediation focuses on ensuring that data from DynamoDB is never directly exposed to LLM prompts or responses, and that sensitive fields are stripped or transformed before inclusion in any LLM-related context. Apply the principle of least privilege and data minimization when accessing DynamoDB, and treat LLM endpoints as untrusted surfaces.
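
One way to apply data minimization at the query level is to request only the attributes the prompt actually needs, rather than fetching whole items. A minimal sketch using a boto3 ProjectionExpression (the attribute names display_name and role are assumptions; role is aliased because it is a DynamoDB reserved word):

# Fetch only the non-sensitive attributes needed to build the prompt.
response = table.get_item(
    Key={'user_id': user_id},
    ProjectionExpression='display_name, #r',
    ExpressionAttributeNames={'#r': 'role'},
)
item = response.get('Item', {})  # contains at most display_name and role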

Query DynamoDB with validated, strongly typed key values and avoid including raw item attributes in prompts. Instead, extract only the necessary, non-sensitive fields and pass them through a sanitization layer. The following example demonstrates a secure pattern in Flask:

from flask import Flask, request, jsonify
import boto3

app = Flask(__name__)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('UserProfiles')

def sanitize_for_prompt(value):
    # Remove or mask sensitive subfields
    if isinstance(value, dict):
        return {k: '***' if k in ('email', 'ssn', 'api_key') else sanitize_for_prompt(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_for_prompt(v) for v in value]
    return value

@app.route('/chat', methods=['POST'])
def chat():
    user_id = request.json.get('user_id')
    response = table.get_item(Key={'user_id': user_id})
    item = response.get('Item', {})
    # Use only non-sensitive fields
    safe_display_name = item.get('display_name', 'Unknown')
    safe_role = item.get('role', 'user')
    sanitized = sanitize_for_prompt(item)
    # Build prompt without raw sensitive data
    prompt = (
        f"You are a support agent for role '{safe_role}'. "
        f"User display name: {safe_display_name}. "
        f"Do not reveal internal data."
    )
    # Assume llm_response is obtained from an LLM endpoint with input validation
    llm_response = "Acknowledged"  # placeholder for actual LLM call
    return jsonify({'reply': llm_response, 'context': sanitized})

if __name__ == '__main__':
    app.run()

Key practices:

  • Never include raw DynamoDB items in prompts. Select only required, non-sensitive fields.
  • Apply a sanitization function to mask or remove fields such as email, SSN, API keys, or other PII before any LLM interaction.
  • Ensure LLM endpoints are protected with authentication where possible, and avoid exposing unauthenticated endpoints that can be probed for prompt leakage.
  • Validate and limit user input used to query DynamoDB to prevent injection or enumeration attacks that could reveal data patterns. A sketch combining this with endpoint authentication follows this list.
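
A minimal sketch of both controls, building on the Flask app defined above (the X-Api-Token header, the CHAT_API_TOKEN environment variable, and the user_id format are illustrative assumptions; this replaces the earlier /chat handler rather than coexisting with it):

import os
import re
from functools import wraps
from flask import abort, jsonify, request

USER_ID_PATTERN = re.compile(r'^[A-Za-z0-9_-]{1,64}$')  # assumed key format

def require_api_token(view):
    # Reject requests that do not present the shared token configured for the service.
    @wraps(view)
    def wrapper(*args, **kwargs):
        expected = os.environ.get('CHAT_API_TOKEN')
        supplied = request.headers.get('X-Api-Token')
        if not expected or supplied != expected:
            abort(401)
        return view(*args, **kwargs)
    return wrapper

@app.route('/chat', methods=['POST'])
@require_api_token
def chat():
    payload = request.get_json(silent=True) or {}
    user_id = payload.get('user_id', '')
    if not isinstance(user_id, str) or not USER_ID_PATTERN.match(user_id):
        abort(400)  # reject malformed or enumeration-style keys
    # ... query DynamoDB and build the sanitized prompt as shown above ...
    return jsonify({'reply': 'Acknowledged'})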

middleBrick’s LLM/AI Security checks will flag scenarios where DynamoDB data appears in prompts or where output contains PII, API keys, or executable code, guiding you to apply these mitigations.

Related CWEs (LLM/AI Security category)

CWE ID  | Name                                                 | Severity
CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM

Frequently Asked Questions

How can I prevent LLM output from exposing DynamoDB data?
Ensure LLM prompts do not include raw DynamoDB items. Use a sanitization step to remove or mask sensitive fields before constructing prompts, and enable output scanning to detect PII, API keys, or code in LLM responses.
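A minimal, illustrative sketch of such an output-scanning step (the regex patterns below are examples only, not a complete PII or secret detector):

import re

# Example detection patterns; a production deployment should rely on a dedicated
# PII/secret scanner rather than this short list.
OUTPUT_PATTERNS = {
    'email': re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b'),
    'ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'aws_access_key': re.compile(r'\bAKIA[0-9A-Z]{16}\b'),
}

def scan_llm_output(text):
    # Return the names of all patterns found in an LLM response.
    return [name for name, pattern in OUTPUT_PATTERNS.items() if pattern.search(text)]

# Usage: withhold or redact the reply when anything sensitive is detected.
findings = scan_llm_output('Contact me at alice@example.com')
if findings:
    print('Reply withheld; detected:', findings)
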
Is it safe to use an unauthenticated LLM endpoint when my Flask app uses DynamoDB?
Unauthenticated LLM endpoints increase exposure to prompt injection and data leakage tests. If you must use them, ensure no sensitive DynamoDB data is included in prompts and apply strict input validation and output scanning.