LLM Data Leakage in Express with DynamoDB
LLM Data Leakage in Express with DynamoDB: how this specific combination creates or exposes the vulnerability
When an Express application serves an unauthenticated or insufficiently constrained LLM endpoint and also interacts with DynamoDB, the combination can unintentionally expose sensitive data through both the LLM output and the application’s data access patterns. In this scenario, the LLM may generate responses that contain data retrieved from DynamoDB, and the application may pass user-supplied input directly to both the LLM and the database without adequate validation or authorization checks.
LLM data leakage occurs when system prompts, training details, or private data stored in DynamoDB appear in LLM responses. For example, if the Express backend queries DynamoDB using an item identifier derived from user input and then includes the retrieved record in a prompt sent to an LLM without redaction, fields such as internal IDs, email addresses, or personal identifiers may be exposed. An endpoint that combines LLM calls with DynamoDB lookups is also a target for crafted prompts that extract system or training information, revealing not only that the data exists but also its contents. The scan’s unauthenticated LLM endpoint detection and active prompt injection testing probe for this behavior.
DynamoDB-specific risks in this context include over-permissive IAM policies attached to the application runtime, encryption settings that do not match the sensitivity of the data, and key design that exposes relationships between items. If the Express service runs under a single IAM role, or reads credentials with broad read access from environment variables, a prompt injection that triggers additional queries or batch reads can expose more items than intended. The scan’s checks for System prompt leakage and Output scanning for PII and API keys are particularly relevant: a prompt that includes a whole DynamoDB record as context may lead the model to regurgitate raw attribute values, and an output scan can detect API keys or session tokens stored in item attributes that then appear in model replies.
Excessive agency patterns compound the risk. If the Express backend uses the LLM to generate tool calls or function invocations, such as constructing a DynamoDB read request from user instructions, an attacker may coerce the model into reading or iterating over more items than intended. The scan’s Excessive agency detection and its tests for Unsafe Consumption of LLM outputs flag cases where LLM-generated DynamoDB calls could lead to data exposure or enumeration.
To illustrate, consider an Express route that accepts a user ID, retrieves a profile from DynamoDB, and uses it in a prompt. Without input validation, authorization, and output checks, this flow can leak sensitive attributes through the LLM. The scan’s checks for Input Validation and Data Exposure highlight such flows, and findings include severity and remediation guidance mapped to frameworks like the OWASP API Security Top 10 and GDPR.
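The leak path in that flow can be shown without any network calls. The following is a minimal sketch, assuming a hypothetical profile record (already converted to a plain object) and a hypothetical buildNaivePrompt helper; the attribute names and values are invented for the example.

```javascript
// Hypothetical profile record as it might look after a DynamoDB read.
// Attribute names and values are assumptions for illustration only.
const profile = {
  pk: 'USER#42',
  name: 'Ada',
  role: 'admin',
  email: 'ada@example.com',
  internalNotes: 'flagged for review',
};

// Anti-pattern: the whole record is serialized into the prompt, so every
// attribute becomes model context and can be echoed back to the caller.
function buildNaivePrompt(item, question) {
  return `User record: ${JSON.stringify(item)}\nQuestion: ${question}`;
}

const naivePrompt = buildNaivePrompt(profile, 'What is my role?');

// Both sensitive attributes are now part of the text sent to the model.
console.log(naivePrompt.includes('ada@example.com')); // true
console.log(naivePrompt.includes('flagged for review')); // true
```

Nothing in this sketch is specific to a particular LLM provider: once the attribute is in the prompt text, preventing disclosure depends entirely on the model following instructions.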
DynamoDB-Specific Remediation in Express: concrete code fixes
Remediation focuses on strict input validation, least-privilege data access, and preventing raw DynamoDB output from reaching the LLM. Below are concrete Express patterns with working DynamoDB SDK snippets that reduce the risk of data leakage.
1. Validate and sanitize all inputs before DynamoDB access
Ensure user-supplied identifiers conform to expected formats before they reach DynamoDB, and reject anything that contains unexpected traversal or injection patterns. Pass user input only as key values or expression attribute values, and never concatenate it into expressions or expression attribute names.
const express = require('express');
const { DynamoDBClient, GetItemCommand } = require('@aws-sdk/client-dynamodb');
const { marshall } = require('@aws-sdk/util-dynamodb');

const app = express();
const client = new DynamoDBClient({ region: 'us-east-1' });

// Accept only short identifiers made of letters, digits, underscores, and hyphens
function isValidUserId(userId) {
  return typeof userId === 'string' && /^[a-zA-Z0-9_-]{1,64}$/.test(userId);
}

// Authorization (confirming the caller may read this profile) is assumed to
// happen in earlier middleware and is not shown here
app.get('/profile/:userId', async (req, res) => {
  const { userId } = req.params;
  if (!isValidUserId(userId)) {
    return res.status(400).json({ error: 'Invalid user ID' });
  }
  const command = new GetItemCommand({
    TableName: process.env.PROFILE_TABLE,
    // marshall() takes plain JavaScript values and adds the type descriptors itself
    Key: marshall({ pk: `USER#${userId}` })
  });
  try {
    const { Item } = await client.send(command);
    if (!Item) return res.status(404).json({ error: 'Not found' });
    // Copy only the fields that are safe to return; attributes such as
    // email, ssn, and internalNotes are deliberately left out
    const safeItem = {
      userId,
      name: Item.name?.S,
      role: Item.role?.S
    };
    res.json(safeItem);
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Internal error' });
  }
});
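The advice about not concatenating user input into expression attribute names can be sketched as a small helper. This is an illustration rather than part of the route above: PROJECTABLE_FIELDS and buildProjection are hypothetical names, and the attribute names are assumptions. The helper treats requested field names purely as lookup keys into an allowlist and emits placeholder-based ProjectionExpression and ExpressionAttributeNames values, which are standard DynamoDB request parameters.

```javascript
// Allowlist mapping client-facing field names to DynamoDB attribute names.
// The attribute names here are assumptions for illustration.
const PROJECTABLE_FIELDS = new Map([
  ['name', 'name'],
  ['role', 'role'],
  ['team', 'team'],
]);

// Builds ProjectionExpression and ExpressionAttributeNames from requested
// fields. User input is only used as a lookup key and is never concatenated
// into the expression itself.
function buildProjection(requestedFields) {
  const names = {};
  const placeholders = [];
  // A Set removes duplicates, which would otherwise produce overlapping paths
  for (const field of new Set(requestedFields)) {
    const attribute = PROJECTABLE_FIELDS.get(field);
    if (!attribute) continue; // drop unknown or sensitive fields
    const placeholder = `#f${placeholders.length}`;
    names[placeholder] = attribute;
    placeholders.push(placeholder);
  }
  if (placeholders.length === 0) {
    throw new Error('No permitted fields requested');
  }
  return {
    ProjectionExpression: placeholders.join(', '),
    ExpressionAttributeNames: names,
  };
}

// A request for a sensitive field is silently narrowed to permitted ones
console.log(buildProjection(['name', 'ssn']));
```

The returned object can be spread into the GetItemCommand input alongside TableName and Key. Placeholders also sidestep DynamoDB reserved words such as name, which cannot appear directly in an expression.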
2. Apply least-privilege IAM and use condition keys
Configure the runtime credentials used by the Express service to allow only GetItem/Query on specific table ARNs with conditions that restrict access by requester attributes where possible. Do not grant full table scan or delete permissions to the LLM-related execution paths.
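One way such a policy might look is sketched below as a JavaScript function that returns the policy document as a plain object. The table ARN and attribute list are placeholders, and buildReadOnlyPolicy is a hypothetical helper. The sketch relies on the dynamodb:Attributes and dynamodb:Select condition keys from DynamoDB fine-grained access control; the exact condition operators should be verified against current AWS documentation before use.

```javascript
// Builds an IAM policy document (as a plain object) that permits only
// GetItem and Query on one table, limited to an allowlisted set of attributes.
function buildReadOnlyPolicy(tableArn, allowedAttributes) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        // No Scan, BatchGetItem, or write actions are granted
        Action: ['dynamodb:GetItem', 'dynamodb:Query'],
        Resource: tableArn,
        Condition: {
          // Every attribute named in the request must be on the allowlist
          'ForAllValues:StringEquals': {
            'dynamodb:Attributes': allowedAttributes,
          },
          // When a Select parameter is present, it must name specific attributes
          StringEqualsIfExists: {
            'dynamodb:Select': 'SPECIFIC_ATTRIBUTES',
          },
        },
      },
    ],
  };
}

const policy = buildReadOnlyPolicy(
  'arn:aws:dynamodb:us-east-1:123456789012:table/Profiles', // placeholder ARN
  ['pk', 'name', 'role'] // assumed attribute names
);
console.log(JSON.stringify(policy, null, 2));
```

With a policy of this shape, the application has to name the attributes it wants (for example through a ProjectionExpression), so a request for a whole item that includes sensitive attributes would be expected to fail at the IAM layer rather than depend on application code.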
3. Redact sensitive fields before LLM prompting
Strip or mask fields that should not appear in prompts or LLM responses. Avoid passing entire DynamoDB items as context.
// Build the LLM context from an explicit allowlist of attributes
function redactForLlm(item) {
  // Sensitive attributes (ssn, apiKey, internalNotes, creditCard) are never
  // copied, so they cannot reach the prompt
  return {
    id: item.pk?.S,
    name: item.name?.S,
    role: item.role?.S
  };
}
4. Separate LLM prompts from raw data retrieval
Do not directly inject retrieved DynamoDB records into system prompts. Instead, use curated summaries or explicitly approved fields, and log prompts for audit without including sensitive attributes.
const safeData = redactForLlm(Item);
const prompt = `Summarize the user's role and permissions for ${safeData.name} (${safeData.id}). Do not include internal notes or contact details.`;
// send prompt to LLM
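Redacting the prompt covers the input side; the output checks mentioned earlier cover what the model sends back. Below is a minimal sketch of such a check, assuming a hypothetical scrubLlmOutput helper applied to the model's reply before it is returned to the client. The three patterns are illustrative only and are far from a complete PII or secret detector.

```javascript
// Illustrative patterns only; a real deployment would need a broader set.
const SENSITIVE_PATTERNS = [
  { label: 'EMAIL', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: 'AWS_ACCESS_KEY_ID', regex: /\bAKIA[0-9A-Z]{16}\b/g },
  { label: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Replaces every match with a labeled marker and records what was found,
// so the caller can both return a cleaned reply and raise an alert.
function scrubLlmOutput(text) {
  const findings = [];
  let clean = text;
  for (const { label, regex } of SENSITIVE_PATTERNS) {
    clean = clean.replace(regex, () => {
      findings.push(label);
      return `[REDACTED_${label}]`;
    });
  }
  return { clean, findings };
}

const result = scrubLlmOutput('Contact ada@example.com, key AKIAABCDEFGHIJKLMNOP');
console.log(result.clean);
// Contact [REDACTED_EMAIL], key [REDACTED_AWS_ACCESS_KEY_ID]
```

A non-empty findings array is a useful signal in its own right: it suggests that a sensitive attribute reached the prompt and that the redaction step upstream needs attention.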
5. Monitor and scope table encryption and key patterns
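DynamoDB encrypts tables at rest by default using an AWS owned key. For sensitive data, consider an AWS managed or customer managed KMS key so that key policies and key usage can be reviewed, and make sure the application does not log raw key attributes. Where the scan’s Encryption check reports a finding, remediation includes enabling SSE with KMS-managed keys and reviewing key policies.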
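A table's encryption setup can be reviewed from the output of the DescribeTable operation. The sketch below is a hypothetical helper, classifyTableEncryption, that inspects the SSEDescription field of such a response. It assumes that a response without SSEDescription means the table uses the default AWS owned key; that assumption should be confirmed against the AWS documentation for the SDK version in use.

```javascript
// Classifies encryption at rest from a DescribeTable response object.
// Assumption: a missing SSEDescription means the default AWS owned key.
function classifyTableEncryption(describeTableOutput) {
  const sse = describeTableOutput?.Table?.SSEDescription;
  if (!sse) {
    return { keyType: 'AWS_OWNED', status: null, kmsKeyArn: null };
  }
  return {
    keyType: sse.SSEType ?? 'UNKNOWN',
    status: sse.Status ?? null,
    kmsKeyArn: sse.KMSMasterKeyArn ?? null,
  };
}

// Example with a hand-written response shape (placeholder ARN)
console.log(
  classifyTableEncryption({
    Table: {
      TableName: 'Profiles',
      SSEDescription: {
        Status: 'ENABLED',
        SSEType: 'KMS',
        KMSMasterKeyArn: 'arn:aws:kms:us-east-1:123456789012:key/example',
      },
    },
  })
);
```

In practice the input would come from sending a DescribeTableCommand through the same DynamoDBClient used elsewhere in the application, under credentials that are allowed to call dynamodb:DescribeTable.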
6. Use middleware to enforce rate limiting and schema validation
Apply consistent validation and rate limiting in Express to reduce abuse vectors that could lead to excessive queries or data scraping via the LLM.
const rateLimit = require('express-rate-limit');

// Allow at most 60 requests per minute per client on the profile routes
const limiter = rateLimit({
  windowMs: 60 * 1000,
  max: 60,
  standardHeaders: true,
  legacyHeaders: false
});

app.use('/profile/', limiter);
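The schema validation half of this step can be handled by a small reusable middleware. The following is a dependency-free sketch, assuming a hypothetical validateParams factory that checks route parameters against regular expressions; a schema library could fill the same role.

```javascript
// Returns Express-style middleware that checks req.params against a map of
// parameter name -> RegExp and rejects any request that does not match.
// Patterns should not use the global flag, since test() would then be stateful.
function validateParams(schema) {
  return function (req, res, next) {
    for (const [param, pattern] of Object.entries(schema)) {
      const value = req.params[param];
      if (typeof value !== 'string' || !pattern.test(value)) {
        return res.status(400).json({ error: `Invalid ${param}` });
      }
    }
    next();
  };
}

// Same identifier format as the isValidUserId check used earlier
const validateUserId = validateParams({ userId: /^[a-zA-Z0-9_-]{1,64}$/ });

// Hypothetical wiring, with the limiter and a route handler defined elsewhere:
// app.get('/profile/:userId', limiter, validateUserId, handler);
```

Placing validation in middleware keeps the rule in one place, so every route that touches DynamoDB applies the same format check before any query is built.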
By combining input validation, least-privilege access, field redaction, and disciplined prompt construction, Express applications that read from DynamoDB can reduce the likelihood of LLM data leakage and address findings from the scan’s Data Exposure, Encryption, and LLM/AI Security checks.
Related CWEs: llmSecurity
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |