
LLM Data Leakage in Axum with DynamoDB

LLM Data Leakage in Axum with DynamoDB — how this specific combination creates or exposes the vulnerability

When an Axum service exposes an unauthenticated or weakly authenticated endpoint that reads from DynamoDB and also serves LLM-facing responses, there is a risk of LLM data leakage. This occurs when application logic or error handling inadvertently includes sensitive DynamoDB record contents, table structure, or data-access patterns in responses returned to LLM clients, or in logged output that language models can see.

DynamoDB-specific factors that can contribute to leakage include verbose error messages that surface table names, key schema details, or conditional check failures; misconfigured CORS or routing that allows probing of table endpoints; and responses that include item attributes not intended for downstream consumption. If Axum handlers pass raw DynamoDB GetItem or Query responses directly into LLM prompts or expose them via streaming outputs, fields such as personal identifiers, internal keys, or operational metadata can be surfaced.

For example, an endpoint that retrieves a user profile by ID and then forwards that profile into an LLM prompt for "helpful summarization" can leak PII if the profile includes fields like email, phone, or internal IDs. Similarly, conditional writes that fail due to version checks can produce error payloads containing attribute names and validation rules, which may be captured in logs or returned in trace-like responses. These patterns intersect with the LLM/AI Security checks in middleBrick, which include active prompt injection testing, system prompt leakage detection, and output scanning for PII and API keys to detect whether an LLM endpoint reveals sensitive DynamoDB-derived content.
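One way to keep such a profile out of the prompt path is an explicit allowlist applied before any prompt text is built. The sketch below is illustrative only: the field names and the `filter_for_prompt` helper are hypothetical, and a plain `HashMap<String, String>` stands in for a deserialized DynamoDB item.

```rust
use std::collections::HashMap;

// Hypothetical allowlist of fields permitted to enter an LLM prompt.
const PROMPT_SAFE_FIELDS: &[&str] = &["user_id", "display_name"];

// Drop every attribute not on the allowlist before prompt construction.
fn filter_for_prompt(item: &HashMap<String, String>) -> HashMap<String, String> {
    item.iter()
        .filter(|(k, _)| PROMPT_SAFE_FIELDS.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut item = HashMap::new();
    item.insert("user_id".to_string(), "u-123".to_string());
    item.insert("display_name".to_string(), "Ada".to_string());
    // PII that must never reach the prompt:
    item.insert("email".to_string(), "ada@example.com".to_string());

    let safe = filter_for_prompt(&item);
    assert!(!safe.contains_key("email"));
    assert!(safe.contains_key("display_name"));
    println!("prompt context fields: {:?}", safe.keys().collect::<Vec<_>>());
}
```

An allowlist is preferable to a denylist here: a newly added sensitive attribute is excluded by default rather than leaked by default.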

Real-world attack patterns that map to this risk include OWASP API Top 10 #5 (Broken Function Level Authorization) when over-permissive routes allow retrieval of DynamoDB items outside intended scope, and data exposure via verbose errors. In PCI-DSS and GDPR contexts, exposure of personal data in LLM responses is particularly critical because it can lead to unauthorized model memorization or exfiltration. middleBrick scanning can surface these issues by comparing runtime findings against OpenAPI/Swagger specs with full $ref resolution, checking whether responses inadvertently include DynamoDB attribute names or values that should be restricted.

Concrete examples in Axum often involve handlers that deserialize QueryOutput or GetItemOutput without filtering. If a handler uses the official AWS SDK for Rust and returns the full SDK output struct as JSON, the 'S' (string) and 'N' (number) AttributeValue type wrappers expose internal naming conventions along with every stored attribute value. Logging these outputs or including them in error pages creates a leakage channel that active prompt injection probes and output scanners can detect. By testing endpoints with sequential probes—system prompt extraction, instruction override, DAN jailbreak, data exfiltration, and cost exploitation—middleBrick can identify whether an Axum service leaks DynamoDB data through LLM interfaces.
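To see why echoing the raw item shape is risky, the sketch below uses a minimal stand-in enum for the SDK's type-wrapper convention (not the real `aws_sdk_dynamodb::types::AttributeValue`) and shows that Debug-formatting such an item leaks both the wrapper names and the stored values.

```rust
// Stand-in for the SDK's "S"/"N" type-wrapper shape; illustrative only.
#[derive(Debug)]
enum Attr {
    S(String),
    N(String),
}

fn main() {
    // A raw item as a handler might receive it from DynamoDB.
    let raw = vec![
        ("email", Attr::S("ada@example.com".into())),
        ("age", Attr::N("37".into())),
    ];

    // Debug/JSON output of the raw item leaks wrapper names and PII together.
    let leaked = format!("{:?}", raw);
    assert!(leaked.contains("S(") && leaked.contains("ada@example.com"));

    // A lean DTO carrying only whitelisted fields avoids both problems.
    let dto = ("user_id", "u-123");
    println!("safe response: {:?}", dto);
}
```

The same wrapper names surface when the real SDK structs are serialized to JSON, which is why the remediation examples below map items into minimal DTOs instead.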

DynamoDB-Specific Remediation in Axum — concrete code fixes

Remediation focuses on strict data scoping, output sanitization, and defensive error handling to prevent DynamoDB details from reaching LLM pathways. In Axum, design handlers to extract only required fields from DynamoDB responses, map them to minimal DTOs, and avoid returning raw SDK structs. Apply consistent filtering for PII and operational metadata before any data enters a prompt or log stream accessible to LLMs.

Use middleware for centralized error handling so that DynamoDB conditional check failures or validation errors do not expose table or attribute names. Ensure that CORS and route definitions follow least privilege, and avoid verbose debugging output in production. The following examples illustrate secure patterns for DynamoDB access in Axum.
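A centralized mapping from internal errors to generic client responses can be sketched as follows. The `DbError` enum and `to_client_response` function are hypothetical names; in a real service this logic would live in an Axum error-handling layer, with the internal detail going only to server-side logs.

```rust
// Hypothetical internal error type capturing DynamoDB failure detail.
#[derive(Debug)]
enum DbError {
    ConditionalCheckFailed { table: String, condition: String },
    NotFound,
    Other(String),
}

// Collapse internal detail into a generic (status, message) pair.
// Table names, attribute names, and condition text never reach the client.
fn to_client_response(err: &DbError) -> (u16, &'static str) {
    match err {
        DbError::NotFound => (404, "Not found"),
        DbError::ConditionalCheckFailed { .. } | DbError::Other(_) => (500, "Service error"),
    }
}

fn main() {
    let err = DbError::ConditionalCheckFailed {
        table: "Users".into(),
        condition: "attribute_exists(user_id)".into(),
    };
    // Log the detail server-side only; return the sanitized pair.
    eprintln!("internal: {:?}", err);
    let (status, msg) = to_client_response(&err);
    assert_eq!(status, 500);
    assert!(!msg.contains("Users"));
    println!("{} {}", status, msg);
}
```

Keeping this mapping in one place makes it auditable: any new error variant must be explicitly routed, so a verbose default cannot creep in.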

Example 1: Safe GetItem with field selection and error sanitization

use aws_sdk_dynamodb::{types::AttributeValue, Client};
use axum::{
    extract::{Path, State},
    http::StatusCode,
    routing::get,
    Json, Router,
};
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
struct UserProfile {
    user_id: String,
    display_name: String,
    // intentionally omit email, phone, or internal fields
}

async fn get_user_profile(
    State(client): State<Client>,
    Path(user_id): Path<String>,
) -> Result<Json<UserProfile>, (StatusCode, &'static str)> {
    let resp = client
        .get_item()
        .table_name("Users")
        .key("user_id", AttributeValue::S(user_id))
        .send()
        .await
        // Generic message: never echo SDK error details to the client
        .map_err(|_| (StatusCode::INTERNAL_SERVER_ERROR, "Service error"))?;

    let item = resp
        .item
        .ok_or((StatusCode::NOT_FOUND, "User not found"))?;

    // Map only safe fields; avoid passing the raw item to LLMs or logs
    let profile = UserProfile {
        user_id: item
            .get("user_id")
            .and_then(|v| v.as_s().ok())
            .cloned()
            .unwrap_or_default(),
        display_name: item
            .get("display_name")
            .and_then(|v| v.as_s().ok())
            .cloned()
            .unwrap_or_default(),
    };
    Ok(Json(profile))
}

pub fn app_routes(client: Client) -> Router {
    Router::new()
        .route("/users/:user_id", get(get_user_profile))
        .with_state(client)
}

Example 2: Secure Query with consistent attribute projection and no debug echoes

async fn search_user_by_email(
    email: String,
    client: &Client,
) -> Result<UserProfile, (axum::http::StatusCode, &'static str)> {
    use aws_sdk_dynamodb::types::{AttributeValue, Select};

    let resp = client
        .query()
        .table_name("Users")
        .index_name("EmailIndex")
        .key_condition_expression("email = :email")
        .expression_attribute_values(":email", AttributeValue::S(email))
        // Project only the attributes the caller needs; nothing else leaves DynamoDB
        .select(Select::SpecificAttributes)
        .projection_expression("user_id, display_name")
        .send()
        .await
        .map_err(|_| (axum::http::StatusCode::INTERNAL_SERVER_ERROR, "Search failed"))?;

    // Use the first matching item; enforce strict field selection
    let items = resp.items.unwrap_or_default();
    let item = items
        .first()
        .ok_or((axum::http::StatusCode::NOT_FOUND, "No matches"))?;

    Ok(UserProfile {
        user_id: item
            .get("user_id")
            .and_then(|v| v.as_s().ok())
            .cloned()
            .unwrap_or_default(),
        display_name: item
            .get("display_name")
            .and_then(|v| v.as_s().ok())
            .cloned()
            .unwrap_or_default(),
    })
}

In both examples, avoid returning or logging the full SDK types (e.g., GetItemOutput or QueryOutput). Instead, map to lean structs and ensure error messages are generic. For LLM-facing endpoints, integrate middleBrick CLI or Dashboard to scan for data exposure and prompt injection risks; the Pro plan adds continuous monitoring and GitHub Action PR gates to catch regressions. These practices reduce the chance that DynamoDB-derived content leaks into LLM contexts, aligning with findings reported by middleBrick’s runtime checks.

Related CWEs (category: llmSecurity)

CWE ID    Name                                                    Severity
CWE-754   Improper Check for Unusual or Exceptional Conditions    MEDIUM

Frequently Asked Questions

How does middleBrick detect LLM data leakage involving DynamoDB in Axum services?
middleBrick runs active prompt injection probes and output scanning against LLM endpoints, checking for PII, API keys, and executable code. It compares runtime behavior against an OpenAPI/Swagger spec with full $ref resolution to identify responses that inadvertently include DynamoDB attribute names or values. Findings are mapped to OWASP API Top 10 and compliance frameworks to highlight data exposure risks.
Can the free plan of middleBrick detect DynamoDB-related LLM leakage in Axum?
The free plan allows 3 scans per month and includes the same 12 security checks, including LLM/AI Security tests for prompt injection, system prompt leakage, and output scanning for PII. It is suitable for initial assessment; for continuous monitoring and CI/CD integration to prevent regressions, consider the Starter or Pro plans.