
LLM Data Leakage in Axum with MongoDB

LLM Data Leakage in Axum with MongoDB: how this specific combination creates or exposes the vulnerability

When an Axum service uses MongoDB as a backend and exposes endpoints that return or process large text fields, there is a risk of unintended data exposure to downstream Large Language Model (LLM) integrations. This typically occurs when application responses include sensitive document content stored in MongoDB and are forwarded to LLM clients or logging systems without redaction or access checks.

In a typical Axum handler, a developer might retrieve a MongoDB document and serialize it into a JSON response. If that document contains fields such as notes, description, or internal_comment, and the handler does not explicitly filter them, an integration that sends the response to an LLM service could inadvertently leak confidential information. For example, embedding full user records or internal commentary into prompts or tool outputs increases the chance that sensitive data appears in LLM logs, generated text, or error traces.

LLM data leakage in this context is not about breaking authentication to reach MongoDB; it is about the flow of data from a trusted data store (MongoDB) through an application layer (Axum) to an LLM-facing channel. The risk is amplified when Axum endpoints are instrumented for debugging or when response bodies are automatically captured by observability tooling that forwards data to LLM analysis services. Real-world patterns seen in LLM security testing include system prompt leakage, where verbose error messages or context windows expose internal instructions, and output scanning that detects API keys or PII embedded in model replies.

Because middleBrick scans unauthenticated attack surfaces and includes LLM/AI Security checks, it can detect scenarios where responses from an Axum + MongoDB service contain patterns indicative of leakage, such as embedded credentials, long text fields resembling internal notes, or structured data that should not be exposed. The scanner does not assume a vulnerable configuration but flags the presence of high-risk data paths that merit review.

Concrete examples include an endpoint that returns a full MongoDB document with fields like internal_notes and password_hint, and a downstream service that pipes the response into an LLM tool call. Another scenario involves logging middleware that captures the entire response body for troubleshooting and forwards snippets to an LLM-based log analysis tool, creating a secondary leakage path.

Mitigating this requires deliberate data handling in Axum: limit the fields sent to LLM consumers, redact sensitive values at the serialization layer, and ensure that any integration with LLM services operates only on curated subsets of data. Because Axum does not enforce schema-level field filtering by default, developers must explicitly design response shapes for LLM consumption and validate that MongoDB queries do not return more information than necessary.

MongoDB-Specific Remediation in Axum: concrete code fixes

To reduce LLM data leakage risk, structure Axum handlers to return only the fields required for the immediate operation and avoid forwarding raw MongoDB documents to LLM endpoints. Use projection in MongoDB queries and explicit struct serialization to control the output surface.

Below are concrete, working examples using the MongoDB Rust driver with Axum. These snippets assume a MongoDB collection named users and a document schema that may contain sensitive fields such as password_hash, internal_notes, and email.

1. Define minimal response structs

Create dedicated structs for responses that may be consumed by LLM integrations. This ensures only intended fields are serialized.

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub struct UserPublic {
    // MongoDB stores the identifier as `_id`; rename it so
    // deserialization from the projected document succeeds.
    #[serde(rename = "_id")]
    pub id: String,
    pub username: String,
    pub display_name: String,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct UserInternal {
    #[serde(rename = "_id")]
    pub id: String,
    pub username: String,
    pub email: String,
    pub internal_notes: String,
}

2. Use projection in MongoDB queries

When fetching documents for public endpoints, select only the required fields. This prevents sensitive fields from being loaded into memory or accidentally serialized.

use mongodb::{
    bson::{self, doc, Document},
    options::FindOneOptions,
    Collection,
};

async fn get_user_public(collection: &Collection<Document>, user_id: &str) -> Option<UserPublic> {
    let filter = doc! { "_id": user_id };
    // The projection is passed via FindOneOptions, not as a bare document.
    let options = FindOneOptions::builder()
        .projection(doc! { "username": 1, "display_name": 1, "_id": 1 })
        .build();
    let document = collection.find_one(filter, options).await.ok()??;
    bson::from_document(document).ok()
}

3. Separate internal handlers with restricted fields

For internal or administrative endpoints, use a different projection and a different struct that still omits secrets.

async fn get_user_internal(collection: &Collection<Document>, user_id: &str) -> Option<UserInternal> {
    let filter = doc! { "_id": user_id };
    // Still omits secrets such as password_hash via the projection.
    let options = FindOneOptions::builder()
        .projection(doc! { "username": 1, "email": 1, "internal_notes": 1, "_id": 1 })
        .build();
    let document = collection.find_one(filter, options).await.ok()??;
    bson::from_document(document).ok()
}

4. Avoid forwarding raw documents to LLM utilities

Do not pass entire MongoDB documents into functions that invoke LLM clients. Instead, pass only the fields that are necessary and explicitly redact or omit sensitive content.

async fn ask_llm_about_user(public_user: &UserPublic) -> String {
    // Build a prompt using only safe, public fields
    let prompt = format!("Provide a greeting for user {}", public_user.username);
    // llm_client.complete(&prompt) would be called here
    prompt
}

5. Validate and sanitize before logging

If logging is required, ensure logs do not contain sensitive fields. Use sanitization functions or structured logging that excludes sensitive keys.

fn safe_log_user(user: &UserPublic) {
    // Only log non-sensitive fields
    tracing::info!(user_id = %user.id, username = %user.username, "user accessed");
}

By combining projection queries, strict response structs, and disciplined handling before LLM interactions, an Axum service can minimize the data available for potential leakage while continuing to use MongoDB as a backend.

Related CWEs

CWE ID | Name | Severity
CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM

Frequently Asked Questions

Can LLM data leakage occur even if MongoDB authentication is properly configured?
Yes. Proper authentication prevents unauthorized access to MongoDB, but LLM data leakage concerns how data is handled after authentication within Axum. If handlers expose sensitive fields in responses that are sent to LLM endpoints or logs, leakage can occur regardless of database authentication settings.
Does using middleware that masks fields at runtime fully prevent LLM data leakage in Axum with MongoDB?
Middleware can help reduce risk if it removes or redacts sensitive fields before responses leave the application. However, the safest approach is to avoid including sensitive fields in the response structures used by LLM integrations and to limit MongoDB query projections to only the fields required for each use case.