LLM Data Leakage in Actix with MongoDB
LLM Data Leakage in Actix with MongoDB — how this specific combination creates or exposes the vulnerability
When an Actix Web service exposes an unauthenticated or improperly scoped endpoint that returns data from a MongoDB collection used to power an LLM integration, there is a risk of LLM data leakage: responses from the service include sensitive or training data that an attacker can extract through LLM-focused techniques such as prompt injection or systematic scanning of model outputs. The combination of Actix handling requests, MongoDB as the document store, and an LLM endpoint or agentic workflow increases the attack surface, because data intended for internal model consumption may be exposed through error messages, verbose responses, or unchecked reflection of database content.
In this stack, leakage can happen in several concrete ways. If the Actix handler builds MongoDB queries by interpolating user input into JSON filters without strict validation, an attacker may use injection techniques to coerce the service into returning documents it should not. Those returned documents might contain fields like internal IDs, emails, or model metadata that are sensitive. Even when the Actix service applies basic authorization, misconfigured scopes or missing field-level checks can allow a broader read than intended. If the response is passed to an LLM — for example, to generate summaries or to answer questions over the data — the LLM’s output may inadvertently reflect the underlying data in its completions. Output scanning for LLM data leakage therefore must inspect not only the LLM response but also the inputs that were constructed from MongoDB results.
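As a minimal sketch of that injection risk, the example below models a user-supplied filter as a plain key/value map and rejects any key that begins with `$` (MongoDB operator syntax such as `$ne` or `$where`) or contains a dot (nested-field traversal) before the filter is ever turned into a query. The `validate_filter` helper and its exact rules are illustrative assumptions, not part of the MongoDB driver:

```rust
use std::collections::BTreeMap;

/// Illustrative filter check: reject any key that starts with `$`
/// (a MongoDB query operator) or contains a dot (a nested-field path),
/// so user-supplied filters cannot smuggle operators like `$ne`/`$where`
/// or reach into subdocuments they should not touch.
fn validate_filter(filter: &BTreeMap<String, String>) -> Result<(), String> {
    for key in filter.keys() {
        if key.starts_with('$') {
            return Err(format!("operator key rejected: {key}"));
        }
        if key.contains('.') {
            return Err(format!("nested path rejected: {key}"));
        }
    }
    Ok(())
}

fn main() {
    let mut safe = BTreeMap::new();
    safe.insert("title".to_string(), "intro".to_string());
    assert!(validate_filter(&safe).is_ok());

    let mut hostile = BTreeMap::new();
    hostile.insert("$where".to_string(), "1 == 1".to_string());
    assert!(validate_filter(&hostile).is_err());
    println!("filter validation ok");
}
```

A real service would apply a check like this before converting the map into a `bson::Document`, rather than interpolating raw user JSON into the filter.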
Another vector is through introspection or debugging endpoints. If the Actix application exposes routes that echo query structure or database schema, an attacker can probe these routes to learn collection names or field names and then ask the LLM to reason over or exfiltrate that data. The LLM/AI security checks in middleBrick specifically look for such exposure by testing for system prompt leakage and output scanning for PII, API keys, or executable code. In this context, if the LLM receives data that includes sensitive information from MongoDB and does not sanitize it, the model might regurgitate or infer that data in its replies. This is especially relevant when using embeddings or vector search where document text is supplied directly to the model. Therefore, securing the Actix-to-MongoDB path and ensuring that only necessary, sanitized fields are surfaced to the LLM is critical to preventing LLM data leakage.
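To make the output-scanning idea concrete, here is a deliberately simplified redaction sketch that scrubs email-like tokens from MongoDB document text before it is handed to the model. The whitespace-token heuristic is an assumption for illustration only; production systems should use a vetted PII-detection library rather than this crude check:

```rust
/// Illustrative scrubber: replace anything that looks like an email address
/// before document text is supplied to the LLM. A token containing both
/// '@' and '.' is treated as an email — a crude heuristic that only
/// sketches the idea of sanitizing database content on the way out.
fn redact_emails(text: &str) -> String {
    text.split(' ')
        .map(|tok| {
            if tok.contains('@') && tok.contains('.') {
                "[REDACTED_EMAIL]"
            } else {
                tok
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let doc_text = "Contact alice@example.com for internal access";
    let sanitized = redact_emails(doc_text);
    assert_eq!(sanitized, "Contact [REDACTED_EMAIL] for internal access");
    println!("{sanitized}");
}
```

The same pre-LLM sanitization step is where scans for API keys or other secret formats would sit, so that vector-search or embedding pipelines never see the raw sensitive fields.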
MongoDB-Specific Remediation in Actix — concrete code fixes
To mitigate LLM data leakage when using MongoDB with Actix, apply strict query scoping, projection, and input validation so that only intended fields are retrieved and exposed. Use MongoDB’s projection to exclude sensitive fields from query results, and avoid returning full documents to the LLM unless absolutely necessary. Below are concrete, working examples that demonstrate secure patterns in Actix with the official MongoDB Rust driver.
First, define a strongly typed structure that includes only the fields the LLM needs. This ensures that even if a query is inadvertently broad, the data returned to the model is limited:
```rust
use mongodb::bson::{doc, document::ValueAccessError, Document};
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize)]
struct PublicArticle {
    title: String,
    summary: String,
    // Intentionally excluding internal fields like `author_id`, `raw_content`, `metadata`
}

// Convert a MongoDB Document into the public shape. `get_str` returns a
// `ValueAccessError` (not `mongodb::error::Error`) when a field is missing
// or has the wrong type, so that is the error type we propagate.
fn to_public_article(doc: Document) -> Result<PublicArticle, ValueAccessError> {
    let title = doc.get_str("title")?.to_string();
    let summary = doc.get_str("summary")?.to_string();
    Ok(PublicArticle { title, summary })
}
```
Second, construct queries with explicit projection and validation of incoming filters to prevent unwanted data exposure:
```rust
use actix_web::{web, HttpResponse};
use mongodb::{bson::{doc, Document}, options::FindOneOptions, Client};

// Targets the 2.x Rust driver API, where options are passed as a second argument.
async fn get_article_public(
    client: web::Data<Client>,
    path: web::Path<(String,)>, // e.g., article_id
) -> actix_web::Result<HttpResponse> {
    let collection = client.database("mydb").collection::<Document>("articles");
    let article_id = path.into_inner().0;
    // Validate ID format before using it in a query
    if !article_id.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '_') {
        return Ok(HttpResponse::BadRequest().body("Invalid identifier"));
    }
    let filter = doc! { "_id": article_id };
    let projection = doc! { "title": 1, "summary": 1, "_id": 1 };
    // `find_one` takes FindOneOptions; FindOptions is for `find`
    let options = FindOneOptions::builder().projection(projection).build();
    // Driver errors do not implement actix's ResponseError, so map them explicitly
    let result = collection
        .find_one(filter, options)
        .await
        .map_err(actix_web::error::ErrorInternalServerError)?;
    match result {
        Some(doc) => {
            let public = to_public_article(doc)
                .map_err(actix_web::error::ErrorInternalServerError)?;
            Ok(HttpResponse::Ok().json(public))
        }
        None => Ok(HttpResponse::NotFound().finish()),
    }
}
```
Third, avoid dynamic field selection driven by user input. If your API allows clients to specify which fields to return, validate those fields against an allowlist so that sensitive fields such as passwords, tokens, or internal references cannot be requested:
```rust
use mongodb::bson::Document;

fn build_projection(requested_fields: Option<Vec<String>>) -> Document {
    const ALLOWED: &[&str] = &["title", "summary", "published_at"];
    let mut projection = Document::new();
    if let Some(fields) = requested_fields {
        for f in fields {
            if ALLOWED.contains(&f.as_str()) {
                projection.insert(f, 1);
            }
        }
    }
    // Always include the document ID so callers can still address the result
    projection.insert("_id", 1);
    projection
}
```
By combining these patterns — strict projection, allowlist-based field selection, and strong input validation — you reduce the likelihood that MongoDB responses will include data that could be leaked through LLM outputs. This aligns with the LLM/AI security checks provided by middleBrick, which scan for PII, API keys, and other sensitive content in model responses and help identify when underlying data sources may be overexposed.
Related CWEs (llmSecurity)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |