
LLM Data Leakage in Actix with MongoDB

LLM Data Leakage in Actix with MongoDB: how this specific combination creates or exposes the vulnerability

When an Actix web service exposes an unauthenticated or improperly scoped endpoint that returns data from a MongoDB collection used to power an LLM integration, there is a risk of LLM data leakage. This occurs when responses from the service include sensitive or training data that can be extracted by an LLM-focused attack, such as prompt injection or output scanning. The combination of Actix handling requests, MongoDB as the document store, and an LLM endpoint or agentic workflow increases the attack surface because data intended for internal model consumption may be exposed through error messages, verbose responses, or unchecked reflection of database content.

In this stack, leakage can happen in several concrete ways. If the Actix handler builds MongoDB queries by interpolating user input into JSON filters without strict validation, an attacker may use injection techniques to coerce the service into returning documents it should not. Those returned documents might contain fields like internal IDs, emails, or model metadata that are sensitive. Even when the Actix service applies basic authorization, misconfigured scopes or missing field-level checks can allow a broader read than intended. If the response is passed to an LLM — for example, to generate summaries or to answer questions over the data — the LLM’s output may inadvertently reflect the underlying data in its completions. Output scanning for LLM data leakage therefore must inspect not only the LLM response but also the inputs that were constructed from MongoDB results.
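
To make the interpolation risk concrete, here is a deliberately unsafe, std-only sketch. The function name and inputs are hypothetical, not part of any real handler; it shows how a filter built by string formatting lets an attacker replace an intended equality match with a MongoDB operator object such as `$ne`, turning a lookup for one user into a match-everything query:

```rust
// ILLUSTRATIVE ANTI-PATTERN: never build MongoDB filters by string
// interpolation. The attacker-controlled input is spliced directly into
// the JSON filter text.
fn build_filter_unsafely(user_input: &str) -> String {
    format!(r#"{{ "username": {} }}"#, user_input)
}

fn main() {
    // Benign input the developer expected: a quoted string literal.
    println!("{}", build_filter_unsafely(r#""alice""#));
    // Injected operator object: matches every document in the collection.
    println!("{}", build_filter_unsafely(r#"{ "$ne": null }"#));
}
```

Building filters with the driver's `doc!` macro and typed values, as in the remediation examples below, avoids this class of injection because user input can only ever occupy a value position, never an operator position.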

Another vector is introspection or debugging endpoints. If the Actix application exposes routes that echo query structure or database schema, an attacker can probe these routes to learn collection names or field names and then ask the LLM to reason over or exfiltrate that data. The LLM/AI security checks in middleBrick look for such exposure by testing for system prompt leakage and by scanning output for PII, API keys, or executable code. In this context, if the LLM receives unsanitized data that includes sensitive information from MongoDB, the model may regurgitate or infer that data in its replies. This is especially relevant when using embeddings or vector search, where document text is supplied directly to the model. Securing the Actix-to-MongoDB path, and ensuring that only necessary, sanitized fields are surfaced to the LLM, is therefore critical to preventing LLM data leakage.
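
As a minimal sketch of that sanitization step, the std-only helper below masks email-like tokens before document text reaches a prompt. The function name and the naive "token contains `@`" rule are assumptions for illustration, not middleBrick functionality; a production service would use a vetted PII-detection approach rather than this heuristic:

```rust
// Hypothetical pre-prompt sanitizer: masks whitespace-separated tokens that
// look like email addresses before MongoDB document text is handed to an LLM.
// Deliberately naive; real PII detection needs more than an '@' check.
fn redact_email_like_tokens(text: &str) -> String {
    text.split_whitespace()
        .map(|tok| if tok.contains('@') { "[REDACTED]" } else { tok })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let doc_text = "Contact alice@example.com for access.";
    // Prints: Contact [REDACTED] for access.
    println!("{}", redact_email_like_tokens(doc_text));
}
```

The key design point is that redaction happens server-side, before prompt construction, so even a successful prompt-injection attack against the LLM cannot recover the masked values from context.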

MongoDB-Specific Remediation in Actix: concrete code fixes

To mitigate LLM data leakage when using MongoDB with Actix, apply strict query scoping, projection, and input validation so that only intended fields are retrieved and exposed. Use MongoDB’s projection to exclude sensitive fields from query results, and avoid returning full documents to the LLM unless absolutely necessary. Below are concrete, working examples that demonstrate secure patterns in Actix with the official MongoDB Rust driver.

First, define a strongly typed structure that includes only the fields the LLM needs. This ensures that even if a query is inadvertently broad, the data returned to the model is limited:

use mongodb::{bson::{doc, document::ValueAccessError, Document}, options::FindOptions};
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize)]
struct PublicArticle {
    title: String,
    summary: String,
    // Intentionally excluding internal fields like `author_id`, `raw_content`, `metadata`
}

// Convert a MongoDB Document into the public shape.
// `Document::get_str` returns a `ValueAccessError` (not `mongodb::error::Error`)
// when a field is missing or has the wrong type, so that is the error type here.
fn to_public_article(doc: Document) -> Result<PublicArticle, ValueAccessError> {
    let title = doc.get_str("title")?.to_string();
    let summary = doc.get_str("summary")?.to_string();
    Ok(PublicArticle { title, summary })
}

Second, construct queries with explicit projection and validation of incoming filters to prevent unwanted data exposure:

use actix_web::{error, web, HttpResponse};
use mongodb::options::FindOneOptions;
use mongodb::Client;

async fn get_article_public(
    client: web::Data<Client>,
    path: web::Path<String>, // article_id from the route
) -> actix_web::Result<HttpResponse> {
    let collection = client.database("mydb").collection::<Document>("articles");
    let article_id = path.into_inner();

    // Validate ID format before using it in a query
    if !article_id.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '_') {
        return Ok(HttpResponse::BadRequest().body("Invalid identifier"));
    }

    let filter = doc! { "_id": article_id };
    // `find_one` takes FindOneOptions (not FindOptions); project only the
    // fields the public shape needs.
    let projection = doc! { "title": 1, "summary": 1 };
    let find_options = FindOneOptions::builder().projection(projection).build();

    // mongodb and bson errors do not implement actix's ResponseError, so map
    // them explicitly instead of relying on `?` alone.
    let result = collection
        .find_one(filter, find_options)
        .await
        .map_err(error::ErrorInternalServerError)?;
    match result {
        Some(doc) => {
            let public = to_public_article(doc).map_err(error::ErrorInternalServerError)?;
            Ok(HttpResponse::Ok().json(public))
        }
        None => Ok(HttpResponse::NotFound().finish()),
    }
}

Third, avoid dynamic field selection driven by user input. If your API allows clients to specify which fields to return, validate those fields against an allowlist so that sensitive fields such as passwords, tokens, or internal references cannot be requested:

fn build_projection(requested_fields: Option<Vec<String>>) -> Document {
    const ALLOWED: &[&str] = &["title", "summary", "published_at"];
    let mut projection = Document::new();
    if let Some(fields) = requested_fields {
        for f in fields {
            if ALLOWED.contains(&f.as_str()) {
                projection.insert(f, 1);
            }
        }
    }
    // Ensure essential fields for routing are present
    projection.insert("_id", 1);
    projection
}

By combining these patterns — strict projection, allowlist-based field selection, and strong input validation — you reduce the likelihood that MongoDB responses will include data that could be leaked through LLM outputs. This aligns with the LLM/AI security checks provided by middleBrick, which scan for PII, API keys, and other sensitive content in model responses and help identify when underlying data sources may be overexposed.

Related CWEs

CWE ID | Name | Severity
CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM

Frequently Asked Questions

How does middleBrick detect LLM data leakage from an Actix + MongoDB stack?
middleBrick runs LLM/AI security checks that include system prompt leakage detection, active prompt injection probes, and output scanning for PII, API keys, and executable code. If your Actix service exposes MongoDB data to an LLM endpoint or includes sensitive content in responses, these checks can surface leakage by analyzing the outputs and the inputs that lead to them.
Can middleBrick fix LLM data leakage in MongoDB responses from Actix?
middleBrick detects and reports findings with remediation guidance, but it does not fix, patch, or block data. You should apply strict projection, input validation, and allowlist-based field selection in your Actix handlers to ensure MongoDB responses do not expose sensitive data to LLMs.