LLM Data Leakage in Axum with Firestore
LLM Data Leakage in Axum with Firestore — how this specific combination creates or exposes the vulnerability
When an Axum web service uses Google Firestore as its backend and exposes an endpoint backed by a Large Language Model (LLM), data leakage can occur at the intersection of model outputs and Firestore-stored data. Axum routes pass request context through layers, and if that context or the response includes Firestore documents containing sensitive fields, an LLM endpoint that streams or echoes model-generated content may inadvertently surface PII, API keys, or internal identifiers present in the Firestore records.
Consider a scenario where Axum handlers retrieve a Firestore document containing user profile data, assistant configurations, or prompt templates, and then pass the document contents into an LLM inference call. If the LLM response is not inspected before being returned to the client, structured data from Firestore (such as map fields, nested arrays, or metadata IDs) can appear in the output. This is especially relevant when the LLM is used to generate natural-language summaries or to orchestrate multi-step workflows that reference Firestore entities by ID.
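As a minimal sketch of this vulnerable flow (the function and type names below are hypothetical stand-ins, not from any real SDK), note how the entire document, sensitive fields included, is serialized into the prompt and the model output is returned without inspection:

```rust
use std::collections::HashMap;

// Stand-in for a Firestore document read; a real client would return a
// snapshot whose fields include anything stored on the document.
fn fetch_document(_path: &str) -> HashMap<String, String> {
    HashMap::from([
        ("display_name".into(), "Ada".into()),
        ("api_key".into(), "sk-internal-secret".into()), // sensitive field
    ])
}

// Stand-in for an LLM call; real models may echo prompt content verbatim.
fn call_llm(prompt: &str) -> String {
    format!("Summary based on: {prompt}")
}

fn vulnerable_handler(user_id: &str) -> String {
    let doc = fetch_document(&format!("profiles/{user_id}"));
    // BUG: the raw document, including `api_key`, lands in the prompt.
    let prompt = format!("Summarize this profile: {doc:?}");
    call_llm(&prompt) // returned to the client with no output inspection
}
```

Because the secret survives the round trip, any client of this endpoint can read it out of the model's response.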
The LLM/AI Security checks in middleBrick target this class of risk by scanning for system prompt leakage patterns that match the ChatML, Llama 2, Mistral, and Alpaca formats, which often include placeholders or examples resembling Firestore document structures. During active prompt injection testing, probes such as system prompt extraction and data exfiltration attempt to coax the model into repeating or transforming sensitive Firestore fields. Output scanning then looks for PII, API keys, and executable code in the LLM responses, which may include data originally read from Firestore. Unauthenticated LLM endpoint detection flags endpoints that do not enforce authorization, reducing the chance that an attacker can directly probe an Axum route to harvest Firestore-derived content.
Additionally, excessive agency detection identifies patterns such as tool_calls, function_call, or LangChain agent configurations that may allow an LLM to indirectly reference or mutate Firestore documents through generated function arguments. Because Firestore documents often contain dynamic fields like timestamps, version numbers, or user identifiers, these values can be reflected in model outputs if input validation and authorization checks are not applied consistently in the Axum handler layer.
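The detection idea can be illustrated with a std-only sketch (the substring markers mirror the `tool_calls`, `function_call`, and LangChain agent patterns mentioned above; a production scanner would parse the response JSON rather than match substrings):

```rust
/// Returns true if a raw model response contains markers suggesting the
/// model is attempting a tool or function invocation, i.e. a potential
/// excessive-agency path into backend data such as Firestore documents.
fn has_agency_markers(response: &str) -> bool {
    const MARKERS: [&str; 3] = ["\"tool_calls\"", "\"function_call\"", "AgentExecutor"];
    MARKERS.iter().any(|marker| response.contains(marker))
}
```

A handler could run such a check on every model response and block or log matches before anything is forwarded to a tool-execution layer.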
In a compliance context, findings related to LLM data leakage from Firestore align with OWASP API Security Top 10 controls around security logging and monitoring, as well as data exposure risks under SOC 2 and GDPR. middleBrick cross-references OpenAPI/Swagger specifications with runtime behavior, so if your spec defines an endpoint that returns Firestore-derived JSON schemas, any deviation or unexpected data inclusion will be surfaced with severity and remediation guidance.
Firestore-Specific Remediation in Axum — concrete code fixes
To mitigate LLM data leakage when using Firestore in Axum, you should sanitize Firestore documents before they reach the LLM layer and enforce strict output inspection. Below are concrete Axum handler patterns that demonstrate secure retrieval, transformation, and usage of Firestore data.
First, define a Firestore document model that excludes sensitive fields from exposure. Use selective serialization rather than passing raw Firestore maps directly into the LLM context.
use google_cloud_rust::firestore::client::Client;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Clone, Serialize, Deserialize)]
struct PublicProfile {
    user_id: String,
    display_name: String,
    avatar_url: Option<String>,
}

// Sensitive fields (emails, API keys, internal flags) are never copied into
// this struct, so they cannot be serialized into an LLM prompt or response.
// `DocumentSnapshot` and `Value` are the snapshot and field types exposed by
// the Firestore client crate in use.
impl From<DocumentSnapshot> for PublicProfile {
    fn from(doc: DocumentSnapshot) -> Self {
        let map: HashMap<String, Value> = doc.data;
        Self {
            user_id: map.get("user_id").and_then(|v| v.as_str()).unwrap_or("").to_string(),
            display_name: map.get("display_name").and_then(|v| v.as_str()).unwrap_or("").to_string(),
            avatar_url: map.get("avatar_url").and_then(|v| v.as_str()).map(String::from),
        }
    }
}
Second, ensure that Firestore reads occur within authorized Axum layers and that the handler does not forward raw documents to the LLM client. Use extractor guards and validation to limit what data is passed forward.
use axum::{
    extract::{Path, State},
    http::StatusCode,
    routing::get,
    Json, Router,
};
use std::sync::Arc;

struct AppState {
    firestore_client: Client,
}

async fn get_public_profile(
    State(state): State<Arc<AppState>>,
    Path(user_id): Path<String>,
) -> Result<Json<PublicProfile>, (StatusCode, String)> {
    let doc_path = format!("profiles/{}", user_id);
    let snapshot = state
        .firestore_client
        .get_document(&doc_path)
        .await
        .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
    if snapshot.exists {
        Ok(Json(PublicProfile::from(snapshot)))
    } else {
        Err((StatusCode::NOT_FOUND, "Profile not found".into()))
    }
}

fn app(firestore_client: Client) -> Router {
    Router::new()
        .route("/profile/:user_id", get(get_public_profile))
        .with_state(Arc::new(AppState { firestore_client }))
}
Third, if you must include Firestore data in LLM prompts, explicitly filter fields and avoid echoing map keys or internal IDs. Use a prompt builder that whitelists allowed fields and redacts anything that matches patterns resembling API keys or internal references.
fn build_prompt(profile: &PublicProfile, additional_context: &str) -> String {
    format!(
        "You are assisting user {} (public display name). Context: {}",
        profile.display_name, additional_context
    )
}
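To harden the whitelist further, a std-only redaction pass can strip anything that looks like a secret before the prompt is assembled. This sketch assumes keys follow an `sk-` prefix convention, which is purely illustrative; adjust the pattern to whatever key formats your Firestore documents may contain:

```rust
/// Replace runs that look like API keys ("sk-" followed by alphanumerics,
/// dashes, or underscores) with a placeholder before prompt assembly.
fn redact_secrets(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find("sk-") {
        // Copy everything before the suspected key, then skip past it.
        out.push_str(&rest[..pos]);
        let tail = &rest[pos + 3..];
        let end = tail
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-' || c == '_'))
            .unwrap_or(tail.len());
        out.push_str("[REDACTED]");
        rest = &tail[end..];
    }
    out.push_str(rest);
    out
}
```

Running every string field through `redact_secrets` before it reaches `build_prompt` gives defense in depth even if a sensitive value slips past the field whitelist.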
Finally, integrate middleBrick’s CLI to scan your Axum endpoints and verify that no Firestore-specific leakage patterns appear in the generated report. The Pro plan’s continuous monitoring can be configured to alert your team if new endpoints expose raw Firestore structures, helping you maintain compliance with OWASP API Top 10 and data protection regulations.
Related CWEs
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |