LLM Data Leakage in Actix with Firestore
LLM Data Leakage in Actix with Firestore — how this specific combination creates or exposes the vulnerability
When an Actix web service exposes an unauthenticated or improperly scoped endpoint that queries Cloud Firestore and returns data used as context for an LLM, there is a risk of LLM data leakage. This occurs when sensitive Firestore documents are included in prompts or streamed as model context without adequate authorization checks or output filtering. Because middleBrick scans the unauthenticated attack surface, it can detect endpoints that return Firestore data and subsequently expose that data through LLM responses.
Actix is a high-performance Rust web framework where handlers often deserialize Firestore documents into structured models and pass them directly into prompt templates. If authorization is missing or misconfigured, a handler might return an entire user profile, financial record, or internal configuration document. When that data becomes part of an LLM prompt or is included in streaming chat completions, it can be leaked to downstream model consumers. middleBrick’s LLM/AI Security checks specifically test for unauthenticated LLM endpoints and scan outputs for PII, API keys, and executable code, which helps surface these leakage paths in Firestore-integrated Actix services.
The combination increases the impact of common issues such as Insecure Direct Object References (IDOR) and missing property-level authorization. An attacker who can manipulate identifiers in Firestore queries might gain access to documents that should be restricted. If the Actix handler does not validate that the requesting subject has permission to view those documents, and then feeds the retrieved data into an LLM, the model may reproduce sensitive content in its responses. middleBrick’s BOLA/IDOR checks and Data Exposure checks are designed to highlight these authorization gaps, while the LLM/AI Security module looks for system prompt leakage and output PII to detect whether sensitive Firestore data is being reflected in model output.
Real-world patterns include an Actix handler that retrieves a Firestore document by ID and directly interpolates user fields into a prompt string, or streams Firestore-backed context to an LLM endpoint without redaction. Without input validation and strict authorization, this pipeline can unintentionally expose confidential information. middleBrick’s OpenAPI/Swagger analysis, with full $ref resolution, can correlate runtime behavior with spec definitions to highlight mismatches between documented and actual data exposure, especially when LLM-related routes are involved.
To illustrate, an Actix handler that queries a Firestore collection for user documents and passes the result to an LLM completion call may inadvertently disclose email addresses, phone numbers, or internal identifiers. If the handler lacks rate limiting or enforces only weak property-level authorization, the leakage risk grows. middleBrick tests for these conditions by examining authentication mechanisms, property authorization, and the handling of LLM-specific routes, ensuring that sensitive Firestore data is not improperly surfaced in model interactions.
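The leaky pattern described above can be sketched in a few lines of plain Rust (the struct and field names are hypothetical, chosen only for illustration): the handler interpolates the entire fetched document, sensitive fields included, into the prompt string.

```rust
// Hypothetical illustration of the anti-pattern: every field of the
// fetched document, including PII, ends up in the LLM prompt.
struct UserProfile {
    display_name: String,
    email: String, // sensitive: should never reach the prompt
}

fn build_leaky_prompt(profile: &UserProfile, query: &str) -> String {
    // Interpolating the raw document leaks the email to the model
    format!(
        "Context: name={}, email={}. Question: {}",
        profile.display_name, profile.email, query
    )
}
```

Anything placed in the prompt this way must be assumed recoverable by whoever can query the model, which is why the remediation below filters fields before prompt construction.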
Firestore-Specific Remediation in Actix — concrete code fixes
Secure Actix handlers that interact with Firestore by enforcing authorization before document retrieval, validating and sanitizing all inputs, and avoiding direct exposure of sensitive fields in LLM prompts. Use Firestore’s built-in security rules and server-side checks to ensure that a request can only access documents for which the subject has explicit permission. Apply strict field-level filtering so that only necessary, non-sensitive data is included in any context sent to the LLM.
Below are concrete Rust examples using the Actix web framework and the Firestore REST or official client libraries. These snippets demonstrate how to structure handlers to mitigate LLM data leakage.
1. Authorized Firestore fetch with field filtering
Ensure the authenticated subject matches the document owner and only select safe fields before sending data to the LLM.
use actix_web::{web, HttpResponse, Result};
use firestore::FirestoreDb;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct PublicProfile {
    user_id: String,
    display_name: String,
    // Deliberately omits email, phone, and internal identifiers
}

async fn get_public_profile(
    db: web::Data<FirestoreDb>,
    path: web::Path<String>,          // document ID
    subject_id: web::ReqData<String>, // inserted by auth middleware
) -> Result<HttpResponse> {
    let doc_id = path.into_inner();
    // Enforce ownership or access control server-side
    if !is_authorized_subject(&db, &doc_id, &subject_id).await? {
        return Ok(HttpResponse::Forbidden().body("Access denied"));
    }
    // Deserializing into PublicProfile drops sensitive fields.
    // get_doc is an illustrative fetch helper; with the firestore
    // crate, use its fluent select API to load the document by ID.
    let profile: PublicProfile = db
        .get_doc(&doc_id)
        .await
        .map_err(actix_web::error::ErrorInternalServerError)?;
    Ok(HttpResponse::Ok().json(profile))
}

async fn is_authorized_subject(
    db: &FirestoreDb,
    doc_id: &str,
    subject_id: &str,
) -> Result<bool, actix_web::Error> {
    // Example: read an ACL document, or embed owner_id in the profile
    let doc: serde_json::Value = db
        .get_doc(doc_id)
        .await
        .map_err(actix_web::error::ErrorInternalServerError)?;
    Ok(doc.get("owner_id").and_then(|v| v.as_str()) == Some(subject_id))
}
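Field filtering can also be applied explicitly with an allowlist. The sketch below uses only the standard library; the `SAFE_FIELDS` set is a hypothetical example and should match whatever your `PublicProfile` type exposes.

```rust
use std::collections::HashMap;

// Allowlist of fields considered safe for LLM context (hypothetical set)
const SAFE_FIELDS: &[&str] = &["user_id", "display_name"];

// Keep only allowlisted fields; everything else (email, phone,
// internal IDs) is dropped before the document can reach a prompt.
fn filter_safe_fields(doc: &HashMap<String, String>) -> HashMap<String, String> {
    doc.iter()
        .filter(|(key, _)| SAFE_FIELDS.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

An explicit allowlist fails closed: newly added Firestore fields stay out of prompts until someone deliberately approves them, which is safer than a denylist that must anticipate every sensitive field.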
2. Secure prompt construction without sensitive fields
Build prompts using only approved public data, and avoid string interpolation of raw Firestore fields.
fn build_prompt(profile: &PublicProfile, query: &str) -> String {
    format!(
        "User query: {}. Public profile: {{ display_name: {} }}",
        query, profile.display_name
    )
}
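As a last line of defense, the assembled prompt can be scanned for obvious PII before it is sent to the model. The function below is a deliberately naive sketch (any whitespace-separated token containing `@` is masked, and whitespace is normalized to single spaces); production systems should use a vetted PII-detection library rather than this heuristic.

```rust
// Last-resort scan over the final prompt string: mask anything that
// looks like an email address. Naive by design; a real deployment
// should use a dedicated PII detector.
fn redact_emails(prompt: &str) -> String {
    prompt
        .split_whitespace()
        .map(|token| if token.contains('@') { "[REDACTED]" } else { token })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Output-side redaction complements, but never replaces, the authorization and field filtering shown earlier: once data is in the prompt, filters can only catch patterns they recognize.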
3. Firestore security rules to restrict access
Complement server-side checks with rules that enforce ownership and limit read scope.
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /profiles/{userId} {
      allow read: if request.auth != null && request.auth.uid == userId;
      allow write: if request.auth != null && request.auth.uid == userId;
    }
  }
}
4. Middleware for subject-based filtering
Use Actix guards or extractors to validate subjects before handlers run, reducing the chance of leaking data to the LLM layer.
use actix_web::dev::Payload;
use actix_web::{Error, FromRequest, HttpRequest};
use std::future::{ready, Ready};

struct AuthenticatedSubject(String);

impl FromRequest for AuthenticatedSubject {
    type Error = Error;
    type Future = Ready<Result<Self, Error>>;

    fn from_request(req: &HttpRequest, _: &mut Payload) -> Self::Future {
        // Extract the subject ID from a verified token or session;
        // a raw header is shown here only for illustration.
        let subject_id = req
            .headers()
            .get("X-Subject-ID")
            .and_then(|v| v.to_str().ok())
            .map(|s| s.to_string())
            .unwrap_or_default();
        ready(Ok(AuthenticatedSubject(subject_id)))
    }
}
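In practice the subject should come from a cryptographically verified credential, not a bare header. The standard-library sketch below shows the token-parsing step only (a hypothetical `subject_from_bearer` helper); signature verification of, for example, a Firebase ID token must happen before the extracted value is trusted.

```rust
// Hypothetical helper: pull a subject identifier out of a
// "Bearer <token>" Authorization value. A real service must verify
// the token's signature before trusting it; this only parses.
fn subject_from_bearer(header_value: &str) -> Option<String> {
    header_value
        .strip_prefix("Bearer ")
        .map(|token| token.trim().to_string())
        .filter(|token| !token.is_empty())
}
```

Returning `None` for malformed or empty values lets the extractor reject the request outright instead of passing an empty subject ID into authorization checks.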
By combining server-side authorization, field-level filtering, and disciplined prompt engineering, Actix services can safely use Firestore data with LLMs while minimizing the risk of LLM data leakage. middleBrick’s scans can verify that these controls are present by checking authentication mechanisms, property authorization, and LLM output for sensitive content.
Related CWEs (LLM Security)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |