LLM Data Leakage in ASP.NET with Firestore
LLM Data Leakage in ASP.NET with Firestore — how this specific combination creates or exposes the vulnerability
When an ASP.NET application integrates with Google Cloud Firestore and exposes endpoints that return Firestore documents or query results to LLM-facing interfaces, data leakage can occur if sensitive fields are unintentionally surfaced. Firestore documents often contain internal metadata (e.g., timestamps, user IDs, roles, or PII) alongside business data. If the API response is consumed by an LLM — either as part of a retrieval-augmented generation pipeline or via an unauthenticated endpoint — these fields may be exposed through model outputs, logs, or error messages.
In an ASP.NET context, this risk is amplified when controllers or minimal APIs serialize entire Firestore documents (including hidden fields such as `__name__`, `create_time`, or custom administrative flags) without filtering. For example, a GET endpoint that returns a user profile might inadvertently include a Firestore field like `internal_notes` or `role`, which an LLM could then echo in responses to downstream clients. Attackers may use prompt injection techniques to coax the LLM into repeating these fields or to infer data patterns through error handling or verbose output.
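To make the anti-pattern concrete, the following is a hypothetical minimal-API endpoint sketch; the route, collection name, and field names are illustrative assumptions, not taken from any real application:

```csharp
// Anti-pattern sketch: returning the raw Firestore document exposes every field.
// Assumes the Google.Cloud.Firestore SDK; names here are hypothetical.
app.MapGet("/users/{id}", async (string id, FirestoreDb db) =>
{
    DocumentSnapshot snapshot = await db.Collection("users").Document(id).GetSnapshotAsync();
    if (!snapshot.Exists) return Results.NotFound();

    // ToDictionary() materializes ALL document fields, including anything like
    // internal_notes, role, or administrative flags, and the JSON serializer
    // will emit them verbatim to whatever consumes this endpoint.
    return Results.Ok(snapshot.ToDictionary());
});
```

Any LLM or RAG pipeline that consumes this endpoint sees the full document, which is exactly the leakage path described above.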
The LLM/AI Security checks in middleBrick specifically target this scenario by scanning for system prompt leakage and testing whether unauthenticated LLM endpoints can extract sensitive information from data that originated in Firestore. If an endpoint passes Firestore data directly into a prompt or LLM response without redaction, active prompt injection probes can validate whether the model reveals confidential document fields. Output scanning further detects whether API keys, PII, or executable code appear in LLM responses that were influenced by Firestore content.
Because Firestore documents are often structured as nested maps or contain dynamic fields, ASP.NET developers might inadvertently treat all keys as safe for LLM consumption. Without explicit field filtering or schema validation, the serialized document may expose internal implementation details that were never intended for model interaction. middleBrick’s OpenAPI/Swagger analysis helps identify whether Firestore document structures are mapped clearly in API specs and whether runtime findings align with expected data flows, highlighting mismatches where sensitive fields leak into LLM-accessible surfaces.
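Because top-level allowlists miss keys inside nested maps, one way to handle dynamic Firestore structures is a recursive allowlist filter. This is a sketch over plain dictionaries (Firestore maps deserialize as `Dictionary<string, object>`); the `FilterAllowed` helper and the key names are hypothetical:

```csharp
using System.Collections.Generic;

public static class FirestoreFieldFilter
{
    // Recursively keeps only allowlisted keys, descending into nested maps
    // so that sensitive keys buried inside sub-documents are also dropped.
    public static Dictionary<string, object> FilterAllowed(
        IReadOnlyDictionary<string, object> source,
        ISet<string> allowedKeys)
    {
        var result = new Dictionary<string, object>();
        foreach (var (key, value) in source)
        {
            if (!allowedKeys.Contains(key)) continue;
            result[key] = value is IReadOnlyDictionary<string, object> nested
                ? FilterAllowed(nested, allowedKeys)
                : value;
        }
        return result;
    }
}
```

Applying this to `snapshot.ToDictionary()` before serialization ensures that a nested map such as `profile.role` is removed even when `profile` itself is allowlisted.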
Using the middleBrick CLI to scan such an endpoint can reveal these risks quickly. For instance, running `middlebrick scan https://api.example.com/users/{id}` may flag findings under Data Exposure and LLM/AI Security, showing that Firestore fields like `auth_role` or `deleted_at` appear in responses that reach LLM endpoints. This enables developers to address the issue before deployment, especially when integrating with the GitHub Action to enforce score thresholds in CI/CD pipelines.
Firestore-Specific Remediation in ASP.NET — concrete code fixes
To prevent LLM data leakage when using Firestore in ASP.NET, you must control which document fields are serialized and exposed to LLM-facing endpoints. The following approaches focus on explicit field selection, DTOs (Data Transfer Objects), and secure serialization practices.
1. Use DTOs to project only safe fields when reading from Firestore:

```csharp
using Google.Cloud.Firestore;

public class UserProfileDto
{
    public string UserId { get; set; } = string.Empty;
    public string DisplayName { get; set; } = string.Empty;
    public string Email { get; set; } = string.Empty;
    // Deliberately no InternalRole, Firestore document name, or CreatedAt fields.
}

public async Task<UserProfileDto?> GetUserProfileAsync(string userId)
{
    DocumentReference docRef = _firestoreDb.Collection("users").Document(userId);
    DocumentSnapshot snapshot = await docRef.GetSnapshotAsync();
    if (!snapshot.Exists) return null;

    // TryGetValue avoids throwing when a field is absent from the document.
    snapshot.TryGetValue<string>("displayName", out var displayName);
    snapshot.TryGetValue<string>("email", out var email);

    return new UserProfileDto
    {
        UserId = snapshot.Id,
        DisplayName = displayName ?? string.Empty,
        Email = email ?? string.Empty
    };
}
```

2. In minimal APIs, avoid passing the entire DocumentSnapshot to JSON serialization:
```csharp
app.MapGet("/users/{id}", async (string id, FirestoreDb db) =>
{
    DocumentReference docRef = db.Collection("users").Document(id);
    DocumentSnapshot snapshot = await docRef.GetSnapshotAsync();
    if (!snapshot.Exists) return Results.NotFound();

    // Explicitly build an anonymous object with only safe fields;
    // Firestore metadata and sensitive fields are never serialized.
    var safeData = new
    {
        user_id = snapshot.Id,
        name = snapshot.GetValue<string>("fullName"),
        email = snapshot.GetValue<string>("emailAddress")
    };
    return Results.Ok(safeData);
});
```

3. If you must include dynamic fields, define an allowlist and filter aggressively:
```csharp
var allowedFields = new HashSet<string> { "displayName", "email", "publicBio" };
var safeDict = snapshot.ToDictionary()
    .Where(kvp => allowedFields.Contains(kvp.Key))
    .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
// safeDict now contains only the permitted top-level fields.
```

4. Ensure that error handling does not leak Firestore internals:
```csharp
try
{
    // Firestore operations
}
catch (Exception ex)
{
    // Log the exception server-side only; never expose snapshot paths,
    // document names, or gRPC status details to callers or LLM-facing outputs.
    _logger.LogWarning(ex, "Failed to load user data");
    throw new InvalidOperationException("Unable to load profile");
}
```

5. For LLM integration endpoints, sanitize outputs before sending to models or clients:
```csharp
using System.Text.RegularExpressions;

public string SanitizeForLlm(string input)
{
    // Mask entire "internal_*" key/value pairs, not just the key prefix,
    // so the sensitive value itself never reaches the model.
    return Regex.Replace(
        input,
        @"""internal_[^""]*""\s*:\s*""[^""]*""",
        @"""[REDACTED]"": ""[REDACTED]""",
        RegexOptions.IgnoreCase);
}
```

This example assumes JSON-serialized output with string values; adapt the pattern to your serialization shape. These patterns ensure that only intended data reaches LLM endpoints, reducing the risk of data leakage. When using the middleBrick Pro plan, you can enable continuous monitoring to detect regressions where new Firestore fields unintentionally appear in API responses, and the GitHub Action can fail builds if scans exceed defined security thresholds.
Related CWEs (LLM/AI Security):
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |