LLM Data Leakage in Chi with Firestore
How this specific combination creates or exposes the vulnerability
When a web application built on Chi, a lightweight HTTP router for Go, reads from Google Cloud Firestore and also exposes an unauthenticated or improperly constrained Large Language Model (LLM) endpoint, sensitive data can be unintentionally exposed. Firestore stores structured documents and collections that often include user data, session tokens, or business-critical records. If the LLM endpoint can read or log request and response content, and Firestore rules or application logic do not enforce strict ownership and least-privilege access, the LLM may return data that should remain private.
LLM data leakage in this context can occur through several realistic paths. For example, an endpoint that queries Firestore based on user-supplied parameters might inadvertently pass sensitive Firestore document contents (such as personally identifiable information or internal identifiers) into prompts sent to the LLM. If the LLM response is not inspected for PII or secrets, users might receive raw data that was never intended for model consumption. These are exactly the paths middleBrick's LLM security checks are designed to surface: system prompt leakage detection across 27 regex patterns, active prompt injection testing (five sequential probes), and output scanning for PII, API keys, and executable code.
In a Chi application, routes might directly reference Firestore collections without proper authorization checks. Consider a route that fetches a user profile document using an ID from the URL and then sends that profile to an LLM for summarization or analysis. If the ID is user-controlled and the server does not validate that the authenticated user owns the document, an attacker can manipulate the ID to access another user’s data. Even if authentication is enforced, overly permissive Firestore rules may allow broader reads than intended, enabling an LLM-facing service account to pull documents it should not see.
Firestore’s real-time listeners and batched operations can compound the issue. If a Chi handler attaches a listener to a broad collection and forwards events to an LLM endpoint, sensitive updates may be streamed unintentionally. The combination of Firestore’s flexible data model and the LLM’s ability to generate natural-language responses increases the chance that sensitive fields are echoed back in logs or model outputs. middleBrick’s excessive agency detection helps identify patterns such as tool_calls or function_call usage that may automate data exposure, while its output scanning detects API keys and PII in LLM responses.
Another vector involves prompt injection and jailbreak attempts aimed at tricking the LLM into revealing Firestore-backed data. An attacker may supply crafted input designed to bypass application-level guards and cause the LLM to regurgitate stored documents or schema details. middleBrick’s active prompt injection testing probes system prompt extraction, instruction override, DAN jailbreak, data exfiltration, and cost exploitation, which can uncover weaknesses in how prompts are constructed and how Firestore data is incorporated.
To summarize, the risk arises when Firestore data is ingested by LLM-facing endpoints in Chi without rigorous access controls, input validation, and output inspection. The LLM becomes a channel through which sensitive data can be extracted or inferred. Leveraging middleBrick’s LLM security capabilities, including system prompt leakage detection, output scanning for PII, and unauthorized LLM endpoint detection, is essential to identify and mitigate these specific leakage paths.
Firestore-Specific Remediation in Chi — concrete code fixes
Remediation focuses on enforcing ownership checks, tightening Firestore security rules, and ensuring LLM endpoints never receive or return sensitive data unintentionally. Below are concrete examples tailored to Chi routes and Firestore interactions.
1. Enforce user ownership in Chi routes
Validate that the requesting user can only access their own Firestore documents. Use the Firebase Admin SDK on the server to verify ID tokens, then scope every query by the verified UID rather than a client-supplied ID.
// Chi (Go) handler. Assumes fsClient is a *firestore.Client created at
// startup and that auth middleware has already verified the Firebase ID
// token and stored the UID in the request context under uidKey.
func getUserProfile(w http.ResponseWriter, r *http.Request) {
	uid, ok := r.Context().Value(uidKey).(string)
	if !ok || uid == "" {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	snap, err := fsClient.Collection("profiles").Doc(uid).Get(r.Context())
	if err != nil { // includes not-found
		http.Error(w, "not found", http.StatusNotFound)
		return
	}
	json.NewEncoder(w).Encode(snap.Data())
}
2. Use Firestore security rules to restrict reads
Ensure rules allow reads only when request.auth.uid matches the document ID. Avoid wildcards that permit broad access.
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /profiles/{userId} {
      // Firestore denies by default and allow statements are additive,
      // so this is the only grant; an "allow read: if false" line would
      // add nothing and does not act as a deny.
      allow read, write: if request.auth != null && request.auth.uid == userId;
    }
  }
}
3. Sanitize data before LLM consumption
Strip or hash sensitive fields before sending data to the LLM. Never forward raw Firestore documents that contain PII or secrets.
// Allow-list only non-sensitive fields before any data reaches a prompt.
func safeForLLM(doc map[string]interface{}) map[string]interface{} {
	return map[string]interface{}{
		"publicName":  doc["publicName"],
		"preferences": doc["preferences"],
	}
}

func buildPrompt(userDoc map[string]interface{}) string {
	b, _ := json.Marshal(safeForLLM(userDoc)) // encoding/json
	return "Summarize the following public preferences: " + string(b)
}
4. Validate and constrain LLM inputs
Treat all user input as untrusted. Validate IDs and avoid concatenating raw input into prompts that may reach Firestore.
// Validate the URL parameter before it reaches Firestore or an LLM prompt.
var validUserID = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,36}$`)

func getProfile(w http.ResponseWriter, r *http.Request) {
	userID := chi.URLParam(r, "id")
	if !validUserID.MatchString(userID) {
		http.Error(w, "invalid user id", http.StatusBadRequest)
		return
	}
	// The ownership check from step 1 still applies before serving data.
	snap, err := fsClient.Collection("profiles").Doc(userID).Get(r.Context())
	if err != nil {
		http.Error(w, "not found", http.StatusNotFound)
		return
	}
	json.NewEncoder(w).Encode(snap.Data())
}
5. Audit LLM outputs for accidental data leakage
Use middleBrick’s output scanning to detect PII, API keys, and code in LLM responses. Integrate these checks into your pipeline to catch regressions early.
// Pseudo-code: run a middleBrick output scan on each LLM response before
// returning it; in practice, call the middleBrick API or CLI on the content.
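Where middleBrick is not yet wired into the pipeline, a stop-gap scanner can catch the most obvious leaks. The sketch below is an illustrative stand-in, not middleBrick's actual API: the pattern names and regexes are assumptions and far less thorough than a dedicated scanner.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative leak patterns: a crude API-key shape and an email address.
var leakPatterns = map[string]*regexp.Regexp{
	"api_key": regexp.MustCompile(`\b(?:sk|AKIA)[A-Za-z0-9_-]{16,}`),
	"email":   regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`),
}

// scanLLMOutput returns the names of all patterns found in a model
// response; run it before the response reaches clients or logs.
func scanLLMOutput(resp string) []string {
	var findings []string
	for name, re := range leakPatterns {
		if re.MatchString(resp) {
			findings = append(findings, name)
		}
	}
	return findings
}

func main() {
	fmt.Println(scanLLMOutput("The admin address is admin@example.com")) // prints [email]
}
```

A non-empty result should block or redact the response; wiring the same check into CI catches regressions before deployment.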
6. Limit LLM access to non-sensitive metadata
Design Firestore documents so that only non-sensitive metadata is readable by services that interact with LLMs. Keep personally identifiable or regulated data in separate, tightly restricted collections.
// Firestore data layout suggestion. Security rules are evaluated per
// document, not per field, so sensitive values must live in a separate
// document to be independently restricted:
// profiles/{userId}               -> { displayName: string, bio: string }
// profiles/{userId}/private/data  -> { email: string, ssnHash: string }  // owner/admin only
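Because rules apply per document, splitting private data into an owner-only subcollection lets the rules enforce the boundary. A sketch for such a layout follows; the `profiles/{userId}/private/data` path is an assumed structure, and note that server code using the Admin SDK bypasses security rules entirely, so server-side ownership checks remain necessary.

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Public profile fields: readable by any signed-in user.
    match /profiles/{userId} {
      allow read: if request.auth != null;
      allow write: if request.auth != null && request.auth.uid == userId;
      // Private data: owner only; never reachable by rule-bound
      // LLM-facing clients.
      match /private/{doc} {
        allow read, write: if request.auth != null && request.auth.uid == userId;
      }
    }
  }
}
```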
Related CWEs (llmSecurity)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |