LLM Data Leakage in Echo Go with Firestore
LLM Data Leakage in Echo Go with Firestore — how this specific combination creates or exposes the vulnerability
When building Go services using the Echo framework that integrate with Google Cloud Firestore, improper handling of user-supplied input can lead to LLM data leakage. This occurs when application logic passes raw or minimally validated data from Firestore documents into prompts supplied to an LLM, either directly as user messages or indirectly via system instructions. Because Firestore documents often contain sensitive fields such as emails, identifiers, or internal metadata, failing to sanitize or scope that data before including it in a prompt can expose confidential information to the model and to any downstream observers of the LLM output.
In Echo Go, a typical handler might retrieve a document ID from a URL parameter, fetch the corresponding Firestore document, and then construct a prompt for an LLM without first removing or redacting sensitive keys. For example, if the document contains fields such as internal_notes, owner_id, or pii_data, and these are concatenated into a user message, an LLM with verbose output or error reporting may repeat or partially reveal those values in its responses. These are exactly the leakage paths that system prompt leakage detection and output scanning, part of middleBrick's LLM/AI security checks, are designed to surface. The same risk is amplified if the application reuses a Firestore document as a system prompt template without validating that the stored content is safe for use in an LLM context.
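As a hedged illustration of that anti-pattern (the handler name, collection, fields, and the commented-out sendToLLM call below are placeholders, not code from a real project), a vulnerable handler might look like this:
import (
    "fmt"

    "cloud.google.com/go/firestore"
    "github.com/labstack/echo/v4"
)

// Anti-pattern: the entire document, including any internal or PII fields,
// is serialized straight into the prompt.
func leakyArticleHandler(c echo.Context) error {
    ctx := c.Request().Context()
    client, err := firestore.NewClient(ctx, "my-project")
    if err != nil {
        return c.String(500, "firestore init failed")
    }
    defer client.Close()

    snap, err := client.Collection("articles").Doc(c.Param("id")).Get(ctx)
    if err != nil {
        return c.String(500, "lookup failed")
    }
    // fmt.Sprintf("%v", ...) dumps every key and value, sensitive or not.
    prompt := fmt.Sprintf("Summarize this document: %v", snap.Data())
    // sendToLLM(prompt) // hypothetical LLM call; the prompt now carries raw document data
    return c.String(200, prompt)
}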
The interaction with Firestore also introduces risks if query construction is driven by unchecked parameters. An attacker could manipulate document IDs or query filters to cause the application to fetch unexpected documents, then include that data in prompts. This can lead to over-retrieval of records or inclusion of documents that should remain internal, increasing the likelihood of exposing credentials, configuration, or personal data. Because Firestore rules operate at the database level and do not automatically enforce content-level sanitization for LLM usage, the application must explicitly limit which fields are used and how they are transformed before inclusion in any LLM prompt.
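One way to narrow that surface, sketched below under the assumption that documents carry an owner_id field and that IDs follow a simple alphanumeric scheme, is to validate the identifier and verify ownership before the data is ever considered for a prompt:
import (
    "context"
    "errors"
    "regexp"

    "cloud.google.com/go/firestore"
)

// Conservative identifier check; adjust the pattern to your own ID scheme.
var docIDPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{1,64}$`)

// fetchForPrompt rejects malformed IDs and refuses documents the caller does
// not own. The owner_id field name is an assumption for illustration.
func fetchForPrompt(ctx context.Context, client *firestore.Client, docID, callerID string) (map[string]interface{}, error) {
    if !docIDPattern.MatchString(docID) {
        return nil, errors.New("invalid document id")
    }
    snap, err := client.Collection("articles").Doc(docID).Get(ctx)
    if err != nil {
        return nil, err
    }
    data := snap.Data()
    if owner, _ := data["owner_id"].(string); owner != callerID {
        return nil, errors.New("document is not available to this caller")
    }
    return data, nil
}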
Furthermore, certain Firestore data patterns common in Go applications can inadvertently create structured leakage points. If nested maps or slices are serialized into prompt text without careful formatting, they may retain internal keys or paths that should not be exposed. When combined with active prompt injection testing, such as system prompt extraction or instruction override probes, these leakage paths can be triggered to reveal more than intended. middleBrick’s LLM/AI security checks highlight these risks by scanning for system prompt leakage patterns and by analyzing outputs for PII, API keys, or executable code, ensuring that Firestore-driven prompts are evaluated specifically for unintended disclosure.
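A sketch of the safer alternative, assuming you maintain an explicit allowlist of dotted field paths, is to flatten only those paths into prompt text so nested internal keys are never serialized:
import (
    "fmt"
    "strings"
)

// flattenAllowed keeps only the listed dotted paths (e.g. "meta.published_at"),
// so nested internal keys never reach the prompt. Path names are illustrative.
func flattenAllowed(data map[string]interface{}, allowed []string) string {
    var b strings.Builder
    for _, path := range allowed {
        if value := lookupPath(data, strings.Split(path, ".")); value != nil {
            fmt.Fprintf(&b, "%s: %v\n", path, value)
        }
    }
    return b.String()
}

// lookupPath walks nested map[string]interface{} values one key at a time.
func lookupPath(data map[string]interface{}, parts []string) interface{} {
    var current interface{} = data
    for _, p := range parts {
        m, ok := current.(map[string]interface{})
        if !ok {
            return nil
        }
        current = m[p]
    }
    return current
}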
Overall, the combination of Echo Go routing and Firestore document access requires disciplined data handling: treat Firestore content as untrusted input, scope queries to the minimum required fields, strip or hash sensitive identifiers before prompt construction, and validate that no internal metadata reaches the LLM. Without these controls, what begins as a convenient integration can become a channel for LLM data leakage that exposes sensitive information through model responses or tooling that observes those responses.
Firestore-Specific Remediation in Echo Go — concrete code fixes
To prevent LLM data leakage when using Firestore in Echo Go, apply strict field selection, input validation, and output-oriented safeguards. The following practices and code examples focus on minimizing exposure and ensuring that only safe, intended data reaches the LLM.
1. Select only required fields from Firestore
Avoid passing whole documents into prompt construction. Copy only the fields you need for the LLM prompt into a new map and drop sensitive keys; where documents are read through a query, field selection can also be pushed down to Firestore itself, as shown after the example below.
import (
    "context"

    "cloud.google.com/go/firestore"
    "github.com/labstack/echo/v4" // used by the handler example in step 3
)

// getSafeDocumentData fetches a document and copies only the fields that are
// intended for LLM usage into a new map.
func getSafeDocumentData(ctx context.Context, client *firestore.Client, docID string) (map[string]interface{}, error) {
    snap, err := client.Collection("articles").Doc(docID).Get(ctx)
    if err != nil {
        return nil, err
    }
    data := snap.Data()
    // Keep only fields intended for LLM usage
    safe := map[string]interface{}{
        "title": data["title"],
        "body":  data["body"],
    }
    return safe, nil
}
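Where documents are read through a query rather than a direct Get, the Go client's Select can push the field restriction down to Firestore itself, so keys outside the selection never leave the database for those reads. A minimal sketch, reusing the imports and client from the example above:
// queryPromptFields restricts the fields returned by Firestore itself, so
// keys outside the selection are never returned to the application.
func queryPromptFields(ctx context.Context, client *firestore.Client) *firestore.DocumentIterator {
    return client.Collection("articles").
        Select("title", "body").
        Limit(20).
        Documents(ctx)
}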
2. Validate and sanitize before prompt assembly
Normalize and validate values before inserting them into prompts. Remove or escape sequences that could alter prompt intent; a marker-stripping sketch follows the example below.
import "strings"
func buildPrompt(title, body string) string {
title = strings.ReplaceAll(title, "\n", " ")
body = strings.ReplaceAll(body, "\n", " ")
// Basic length guard to avoid overly large prompts
if len(title)+len(body) > 2000 {
body = body[:2000-len(title)] + "..."
}
return "Title: " + title + "\nContent: " + body
}
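Beyond newlines and length, you may also want to strip obvious instruction-override markers. The replacer below is a heuristic sketch, not a complete prompt injection defense:
// markerReplacer removes common role-marker prefixes. This narrows obvious
// instruction-override attempts but does not block all injection techniques.
var markerReplacer = strings.NewReplacer(
    "system:", "",
    "System:", "",
    "assistant:", "",
    "Assistant:", "",
)

func stripPromptMarkers(s string) string {
    return markerReplacer.Replace(s)
}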
3. Avoid using raw Firestore data as system prompts
Do not directly assign Firestore-stored templates as system prompts without review. Instead, use curated, versioned prompt templates in your codebase.
const systemTemplate = "You are a helpful assistant. Summarize the following content concisely."

func handleArticle(c echo.Context) error {
    ctx := c.Request().Context()
    client, err := firestore.NewClient(ctx, "my-project")
    if err != nil {
        return c.String(500, "failed to connect to Firestore")
    }
    defer client.Close()

    data, err := getSafeDocumentData(ctx, client, c.Param("id"))
    if err != nil {
        return c.String(500, "failed to load data")
    }
    // Comma-ok assertions prevent a panic when a field is missing or not a string.
    title, _ := data["title"].(string)
    body, _ := data["body"].(string)
    userPrompt := buildPrompt(title, body)
    _ = userPrompt // placeholder until the LLM call below is wired in
    // Use systemTemplate defined in code, not from Firestore:
    // sendToLLM(systemTemplate, userPrompt)
    return c.String(200, "prompt built safely")
}
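If you need more than one prompt, a small in-code registry keeps templates curated, reviewable, and versioned without ever sourcing them from Firestore. The template keys and wording here are illustrative:
// Curated templates live in code, are reviewed like any other change, and are
// selected by key rather than loaded from Firestore documents.
var systemTemplates = map[string]string{
    "summarize_v1": "You are a helpful assistant. Summarize the following content concisely.",
    "classify_v1":  "You are a helpful assistant. Classify the following content by topic.",
}

func systemPromptFor(task string) (string, bool) {
    template, ok := systemTemplates[task]
    return template, ok
}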
4. Enforce scope and access controls at the application layer
Firestore security rules are necessary but not sufficient for LLM safety. Implement additional checks in Go to ensure the caller is authorized to view the document and that the document category is appropriate for LLM exposure.
func isAuthorizedForLLM(ctx context.Context, docSnap *firestore.DocumentSnapshot, userRole string) bool {
    visibility, ok := docSnap.Data()["visibility"].(string)
    if !ok {
        return false
    }
    // Only public or explicitly approved internal documents may be used
    if visibility == "public" || (visibility == "internal" && userRole == "trusted") {
        return true
    }
    return false
}
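A sketch of wiring this check into a handler follows. Reading the role from a request header is only a placeholder; in practice the role would come from your authentication middleware:
func handleArticleForLLM(c echo.Context) error {
    ctx := c.Request().Context()
    client, err := firestore.NewClient(ctx, "my-project")
    if err != nil {
        return c.String(500, "failed to connect to Firestore")
    }
    defer client.Close()

    snap, err := client.Collection("articles").Doc(c.Param("id")).Get(ctx)
    if err != nil {
        return c.String(404, "not found")
    }
    // Placeholder: in a real service the role comes from authentication middleware.
    role := c.Request().Header.Get("X-User-Role")
    if !isAuthorizedForLLM(ctx, snap, role) {
        return c.String(403, "document not approved for LLM use")
    }
    // Continue with field selection and prompt construction as in steps 1-3.
    return c.String(200, "authorized")
}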
5. Monitor outputs and apply redaction
Treat LLM responses as potentially containing leaked information. If your architecture includes output scanning, apply redaction on sensitive patterns before logging or displaying results.
import "regexp"

// Compile the pattern once at package level instead of on every call.
var emailPattern = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)

// redactSensitive removes email-like patterns before logging or display.
func redactSensitive(text string) string {
    return emailPattern.ReplaceAllString(text, "[REDACTED_EMAIL]")
}
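Redaction is most useful when applied consistently before responses are logged or returned. The sketch below layers a heuristic key-like token pattern on top of redactSensitive; the prefixes it matches are examples and will not cover every secret format:
// keyLikePattern is a heuristic for common secret prefixes; extend it to the
// token formats that matter in your environment.
var keyLikePattern = regexp.MustCompile(`\b(sk|AKIA|AIza)[A-Za-z0-9_\-]{10,}\b`)

func sanitizeLLMOutput(text string) string {
    text = redactSensitive(text)
    return keyLikePattern.ReplaceAllString(text, "[REDACTED_KEY]")
}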
By combining scoped queries, strict field selection, code-based prompt templates, and output redaction, you can safely integrate Firestore-backed data into LLM workflows in Echo Go while minimizing the risk of LLM data leakage.
Related CWEs
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |