Severity: HIGH

LLM Data Leakage in Echo Go with MongoDB

LLM Data Leakage in Echo Go with MongoDB — how this specific combination creates or exposes the vulnerability

When building an API in Go using the Echo framework with MongoDB as the data store, developers may inadvertently expose sensitive data through both application behavior and AI integrations. LLM data leakage occurs when an LLM endpoint or AI-assisted feature returns or exposes data it should not, including database identifiers, user PII, or internal business logic. In the Echo + MongoDB context, this often arises when application code returns raw MongoDB documents or projection results to an LLM endpoint, or logs LLM requests and responses without sanitization.
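
A minimal sketch of this anti-pattern (callLLM and usersCollection are hypothetical placeholders for a model client and collection handle) shows how a raw document can reach the model:

import (
	"encoding/json"
	"net/http"

	"github.com/labstack/echo/v4"
	"go.mongodb.org/mongo-driver/bson"
)

// Anti-pattern: the raw MongoDB document is serialized straight into the
// prompt, so every field (password_hash, internal_notes, ...) becomes
// visible to the model and may be echoed back to the caller.
func unsafeChatHandler(c echo.Context) error {
	var doc bson.M
	filter := bson.D{{Key: "username", Value: c.FormValue("username")}}
	if err := usersCollection.FindOne(c.Request().Context(), filter).Decode(&doc); err != nil {
		return echo.NewHTTPError(http.StatusNotFound, "user not found")
	}

	raw, _ := json.Marshal(doc) // full document, nothing redacted
	prompt := "Context: " + string(raw) + "\nQuestion: " + c.FormValue("q")

	reply, err := callLLM(c.Request().Context(), prompt) // hypothetical client
	if err != nil {
		return echo.NewHTTPError(http.StatusInternalServerError, "llm error")
	}
	return c.String(http.StatusOK, reply) // model output returned unfiltered
}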

Echo routes can inadvertently pass MongoDB ObjectIDs or full documents into handlers that also serve LLM-related functionality, such as tool-calling endpoints or streaming chat completions. If those handlers do not enforce strict input validation and output filtering, an attacker may coax the system into returning MongoDB document structures, internal IDs, or even explain plans that reveal schema details. These artifacts can be surfaced in LLM responses when the application does not sanitize the data before it reaches the model or before the model’s output is returned to the user.

Moreover, if the application uses MongoDB change streams or aggregation pipelines that include metadata fields like database names or collection namespaces, and those fields are passed into an LLM context without redaction, the LLM may reflect that information in its responses. For example, a handler that builds a completion using user-supplied text plus an enriched MongoDB document might forward the combined context to an LLM, and the resulting reply could contain the document’s sensitive fields. This is a concrete scenario where Echo routing, MongoDB document handling, and LLM output generation intersect to create a leakage path.
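
As a minimal sketch of the safer handling (assuming the change stream API of the official Go driver; sanitizeForLLM is the allowlist helper defined in the remediation section below), decode only the fields the application actually needs so that namespace metadata never materializes:

stream, err := usersCollection.Watch(ctx, mongo.Pipeline{})
if err != nil {
	// handle error
}
defer stream.Close(ctx)

for stream.Next(ctx) {
	// Decode only the fields the application needs; metadata such as ns
	// (database/collection namespace) and documentKey is never decoded,
	// so it cannot leak into an LLM context.
	var event struct {
		OperationType string `bson:"operationType"`
		FullDocument  bson.M `bson:"fullDocument"`
	}
	if err := stream.Decode(&event); err != nil {
		continue
	}
	if event.FullDocument != nil {
		llmContext := sanitizeForLLM(event.FullDocument)
		_ = llmContext // forward to the prompt builder, never the raw event
	}
}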

Additionally, unauthenticated LLM endpoints that rely on MongoDB for context storage can become accidental data channels. If an endpoint accepts a MongoDB document ID via query or body and uses it to fetch a record that is then included in the prompt or tool description, an attacker may use prompt injection techniques to request that data back in the LLM response. middleBrick’s LLM Security checks specifically flag such unauthenticated endpoints and detect when outputs contain PII, API keys, or executable code, which can include leaked MongoDB document content.
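
One concrete countermeasure is to never expose LLM endpoints unauthenticated. A minimal sketch using Echo's built-in key-auth middleware (validateAPIKey is a hypothetical lookup against your key store; chatHandler is sketched in the next example):

import (
	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

e := echo.New()

// Group all LLM-facing routes behind authentication so that possession of
// a MongoDB document ID alone is never enough to pull database-backed
// context into a prompt.
llm := e.Group("/llm")
llm.Use(middleware.KeyAuth(func(key string, c echo.Context) (bool, error) {
	return validateAPIKey(c.Request().Context(), key), nil
}))
llm.POST("/chat", chatHandler)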

This risk is not theoretical. Consider an Echo application that serves as both a data API and an LLM tool: a GET /users/:id route that fetches a MongoDB document and a POST /chat route that incorporates the user record into a prompt. If either route omits strict output encoding or fails to redact sensitive fields, an LLM response might echo the user's email or internal record structure. This is why the data flow between MongoDB and LLM components must be explicitly controlled, monitored, and sanitized to prevent inadvertent disclosure.
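
The sketch below illustrates that controlled flow end to end (callLLM is a hypothetical model client; sanitizeForLLM and redactLLMOutput are defined in the remediation section that follows):

// Sketch of an explicitly controlled MongoDB -> prompt -> output pipeline:
// the record is fetched with an inclusion projection, allowlisted into the
// prompt, and the model reply is redacted before it leaves the handler.
func chatHandler(c echo.Context) error {
	oid, err := primitive.ObjectIDFromHex(c.FormValue("user_id"))
	if err != nil {
		return echo.NewHTTPError(http.StatusBadRequest, "invalid user identifier")
	}

	var doc bson.M
	projection := bson.D{{Key: "username", Value: 1}, {Key: "role", Value: 1}}
	err = usersCollection.FindOne(c.Request().Context(),
		bson.D{{Key: "_id", Value: oid}},
		options.FindOne().SetProjection(projection)).Decode(&doc)
	if err != nil {
		return echo.NewHTTPError(http.StatusNotFound, "record not found")
	}

	prompt := "Context: " + sanitizeForLLM(doc) +
		"\nQuestion: " + c.FormValue("q")

	reply, err := callLLM(c.Request().Context(), prompt) // hypothetical client
	if err != nil {
		return echo.NewHTTPError(http.StatusInternalServerError, "llm error")
	}
	return c.String(http.StatusOK, redactLLMOutput(reply))
}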

MongoDB-Specific Remediation in Echo Go — concrete code fixes

To mitigate LLM data leakage in an Echo Go application that uses MongoDB, enforce strict separation between data access and LLM-facing outputs, and apply schema-aware projections and redaction. The following practices and code examples demonstrate how to do this safely.

Use Explicit Field Projections to Avoid Returning Sensitive Data

Always request only the fields you need from MongoDB so that sensitive fields never leave the database. Do not rely on the full document returned by FindOne or Find. Note that MongoDB rejects projections that mix inclusion (1) and exclusion (0) specifications, except for _id, so prefer an inclusion-only allowlist.

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// Safe: an inclusion-only projection that requests just the fields needed.
// password_hash, email, internal_notes, and any sensitive fields added to
// the schema later are excluded by default because they are never listed.
var result struct {
	UserID   string `bson:"_id"`
	Username string `bson:"username"`
	Role     string `bson:"role"`
}

// ctx and userID are assumed to be in scope; if your _id field is an
// ObjectID, convert the string with primitive.ObjectIDFromHex first.
filter := bson.D{{Key: "_id", Value: userID}}
projection := bson.D{
	{Key: "_id", Value: 1},
	{Key: "username", Value: 1},
	{Key: "role", Value: 1},
	// Do not add exclusions (0) here: MongoDB rejects projections that
	// mix inclusion and exclusion, except for _id.
}

err := usersCollection.FindOne(ctx, filter,
	options.FindOne().SetProjection(projection)).Decode(&result)
if err != nil {
	// handle error (e.g., mongo.ErrNoDocuments)
}

Sanitize Data Before LLM Context Construction

When building prompts that include MongoDB data, remove or mask fields that should not be visible to the LLM or the end user. Do not concatenate raw documents into prompt text.

import (
	"strings"

	"go.mongodb.org/mongo-driver/bson"
)

// sanitizeForLLM builds a short, allowlisted summary of a document for
// prompt construction. It deletes keys that must never reach the LLM and
// then emits only explicitly approved fields, so sensitive fields added
// to the schema later are not leaked by default.
func sanitizeForLLM(doc bson.M) string {
	// Remove keys that should never reach the LLM, even transitively.
	delete(doc, "password_hash")
	delete(doc, "api_key")
	delete(doc, "internal_debug")

	// Build a safe summary instead of dumping the full document. PII such
	// as email is deliberately omitted from the allowlist.
	var parts []string
	if username, ok := doc["username"].(string); ok {
		parts = append(parts, "username: "+username)
	}
	if role, ok := doc["role"].(string); ok {
		parts = append(parts, "role: "+role)
	}
	return strings.Join(parts, ", ")
}

// Usage in an Echo handler
safeContext := sanitizeForLLM(mongoDoc)
prompt := "Answer the user query using the following context: " + safeContext

Protect LLM Endpoints That Accept MongoDB Identifiers

If an endpoint uses a MongoDB ObjectID or query parameter to fetch context for an LLM request, validate and scope that input. Do not allow arbitrary identifiers to drive database queries without authorization checks.

func getUserSummary(c echo.Context) error {
	// primitive.ObjectIDFromHex validates the identifier format and
	// converts it into a real ObjectID for the query filter.
	oid, err := primitive.ObjectIDFromHex(c.Param("id"))
	if err != nil {
		return echo.NewHTTPError(http.StatusBadRequest, "invalid user identifier")
	}

	// Apply authorization: ensure the requesting user can view this record.
	if !authorizeView(c.Request().Context(), oid.Hex(), c.Get("subject")) {
		return echo.NewHTTPError(http.StatusForbidden, "access denied")
	}

	var summary struct {
		Username string `bson:"username"`
		Role     string `bson:"role"`
	}
	filter := bson.D{{Key: "_id", Value: oid}}
	projection := bson.D{
		{Key: "_id", Value: 0}, // suppress the internal ID entirely
		{Key: "username", Value: 1},
		{Key: "role", Value: 1},
	}
	err = usersCollection.FindOne(c.Request().Context(), filter,
		options.FindOne().SetProjection(projection)).Decode(&summary)
	if err != nil {
		return echo.NewHTTPError(http.StatusInternalServerError, "unable to fetch user")
	}

	// Return only safe, intended fields; never forward the full document
	// to LLM logic.
	return c.JSON(http.StatusOK, map[string]string{
		"username": summary.Username,
		"role":     summary.Role,
	})
}

Audit and Redact LLM Outputs That May Contain MongoDB Artifacts

Before returning LLM responses to users, scan for patterns that resemble database identifiers, connection strings, or collection names. middleBrick’s output scanning can help detect PII, API keys, and code fragments that may include MongoDB-specific leakage.

import "regexp"

// Post-process an LLM response before returning it to the user. Compile
// the pattern once at package scope; MustCompile is safe here because the
// expression is a static literal.
var oidRegex = regexp.MustCompile(`\b[a-f0-9]{24}\b`)

func redactLLMOutput(text string) string {
	// Mask MongoDB ObjectID-like strings.
	text = oidRegex.ReplaceAllString(text, "<redacted-id>")
	// Add additional patterns as needed (connection strings such as
	// mongodb+srv://..., collection namespaces, API key formats).
	return text
}

By combining explicit projections, input validation, output sanitization, and redaction, you can significantly reduce the risk that Echo Go routes interacting with MongoDB will inadvertently leak data through LLM workflows or other dynamic outputs.

Related CWEs (LLM Security category)

CWE ID    Name                                                    Severity
CWE-754   Improper Check for Unusual or Exceptional Conditions    MEDIUM

Frequently Asked Questions

How does Echo route design affect MongoDB data exposure to LLMs?
Echo routes that directly return MongoDB documents or accept raw ObjectIDs without projection or authorization can feed sensitive data into LLM prompts or tool descriptions, increasing the risk of leakage in model outputs.

Can middleBrick detect MongoDB-related data leakage in LLM responses?
Yes, middleBrick’s LLM Security checks include output scanning for PII, API keys, and code patterns that can reveal MongoDB document content or identifiers in LLM responses.