Severity: HIGH

LLM Data Leakage in Fiber with MongoDB

LLM Data Leakage in Fiber with MongoDB: how this specific combination creates or exposes the vulnerability

When building LLM-enabled features in a Fiber application that uses MongoDB as the primary data store, data leakage can occur if application logic or prompts inadvertently expose sensitive records or schema details to the model or to end users. LLM data leakage in this context refers to situations where confidential information—such as personally identifiable information (PII), authentication tokens, or business-critical data—appears in LLM inputs, tool calls, or responses. With Fiber, developers often pass database documents or query results directly into prompts or LLM client inputs. If those documents contain fields like emails, IDs, or internal metadata and are not explicitly sanitized, the data can be exposed through model outputs, logs, or error messages.

In a typical Fiber handler, you might retrieve a user document from MongoDB and forward it to an LLM for processing. Because MongoDB documents can include nested fields and metadata (such as _id, __v, or timestamps), simply passing the raw document into a prompt can leak identifiers or internal state. For example, including a user’s _id or email in a prompt that is sent to an external LLM endpoint can violate privacy expectations and may be retained in model logs or outputs. This is especially risky when using unauthenticated LLM endpoints or when enabling features such as tool calling or function calling, where the model may request specific fields that expose sensitive structure.
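To make the failure mode concrete, here is a minimal sketch of the unsafe pattern, assuming a hypothetical callLLM helper standing in for whatever LLM client is in use; summarizeUserUnsafe is illustrative, not code from any particular library:

```go
package handlers

import (
	"context"
	"encoding/json"
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// callLLM is a hypothetical stand-in for whatever LLM client you use.
func callLLM(ctx context.Context, prompt string) (string, error) {
	return "", nil // placeholder
}

// UNSAFE: the entire document, including _id, email, and internal
// metadata, is serialized into the prompt and leaves your trust boundary.
func summarizeUserUnsafe(ctx context.Context, users *mongo.Collection, id interface{}) (string, error) {
	var doc bson.M
	if err := users.FindOne(ctx, bson.M{"_id": id}).Decode(&doc); err != nil {
		return "", err
	}
	raw, err := json.Marshal(doc) // every field, sensitive or not
	if err != nil {
		return "", err
	}
	return callLLM(ctx, fmt.Sprintf("Summarize this user: %s", raw))
}
```

Everything in the document reaches the model here, and potentially the provider's logs, regardless of what the prompt actually needed.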

Another leakage vector arises from the interaction between Fiber route handlers and MongoDB queries used for retrieval or filtering. If query filters are dynamically built from user input and passed to MongoDB without strict validation, an attacker may manipulate inputs to cause excessive data retrieval or to probe schema details through error messages or timing differences. While this is not a direct LLM issue, the retrieved data may later be supplied to an LLM, compounding the exposure. The LLM/AI Security checks in middleBrick specifically flag unauthenticated LLM endpoints and system prompt leakage, which can be relevant when LLM integrations in Fiber inadvertently expose system instructions or sensitive context that depends on MongoDB data.
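One way to constrain dynamically built filters is an explicit allow-list checked before the query document is constructed. The sketch below is illustrative: allowedFilterFields and buildFilter are hypothetical names, and the allow-list contents are assumptions.

```go
package handlers

import (
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
)

// allowedFilterFields is a hypothetical allow-list of fields callers may
// filter on; everything else, including operator keys like "$where",
// is rejected before a query document is built.
var allowedFilterFields = map[string]bool{
	"category": true,
	"status":   true,
}

// buildFilter turns user-supplied parameters into a MongoDB filter using
// plain equality matches on allow-listed fields only.
func buildFilter(params map[string]string) (bson.M, error) {
	filter := bson.M{}
	for field, value := range params {
		if !allowedFilterFields[field] {
			return nil, fmt.Errorf("field %q is not filterable", field)
		}
		filter[field] = value // value is data, never an operator
	}
	return filter, nil
}
```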

Because LLM data leakage often involves subtle data flow issues, it is important to validate and sanitize data before it reaches the model. This includes removing or hashing identifiers, excluding sensitive fields, and ensuring that only necessary, non-sensitive data is included in prompts. middleBrick’s LLM/AI Security checks help detect some of these risks by scanning for system prompt leakage and unauthenticated LLM endpoints, but developers must still enforce data minimization and field-level filtering in their Fiber routes to prevent MongoDB documents from leaking into LLM contexts.
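As a minimal sketch of the hashing suggestion above, the hypothetical helper below replaces a raw identifier with a keyed digest, so the model can still correlate references to the same record without ever seeing the real value. The key handling shown is an assumption; in practice the key should come from configuration or a secret store.

```go
package handlers

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// pseudonymizationKey is a hypothetical server-side secret; using an HMAC
// rather than a bare hash prevents dictionary attacks on low-entropy
// values such as email addresses.
var pseudonymizationKey = []byte("load-from-config-not-source")

// pseudonymizeID replaces a raw identifier with a short, stable token.
// The LLM can still correlate references to the same record, but the real
// _id or email never appears in prompts or provider logs.
func pseudonymizeID(raw string) string {
	mac := hmac.New(sha256.New, pseudonymizationKey)
	mac.Write([]byte(raw))
	return "rec_" + hex.EncodeToString(mac.Sum(nil))[:12]
}
```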

Real-world examples include a route that calls collection.Find and directly uses the result in a LangChain chain or an OpenAI client call, or a tool-calling setup where the model requests a MongoDB document’s fields. In such cases, fields like email or internal IDs can be surfaced in model outputs or logs. By combining strict field selection in MongoDB queries with prompt sanitization and output scanning, teams can reduce the likelihood of LLM data leakage in Fiber applications that rely on MongoDB.
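For the tool-calling case, here is a hedged sketch of a tool handler that intersects whatever fields the model requests with a fixed allow-list before querying. The tool wiring, toolSafeFields, lookupRecordTool, and the recordId field are all illustrative assumptions, not a specific framework's API.

```go
package handlers

import (
	"context"
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// toolSafeFields is the fixed set of fields a tool call may surface,
// regardless of which fields the model asks for.
var toolSafeFields = map[string]bool{"name": true, "role": true, "category": true}

// lookupRecordTool backs a hypothetical "lookup_record" tool. Requested
// fields are intersected with the allow-list before the query runs.
func lookupRecordTool(ctx context.Context, coll *mongo.Collection, id string, requested []string) (bson.M, error) {
	projection := bson.D{{Key: "_id", Value: 0}} // never expose _id to the model
	for _, f := range requested {
		if toolSafeFields[f] {
			projection = append(projection, bson.E{Key: f, Value: 1})
		}
	}
	// Guard against an exclusion-only projection, which would return
	// every field except _id.
	if len(projection) == 1 {
		return nil, fmt.Errorf("no permitted fields requested")
	}
	var result bson.M
	err := coll.FindOne(ctx, bson.M{"recordId": id},
		options.FindOne().SetProjection(projection)).Decode(&result)
	return result, err
}
```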

MongoDB-Specific Remediation in Fiber: concrete code fixes

To prevent LLM data leakage when using MongoDB with Fiber, apply explicit field selection and transformation before passing data to the LLM. Avoid sending entire MongoDB documents into prompts or tool calls. Instead, construct view models that include only the fields required for the LLM task and exclude sensitive attributes such as email, password, or internal IDs.

Example: Safe document projection in a Fiber handler

Define a struct that represents only the safe fields you intend to use, and use MongoDB projections to limit the retrieved data:

```go
package handlers

import (
	"context"
	"errors"

	"github.com/gofiber/fiber/v2"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// mongoClient is assumed to be initialized during application startup.
var mongoClient *mongo.Client

// SafeUser holds only the fields the LLM task needs; email, password
// hashes, and other sensitive attributes are never even decoded.
type SafeUser struct {
	ID   primitive.ObjectID `bson:"_id" json:"id"`
	Name string             `bson:"name" json:"name"`
	Role string             `bson:"role" json:"role"`
}

func GetUserForLLM(c *fiber.Ctx) error {
	ctx := context.Background()

	// Stored _id values are ObjectIDs, so convert the path parameter
	// before querying; a raw string would silently match nothing.
	objID, err := primitive.ObjectIDFromHex(c.Params("id"))
	if err != nil {
		return c.Status(fiber.StatusBadRequest).SendString("invalid id")
	}

	collection := mongoClient.Database("appdb").Collection("users")

	var user SafeUser
	// Use a projection so only safe fields ever leave the database.
	err = collection.FindOne(ctx, bson.M{"_id": objID},
		options.FindOne().SetProjection(bson.D{
			{Key: "name", Value: 1},
			{Key: "role", Value: 1},
			{Key: "_id", Value: 1},
		})).Decode(&user)
	if err != nil {
		if errors.Is(err, mongo.ErrNoDocuments) {
			return c.Status(fiber.StatusNotFound).SendString("user not found")
		}
		// Return a generic message; raw driver errors can leak schema details.
		return c.Status(fiber.StatusInternalServerError).SendString("lookup failed")
	}

	// Build the prompt from safe fields only; the _id is returned to the
	// caller but kept out of the prompt.
	prompt := "Explain access for user " + user.Name + " with role " + user.Role
	// Pass prompt to the LLM client here.
	return c.JSON(fiber.Map{"prompt": prompt, "user": user})
}
```

Example: Removing sensitive fields before LLM usage

If you receive a full document, explicitly copy safe fields into a new map or struct instead of passing the raw document:

```go
package handlers

import (
	"context"
	"encoding/json"
	"errors"

	"github.com/gofiber/fiber/v2"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

func SafeLLMHandler(c *fiber.Ctx) error {
	ctx := context.Background()
	// mongoClient is assumed to be initialized at startup (see the first example).
	collection := mongoClient.Database("appdb").Collection("records")

	var raw bson.M
	if err := collection.FindOne(ctx, bson.M{"type": "support"}).Decode(&raw); err != nil {
		if errors.Is(err, mongo.ErrNoDocuments) {
			return c.Status(fiber.StatusNotFound).SendString("record not found")
		}
		// Return a generic message; raw driver errors can leak schema details.
		return c.Status(fiber.StatusInternalServerError).SendString("lookup failed")
	}

	// Explicitly copy safe fields into a sanitized map; anything not listed
	// here (internal notes, PII, tokens) never reaches the LLM.
	safeData := map[string]interface{}{
		"category": raw["category"],
		"summary":  raw["summary"],
	}

	jsonData, err := json.Marshal(safeData)
	if err != nil {
		return c.Status(fiber.StatusInternalServerError).SendString("encoding failed")
	}
	_ = jsonData // use in the prompt or LLM call

	// Example: pass safeData to LLM tool-calling logic here.
	return c.JSON(fiber.Map{"safeData": safeData})
}
```

Example: Validating and parameterizing queries

Avoid building query documents directly from user input. Pass user input only as values inside the filter document (never as keys or operators), escape it before using it in a regular expression, and restrict projections to an allow-list of safe fields:

```go
package handlers

import (
	"context"
	"regexp"

	"github.com/gofiber/fiber/v2"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func SearchItems(c *fiber.Ctx) error {
	query := c.Query("q")
	if query == "" {
		return c.Status(fiber.StatusBadRequest).SendString("q is required")
	}

	ctx := context.Background()
	collection := mongoClient.Database("appdb").Collection("items")

	// Escape the user-supplied value so it is matched literally; unescaped
	// input could act as a regex and enable probing or denial of service.
	pattern := regexp.QuoteMeta(query)

	// The value is passed as data inside the filter document, and the
	// projection limits output to non-sensitive fields.
	cursor, err := collection.Find(ctx,
		bson.M{"name": bson.M{"$regex": pattern, "$options": "i"}},
		options.Find().SetProjection(bson.D{
			{Key: "name", Value: 1},
			{Key: "sku", Value: 1},
			{Key: "_id", Value: 0},
		}))
	if err != nil {
		// Return a generic message; raw driver errors can leak schema details.
		return c.Status(fiber.StatusInternalServerError).SendString("search failed")
	}
	defer cursor.Close(ctx)

	var results []bson.M
	if err := cursor.All(ctx, &results); err != nil {
		return c.Status(fiber.StatusInternalServerError).SendString("search failed")
	}

	// Only non-sensitive fields are returned and can safely be used in prompts.
	return c.JSON(results)
}
```

These patterns keep unintended, sensitive data out of LLM prompts and reduce the risk of LLM data leakage. Combine them with output scanning and prompt validation, as sketched below, to further protect against accidental exposure.
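As a starting point for output scanning, here is a minimal sketch with illustrative detectors (email addresses, bearer tokens, and 24-hex-character MongoDB ObjectIDs); leakPatterns and containsLeak are hypothetical names, and real deployments should tune the patterns to their own data.

```go
package handlers

import "regexp"

// leakPatterns are illustrative detectors for values that should never
// appear in model output: email addresses, bearer tokens, and 24-hex
// MongoDB ObjectIDs.
var leakPatterns = []*regexp.Regexp{
	regexp.MustCompile(`[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}`),
	regexp.MustCompile(`(?i)bearer\s+[a-z0-9._\-]+`),
	regexp.MustCompile(`\b[0-9a-f]{24}\b`),
}

// containsLeak reports whether an LLM response matches any detector,
// so the handler can block or redact it before returning it to the user.
func containsLeak(output string) bool {
	for _, p := range leakPatterns {
		if p.MatchString(output) {
			return true
		}
	}
	return false
}
```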

Related CWEs (check category: llmSecurity)

| CWE ID | Name | Severity |
| --- | --- | --- |
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |

Frequently Asked Questions

How can I tell if my Fiber app is leaking data to an LLM?
Review your handler code to confirm that only explicitly whitelisted fields are included in prompts sent to the LLM. Use structured projection queries in MongoDB to limit returned fields, avoid passing entire documents, and scan model outputs for unintended PII or secrets using output scanning or middleware checks.

Does middleBrick detect LLM data leakage involving MongoDB fields?
middleBrick’s LLM/AI Security checks focus on prompt injection, system prompt leakage, and unauthenticated endpoints. It does not directly trace data flow between MongoDB and LLMs, so you should enforce field-level filtering and validation in your Fiber handlers to prevent sensitive MongoDB fields from reaching the model.