LLM Data Leakage in Fiber with MongoDB
LLM Data Leakage in Fiber with MongoDB — how this specific combination creates or exposes the vulnerability
When building LLM-enabled features in a Fiber application that uses MongoDB as the primary data store, data leakage can occur if application logic or prompts inadvertently expose sensitive records or schema details to the model or to end users. LLM data leakage in this context refers to situations where confidential information—such as personally identifiable information (PII), authentication tokens, or business-critical data—appears in LLM inputs, tool calls, or responses. With Fiber, developers often pass database documents or query results directly into prompts or LLM client inputs. If those documents contain fields like emails, IDs, or internal metadata and are not explicitly sanitized, the data can be exposed through model outputs, logs, or error messages.
In a typical Fiber handler, you might retrieve a user document from MongoDB and forward it to an LLM for processing. Because MongoDB documents can include nested fields and metadata (such as _id, timestamps, or ODM bookkeeping fields like __v), simply passing the raw document into a prompt can leak identifiers or internal state. For example, including a user’s _id or email in a prompt sent to an external LLM endpoint can violate privacy expectations, and the data may be retained in model logs or outputs. The risk grows when using unauthenticated LLM endpoints or when enabling tool calling or function calling, where the model may request specific fields that expose sensitive structure.
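As a minimal, self-contained sketch of this anti-pattern (the document contents and prompt wording are illustrative, not from a real application), interpolating a whole decoded document into a prompt carries every field along, including ones you never intended to share:

```go
package main

import "fmt"

// buildUnsafePrompt mimics the anti-pattern: the entire decoded document
// (a bson.M behaves like this map) is formatted straight into the prompt.
func buildUnsafePrompt(doc map[string]any) string {
	return fmt.Sprintf("Summarize this user: %v", doc)
}

func main() {
	// Fields like _id and email ride along into the LLM input.
	doc := map[string]any{"_id": "u-123", "email": "alice@example.com", "name": "Alice"}
	fmt.Println(buildUnsafePrompt(doc))
}
```

Nothing here decides which fields are appropriate for the model; whatever the query returned is what the LLM (and its provider's logs) will see.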
Another leakage vector arises from the interaction between Fiber route handlers and MongoDB queries used for retrieval or filtering. If query filters are dynamically built from user input and passed to MongoDB without strict validation, an attacker may manipulate inputs to cause excessive data retrieval or to probe schema details through error messages or timing differences. While this is not a direct LLM issue, the retrieved data may later be supplied to an LLM, compounding the exposure. The LLM/AI Security checks in middleBrick specifically flag unauthenticated LLM endpoints and system prompt leakage, which can be relevant when LLM integrations in Fiber inadvertently expose system instructions or sensitive context that depends on MongoDB data.
Because LLM data leakage often involves subtle data flow issues, it is important to validate and sanitize data before it reaches the model. This includes removing or hashing identifiers, excluding sensitive fields, and ensuring that only necessary, non-sensitive data is included in prompts. middleBrick’s LLM/AI Security checks help detect some of these risks by scanning for system prompt leakage and unauthenticated LLM endpoints, but developers must still enforce data minimization and field-level filtering in their Fiber routes to prevent MongoDB documents from leaking into LLM contexts.
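One way to act on the "removing or hashing identifiers" advice is to pseudonymize record identifiers before they enter a prompt, so the model can still refer to a specific record without seeing the raw _id or email. A minimal sketch (the helper name and token length are illustrative assumptions):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pseudonymizeID derives a stable, non-reversible token from an identifier,
// so prompts and logs can reference a record without exposing the raw
// _id or email. The same input always yields the same token.
func pseudonymizeID(id string) string {
	sum := sha256.Sum256([]byte(id))
	return hex.EncodeToString(sum[:8]) // 16 hex chars is enough to correlate
}

func main() {
	fmt.Println(pseudonymizeID("alice@example.com"))
}
```

Because the token is deterministic, the application can map it back to the real record server-side while the LLM only ever sees the opaque value.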
Real-world examples include a route that calls collection.Find and directly uses the result in a LangChain chain or an OpenAI client call, or a tool-calling setup where the model requests a MongoDB document’s fields. In such cases, fields like email or internal IDs can be surfaced in model outputs or logs. By combining strict field selection in MongoDB queries with prompt sanitization and output scanning, teams can reduce the likelihood of LLM data leakage in Fiber applications that rely on MongoDB.
MongoDB-Specific Remediation in Fiber — concrete code fixes
To prevent LLM data leakage when using MongoDB with Fiber, apply explicit field selection and transformation before passing data to the LLM. Avoid sending entire MongoDB documents into prompts or tool calls. Instead, construct view models that include only the fields required for the LLM task and exclude sensitive attributes such as email, password, or internal IDs.
Example: Safe document projection in a Fiber handler
Define a struct that represents only the safe fields you intend to use, and use MongoDB projections to limit the retrieved data:
```go
package handlers

import (
	"github.com/gofiber/fiber/v2"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// mongoClient is assumed to be initialized at application startup.
var mongoClient *mongo.Client

type SafeUser struct {
	ID   string `bson:"_id" json:"id"`
	Name string `bson:"name" json:"name"`
	Role string `bson:"role" json:"role"`
}

func GetUserForLLM(c *fiber.Ctx) error {
	collection := mongoClient.Database("appdb").Collection("users")

	var user SafeUser
	// Use a projection so only safe fields ever leave the database.
	err := collection.FindOne(c.Context(), bson.M{"_id": c.Params("id")},
		options.FindOne().SetProjection(bson.D{
			{Key: "name", Value: 1},
			{Key: "role", Value: 1},
			{Key: "_id", Value: 1},
		})).Decode(&user)
	if err != nil {
		// Return a generic message; raw driver errors can leak schema details.
		return c.Status(fiber.StatusInternalServerError).SendString("lookup failed")
	}

	// Build the prompt from safe fields only.
	prompt := "Explain access for user " + user.Name + " with role " + user.Role
	// Pass prompt to the LLM client here.
	return c.JSON(fiber.Map{"prompt": prompt, "user": user})
}
```
Example: Removing sensitive fields before LLM usage
If you receive a full document, explicitly copy safe fields into a new map or struct instead of passing the raw document:
```go
package handlers

import (
	"encoding/json"

	"github.com/gofiber/fiber/v2"
	"go.mongodb.org/mongo-driver/bson"
)

// mongoClient is the shared *mongo.Client from the previous example.

func SafeLLMHandler(c *fiber.Ctx) error {
	collection := mongoClient.Database("appdb").Collection("records")

	var raw bson.M
	if err := collection.FindOne(c.Context(), bson.M{"type": "support"}).Decode(&raw); err != nil {
		return c.Status(fiber.StatusInternalServerError).SendString("lookup failed")
	}

	// Explicitly copy safe fields; anything not listed here is dropped,
	// so internal notes and PII never reach the LLM.
	safeData := map[string]interface{}{
		"category": raw["category"],
		"summary":  raw["summary"],
	}

	jsonData, err := json.Marshal(safeData)
	if err != nil {
		return c.Status(fiber.StatusInternalServerError).SendString("encoding failed")
	}
	_ = jsonData // use in the prompt or LLM tool-calling logic here

	return c.JSON(fiber.Map{"safeData": safeData})
}
```
Example: Validating and parameterizing queries
Avoid building query documents directly from user input. Treat user input as a literal value, escape regex metacharacters before using it in a $regex filter, and whitelist the fields allowed in the projection:
```go
package handlers

import (
	"regexp"

	"github.com/gofiber/fiber/v2"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func SearchItems(c *fiber.Ctx) error {
	query := c.Query("q")
	if query == "" {
		return c.Status(fiber.StatusBadRequest).SendString("q is required")
	}

	collection := mongoClient.Database("appdb").Collection("items")

	// Escape the input so it is matched as a literal string, not a regex,
	// and project only non-sensitive fields.
	filter := bson.M{"name": bson.M{"$regex": regexp.QuoteMeta(query), "$options": "i"}}
	cursor, err := collection.Find(c.Context(), filter,
		options.Find().SetProjection(bson.D{
			{Key: "name", Value: 1},
			{Key: "sku", Value: 1},
			{Key: "_id", Value: 0},
		}))
	if err != nil {
		return c.Status(fiber.StatusInternalServerError).SendString("query failed")
	}
	defer cursor.Close(c.Context())

	var results []bson.M
	if err := cursor.All(c.Context(), &results); err != nil {
		return c.Status(fiber.StatusInternalServerError).SendString("decode failed")
	}

	// Only non-sensitive fields are returned and can be safely used in prompts.
	return c.JSON(results)
}
```
These patterns ensure that only intended, non-sensitive data flows into LLM prompts, reducing the risk of LLM data leakage. Combine these practices with output scanning and prompt validation to further protect against accidental exposure.
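Output scanning can start as simply as redacting known-sensitive patterns from model responses before they reach clients or logs. A minimal sketch (the regex covers common email shapes only; a production scanner should handle more PII classes and formats):

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe matches common email address shapes.
var emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// redactEmails masks email addresses in LLM output before it is returned
// to the client or written to logs.
func redactEmails(s string) string {
	return emailRe.ReplaceAllString(s, "[REDACTED_EMAIL]")
}

func main() {
	fmt.Println(redactEmails("Contact alice@example.com for access."))
}
```

Running the same filter over prompts before they are sent gives a second, symmetric checkpoint on the data flowing in each direction.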
Related CWEs (category: llmSecurity)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |