LLM Data Leakage in Hapi with MongoDB
LLM Data Leakage in Hapi with MongoDB — how this specific combination creates or exposes the vulnerability
When a Hapi server uses MongoDB as a backend, LLM data leakage can occur when application logic inadvertently exposes sensitive data through prompts, responses, or error messages that reach an LLM endpoint or logged outputs. In this combination, developers may construct prompts from user-supplied data or database records and send them to an LLM without sanitization. If prompts include database identifiers, PII, or authentication tokens pulled from MongoDB documents, that information can be revealed in LLM responses, in server logs, or via tool-calling traces.
MongoDB documents often contain fields such as email, password hashes, API keys, or internal IDs. If a Hapi route builds a system prompt like `User ${user.name} (id: ${user._id}) says: ${input}` and the user document contains sensitive metadata, that data can leak into the prompt. Similarly, if LLM-related debugging or telemetry captures full request/response payloads (including MongoDB query results), credential material or PII can appear in LLM outputs or logs, especially when the LLM endpoint is unauthenticated or agent configurations are overly permissive.
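As an illustration, here is a minimal sketch of this anti-pattern in a Hapi route; the db handle, the callOpenAI helper, and the credential shape are assumptions for the example, not part of any specific codebase:

// ANTI-PATTERN (illustrative): raw MongoDB document data interpolated into an LLM prompt
server.route({
  method: 'POST',
  path: '/assistant',
  handler: async (request, h) => {
    // The unprojected document keeps passwordHash, apiKey, and internal metadata in scope
    const user = await db.collection('users').findOne({ _id: request.auth.credentials.userId });
    // Internal identifiers leak into the prompt, and nothing stops other fields from being added later
    const prompt = `User ${user.name} (id: ${user._id}) says: ${request.payload.message}`;
    const reply = await callOpenAI({ messages: [{ role: 'system', content: prompt }] });
    return h.response({ reply });
  }
});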
Another leakage vector is improper handling of tool definitions and function schemas. A Hapi route that dynamically generates tool schemas from MongoDB collection structures might expose field names, sample values, or internal business logic to an LLM. For example, returning a sample user document as part of a tool description can reveal database structure or real data values. This aligns with the LLM/AI Security checks in middleBrick, which scan for system prompt leakage, active prompt injection (system prompt extraction, data exfiltration probes), PII and API keys in model outputs, and unauthenticated LLM endpoints.
In practice, an attacker may probe the Hapi application with crafted inputs designed to trigger verbose error messages or debug endpoints that echo MongoDB query details. If the application forwards these messages to an LLM or includes them in tool call arguments, sensitive data can be exfiltrated through the LLM channel. middleBrick’s LLM/AI Security checks specifically test for these patterns by examining how LLM endpoints handle injected prompts and by scanning outputs for credentials and PII to highlight such leakage risks.
MongoDB-Specific Remediation in Hapi — concrete code fixes
To mitigate LLM data leakage in Hapi with MongoDB, ensure that data flowing into LLM prompts is carefully filtered and that MongoDB documents are not used raw in LLM-facing contexts. Apply strict schema selection, avoid including sensitive fields in prompts, and sanitize all outputs and logs. Below are concrete code examples demonstrating secure patterns.
1. Explicitly project only safe fields when querying MongoDB
// Safe projection in a Hapi handler: allow-list only the fields the route actually needs
const Boom = require('@hapi/boom');
const safeUserFields = { name: 1, email: 1, role: 1, _id: 1 };
const user = await db.collection('users').findOne({ _id: userId }, { projection: safeUserFields });
if (!user) { throw Boom.notFound('User not found'); }
By using projection, you prevent sensitive fields such as passwordHash, apiKey, or internal metadata from being read and later exposed in prompts or logs.
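As a sketch, the same projection can sit inside a complete Hapi route, with the incoming id validated before it reaches MongoDB; the path and Joi schema are illustrative assumptions, and Boom is the import shown above:

const Joi = require('joi');
const { ObjectId } = require('mongodb');

server.route({
  method: 'GET',
  path: '/users/{id}',
  options: {
    // Validate the identifier shape before it is used in a query
    validate: { params: Joi.object({ id: Joi.string().hex().length(24).required() }) }
  },
  handler: async (request, h) => {
    const user = await db.collection('users').findOne(
      { _id: new ObjectId(request.params.id) },
      { projection: { name: 1, role: 1, _id: 1 } } // allow-list, never the whole document
    );
    if (!user) { throw Boom.notFound('User not found'); }
    return h.response(user);
  }
});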
2. Build prompts from sanitized data only
// Construct the prompt from sanitized, allow-listed values only
const promptContent = `User intent: ${sanitizeInput(userInput)}`;
// Do NOT include user._id or raw MongoDB document fields
const response = await callOpenAI({ messages: [{ role: 'user', content: promptContent }] });
Never interpolate MongoDB document contents directly into prompts. Use a sanitization helper to strip or encode characters that could enable prompt injection or data exfiltration.
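The sanitizeInput helper referenced above is not defined in this snippet; a minimal sketch of what it might do (length cap, control-character removal, and stripping of punctuation commonly used in injection tricks, tuned to your threat model):

// Illustrative sanitizer for user text that will be embedded in an LLM prompt
const sanitizeInput = (value, maxLength = 500) => {
  return String(value)
    .replace(/[\u0000-\u001F\u007F]/g, ' ') // drop control characters
    .replace(/[{}`$]/g, '')                 // strip template/prompt-injection punctuation
    .slice(0, maxLength)
    .trim();
};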
3. Remove PII and sensitive values from tool definitions
// Return only safe metadata for tool schemas
const toolDefinition = {
  type: 'function',
  function: {
    name: 'get_user_info',
    description: 'Retrieve public profile fields for a user',
    parameters: {
      type: 'object',
      properties: {
        userId: { type: 'string', description: 'Public user identifier' }
      },
      required: ['userId']
    }
  }
};
Do not include sample documents or internal field descriptions that may contain real data. Validate and limit tool descriptions to non-sensitive metadata.
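When the model later calls get_user_info, its arguments are model-generated and therefore untrusted; a sketch of validating them server-side before querying MongoDB, reusing the Joi, ObjectId, and Boom imports shown earlier (the handler name and wiring are assumptions):

const toolArgsSchema = Joi.object({ userId: Joi.string().hex().length(24).required() });

const handleGetUserInfo = async (rawArgs) => {
  // Tool-call arguments come from the LLM, so validate them like any other user input
  const { error, value } = toolArgsSchema.validate(JSON.parse(rawArgs));
  if (error) { throw Boom.badRequest('Invalid tool arguments'); }
  return db.collection('users').findOne(
    { _id: new ObjectId(value.userId) },
    { projection: { name: 1, role: 1 } } // public profile fields only
  );
};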
4. Harden error handling to avoid leaking database details
// Avoid exposing MongoDB errors to LLM or client
try {
  await db.collection('users').updateOne({ _id: userId }, { $set: updates });
} catch (err) {
  console.error('Update failed'); // Log safely without stack or query details
  throw Boom.internal('An error occurred');
}
Ensure errors do not contain MongoDB operation details, query structures, or stack traces that could be captured by logging mechanisms or forwarded to LLM tooling.
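To enforce this globally instead of per handler, a Hapi onPreResponse extension can replace the outgoing payload of any server error before it reaches clients, logs, or LLM tooling; a minimal sketch:

server.ext('onPreResponse', (request, h) => {
  const response = request.response;
  if (response.isBoom && response.output.statusCode >= 500) {
    // Record only a minimal, non-sensitive event; never the query, payload, or stack
    request.log(['error'], { path: request.path, statusCode: response.output.statusCode });
    // Overwrite the payload so driver or query details never leave the server
    response.output.payload = { statusCode: 500, error: 'Internal Server Error', message: 'An error occurred' };
  }
  return h.continue;
});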
5. Secure LLM endpoint access and disable unnecessary agent features
// Example: Require authentication before calling LLM and restrict tool usage
const callSecureLLM = async (messages, tools = []) => {
  if (!isValidSession()) { throw Boom.unauthorized(); }
  // OpenAI tool objects nest the name under function.name, not at the top level
  const limitedTools = tools.filter(t => t.function?.name !== 'dangerous_tool');
  return await openai.chat.completions.create({ model: 'gpt-4', messages, tools: limitedTools });
};
Apply authentication checks before invoking LLM endpoints and restrict tool sets to prevent unauthorized data exfiltration or excessive agency.
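In Hapi, the session check is usually expressed as a route auth strategy rather than an ad-hoc isValidSession call; a sketch assuming an auth strategy named 'session' has already been registered on the server:

server.route({
  method: 'POST',
  path: '/chat',
  options: { auth: 'session' }, // unauthenticated callers are rejected before any LLM call
  handler: async (request, h) => {
    const reply = await callSecureLLM(
      [{ role: 'user', content: sanitizeInput(request.payload.message) }],
      [toolDefinition] // only the vetted schema from the earlier example
    );
    return h.response({ reply: reply.choices[0].message.content });
  }
});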
Related CWEs (category: llmSecurity)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |