LLM Data Leakage in Hapi (JavaScript)
LLM Data Leakage in Hapi with JavaScript — how this specific combination creates or exposes the vulnerability
Hapi is a feature-rich HTTP framework for Node.js, and JavaScript remains the primary language for its route and server logic. When building endpoints that return structured data or integrate with LLM tooling, developers can inadvertently expose sensitive information through server-side variables, logs, or error messages that later appear in LLM responses. LLM Data Leakage occurs when prompts, system instructions, private business rules, or user-specific data are reflected in model outputs, enabling prompt extraction or data exfiltration via the LLM interface.
In a Hapi JavaScript service, leakage commonly arises from three vectors: route handlers that embed sensitive data into prompts, server-side debug output or metadata that contaminates context passed to LLM endpoints, and overly permissive CORS or logging that exposes request/response payloads. For example, embedding user identifiers, API keys, or internal business logic directly into prompt templates can result in system prompt leakage, where chat-template markers in formats like ChatML or Llama 2 surface in output and reveal instructions that should remain confidential. middleBrick's LLM/AI Security checks specifically target these scenarios with 27 regex patterns aligned to the ChatML, Llama 2, Mistral, and Alpaca formats, plus active prompt injection probes that attempt system prompt extraction and instruction override through the LLM endpoint.
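To make the detection concrete, here is a minimal sketch of this kind of pattern check. The markers below are illustrative examples only, not middleBrick's actual 27 patterns; finding any of them in a model response suggests the system prompt or raw chat template is leaking.

```javascript
// Illustrative subset of chat-template markers whose presence in an LLM
// response suggests system prompt or template leakage (not middleBrick's
// production pattern set).
const LEAK_MARKERS = [
  /<\|im_start\|>\s*system/i, // ChatML system block
  /\[INST\]|\[\/INST\]/,      // Llama 2 / Mistral instruction tags
  /<<SYS>>|<<\/SYS>>/,        // Llama 2 system delimiters
  /### Instruction:/          // Alpaca instruction header
];

const looksLikePromptLeak = (text) =>
  LEAK_MARKERS.some((pattern) => pattern.test(text));

// looksLikePromptLeak('<|im_start|>system\nYou are an internal agent') === true
```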
Because Hapi servers often orchestrate multiple services and enrich requests with metadata, JavaScript code that passes entire request objects or configuration maps into LLM calls heightens risk. If an unauthenticated LLM endpoint is exposed or rate limits are weak, an attacker can probe the LLM interface to infer sensitive behavior or extract training data patterns. Output scanning further detects PII, API keys, or executable code in LLM responses, ensuring that leaked secrets are identified before they propagate. Without runtime validation that aligns spec definitions (OpenAPI/Swagger 2.0, 3.0, 3.1) against actual responses, developers may not realize that enriched context is escaping into model outputs.
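One concrete guard against enriched context escaping is Hapi's built-in response validation. The sketch below (a minimal illustration, assuming a route that should return only an `answer` string) rejects any response whose shape drifts from the declared schema, so extra server-side fields cannot silently reach clients.

```javascript
const Joi = require('joi');

// Route options enforcing that only `answer` may leave the server;
// any additional keys in the response fail validation.
const responseGuard = {
  response: {
    schema: Joi.object({ answer: Joi.string().required() }),
    failAction: 'error' // reject the non-conforming response outright
  }
};

// Usage: server.route({ method: 'POST', path: '/ask', options: responseGuard, handler })
```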
Using middleBrick's CLI tool (`middlebrick scan`), these checks can be run against a live Hapi service.
JavaScript-Specific Remediation in Hapi — concrete code fixes
Remediation focuses on strict separation between server-side context and LLM prompts, careful handling of user data, and validation of model outputs. Avoid embedding raw request properties, user IDs, or internal constants directly into prompt strings. Instead, use sanitized, purpose-built variables and enforce schema validation on both inputs and outputs.
Example: Safe prompt construction in a Hapi route
```javascript
// Safe pattern: explicit, minimal context passed to the LLM
const Hapi = require('@hapi/hapi');
const Joi = require('joi');

const buildPrompt = (userQuery, allowedScope) => {
  // Do NOT include user.id or request.headers directly in the prompt
  return {
    role: 'user',
    content: `Query: ${userQuery}\nScope: ${allowedScope}`
  };
};

const callLLMEndpoint = async (messages) => {
  // Placeholder: integrate with your LLM provider securely
  return 'Sanitized response';
};

const init = async () => {
  const server = Hapi.server({ port: 4000, host: 'localhost' });

  server.route({
    method: 'POST',
    path: '/ask',
    options: {
      validate: {
        // A pre-compiled Joi schema; plain objects would require server.validator(Joi)
        payload: Joi.object({
          query: Joi.string().max(200).required(),
          scope: Joi.string().valid('public', 'internal').required()
        })
      }
    },
    handler: async (request, h) => {
      const prompt = buildPrompt(request.payload.query, request.payload.scope);
      // Send only the prompt to the LLM endpoint; keep credentials server-side
      const llmResponse = await callLLMEndpoint([prompt]);
      return { answer: llmResponse };
    }
  });

  await server.start();
  console.log('Server running on %s', server.info.uri);
};

init();
```
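When wiring the real provider, keep the credential in the environment. The sketch below replaces the placeholder, assuming an OpenAI-compatible chat completions endpoint and hypothetical `LLM_API_URL` / `LLM_API_KEY` environment variable names (it also relies on the global `fetch` available in Node.js 18+).

```javascript
// A hedged sketch of the placeholder above; LLM_API_URL and LLM_API_KEY
// are illustrative names. The key stays in the environment and is never
// attached to the request object or echoed into prompts.
const callLLMEndpoint = async (messages) => {
  const res = await fetch(process.env.LLM_API_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.LLM_API_KEY}`
    },
    body: JSON.stringify({ messages })
  });
  if (!res.ok) {
    // Surface a generic failure; provider error bodies can contain details
    throw new Error('LLM request failed');
  }
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? '';
};
```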
Example: Preventing leakage from error objects and logs
```javascript
// Avoid logging the full request or server config
const logger = {
  info: (msg) => console.log(`[INFO] ${msg}`),
  audit: (event, details) => {
    // Explicitly redact sensitive fields before serializing
    const safeDetails = { ...details };
    delete safeDetails.apiKey;
    delete safeDetails.userId;
    logger.info(`${event}: ${JSON.stringify(safeDetails)}`);
  }
};

// `server` is the Hapi server instance from the previous example
server.ext('onPreResponse', (request, h) => {
  const response = request.response;
  if (response.isBoom) {
    // Do not expose stack traces or internal config in responses or logs
    logger.audit('error', { statusCode: response.output.statusCode, path: request.path });
    return h.response({ error: 'Request failed' }).code(response.output.statusCode);
  }
  return h.continue;
});
```
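Register this extension before `server.start()` so it applies to every route, including the LLM-facing ones. A quick sanity check: send a request that fails validation on `/ask` and confirm the response body contains only the generic error message and status code, with the redacted audit entry appearing in the server log.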
Checklist for JavaScript Hapi services
- Never concatenate user input directly into LLM prompt templates.
- Validate and sanitize all payloads against a strict schema (Joi or similar).
- Redact sensitive fields before logging or passing to downstream services.
- Use environment variables for secrets; avoid attaching them to request context.
- Apply rate limiting and authentication to LLM endpoints to reduce probing surface (see the sketch after this list).
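Hapi does not ship a rate limiter, so the following is a deliberately simple, single-process sketch: an in-memory counter keyed by client IP (an illustrative budget of 20 requests per minute) that rejects excess requests before any handler, and therefore any LLM call, runs. Production services would use a shared store or a dedicated plugin instead.

```javascript
const Boom = require('@hapi/boom');

// Illustrative in-memory limiter; assumes a single Node.js process.
const WINDOW_MS = 60 * 1000; // 1-minute window
const MAX_REQUESTS = 20;     // hypothetical per-IP budget
const hits = new Map();

server.ext('onPreHandler', (request, h) => {
  const ip = request.info.remoteAddress;
  const now = Date.now();
  const entry = hits.get(ip) ?? { count: 0, start: now };
  if (now - entry.start > WINDOW_MS) {
    entry.count = 0; // reset the window for this client
    entry.start = now;
  }
  entry.count += 1;
  hits.set(ip, entry);
  if (entry.count > MAX_REQUESTS) {
    // Cut off probing attempts before they reach the LLM handler
    throw Boom.tooManyRequests('Rate limit exceeded');
  }
  return h.continue;
});
```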
By combining disciplined prompt engineering with runtime scanning, teams can mitigate LLM Data Leakage while continuing to leverage LLM capabilities within Hapi services.
Related CWEs
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |