
LLM Data Leakage in Hapi (JavaScript)

LLM Data Leakage in Hapi with JavaScript — how this specific combination creates or exposes the vulnerability

Hapi is a feature-rich HTTP framework for Node.js, and JavaScript remains the primary language for route and server logic. When building endpoints that return structured data or integrate with LLM tooling, developers can inadvertently expose sensitive information through server-side variables, logs, or error messages that later appear in LLM responses. LLM Data Leakage occurs when prompts, system instructions, private business rules, or user-specific data are reflected in model outputs, enabling prompt extraction or data exfiltration via the LLM interface.

In a Hapi JavaScript service, leakage commonly arises from three vectors: route handlers that embed sensitive data into prompts, server-side debug or metadata that contaminates context passed to LLM endpoints, and overly permissive CORS or logging that exposes request/response payloads. For example, embedding user identifiers, API keys, or internal business logic directly into prompt templates can result in system prompt leakage, where patterns like ChatML or Llama 2 formatting reveal instructions that should remain confidential. middleBrick’s LLM/AI Security checks specifically target these scenarios with 27 regex patterns aligned to ChatML, Llama 2, Mistral, and Alpaca formats, plus active prompt injection probes that attempt system prompt extraction and instruction override through the LLM endpoint.
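To make the detection idea concrete, the sketch below shows how a few regexes can flag chat-template markers (ChatML, Llama 2/Mistral, Alpaca) surfacing in model output. The patterns are illustrative only, not middleBrick's actual 27-pattern rule set:

```javascript
// Hypothetical leak detector: flags chat-template markers whose presence
// in an LLM response suggests the system prompt is being echoed back.
// These patterns are examples, not middleBrick's production rules.
const LEAK_PATTERNS = [
  { name: 'ChatML', regex: /<\|im_start\|>\s*system/i },
  { name: 'Llama 2', regex: /<<SYS>>|<<\/SYS>>/ },
  { name: 'Mistral', regex: /\[\/?INST\]/ }, // Llama 2 also uses [INST]
  { name: 'Alpaca', regex: /### Instruction:/ }
];

// Returns the names of every template format detected in the output.
const detectPromptLeakage = (llmOutput) =>
  LEAK_PATTERNS
    .filter(({ regex }) => regex.test(llmOutput))
    .map(({ name }) => name);
```

A response such as `"<|im_start|>system\nYou are an internal agent"` would be flagged as a ChatML leak, while ordinary answer text passes clean.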

Because Hapi servers often orchestrate multiple services and enrich requests with metadata, JavaScript code that passes entire request objects or configuration maps into LLM calls heightens risk. If an unauthenticated LLM endpoint is exposed or rate limits are weak, an attacker can probe the LLM interface to infer sensitive behavior or extract training data patterns. Output scanning further detects PII, API keys, or executable code in LLM responses, ensuring that leaked secrets are identified before they propagate. Without runtime validation that aligns spec definitions (OpenAPI/Swagger 2.0, 3.0, 3.1) against actual responses, developers may not realize that enriched context is escaping into model outputs.
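The output-scanning idea described above can be sketched as a simple gate that checks an LLM response for common secret and PII shapes before it reaches the client. The rules below (an OpenAI-style key prefix, an AWS access key shape, an email pattern) are assumptions for illustration; real scanners use far broader rule sets:

```javascript
// Illustrative output scanner: runs before an LLM response is returned.
// The three rules are example shapes only, not a complete rule set.
const OUTPUT_RULES = [
  { name: 'openai-style key', regex: /\bsk-[A-Za-z0-9]{20,}\b/ },
  { name: 'aws access key', regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: 'email address', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ }
];

// Returns { safe, findings } so the route handler can block or redact.
const scanLLMOutput = (text) => {
  const findings = OUTPUT_RULES
    .filter(({ regex }) => regex.test(text))
    .map(({ name }) => name);
  return { safe: findings.length === 0, findings };
};
```

A handler would call `scanLLMOutput(llmResponse)` and refuse to return (or redact) any response where `safe` is false.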

Using middleBrick’s CLI tool (middlebrick scan) or Web Dashboard, teams can quickly assess their Hapi endpoints for LLM Data Leakage as part of the 12 parallel security checks. The scanner correlates spec-based definitions with runtime findings, highlighting where JavaScript route logic may inadvertently propagate sensitive context. For critical integrations, the Pro plan enables continuous monitoring and GitHub Action integration so that any new route that raises risk scores can be flagged in CI/CD before deployment.

JavaScript-Specific Remediation in Hapi — concrete code fixes

Remediation focuses on strict separation between server-side context and LLM prompts, careful handling of user data, and validating outputs. Avoid embedding raw request properties, user IDs, or internal constants directly into prompt strings. Instead, use sanitized, purpose-built variables and enforce schema validation on inputs and outputs.

Example: Safe prompt construction in a Hapi route

// Safe pattern: explicit, minimal context passed to LLM
const Hapi = require('@hapi/hapi');
const Joi = require('joi');

const buildPrompt = (userQuery, allowedScope) => {
  // Do NOT include user.id or request.headers directly in the prompt
  return {
    role: 'user',
    content: `Query: ${userQuery}\nScope: ${allowedScope}`
  };
};

const init = async () => {
  const server = Hapi.server({ port: 4000, host: 'localhost' });

  server.route({
    method: 'POST',
    path: '/ask',
    options: {
      validate: {
        payload: Joi.object({
          query: Joi.string().max(200).required(),
          scope: Joi.string().valid('public', 'internal').required()
        })
      }
    },
    handler: async (request, h) => {
      const prompt = buildPrompt(request.payload.query, request.payload.scope);
      // Send only prompt to LLM endpoint; keep credentials server-side
      const llmResponse = await callLLMEndpoint([prompt]);
      return { answer: llmResponse };
    }
  });

  await server.start();
  console.log('Server running on %s', server.info.uri);
};

const callLLMEndpoint = async (messages) => {
  // Placeholder: integrate with your LLM provider securely
  return 'Sanitized response';
};

init();

Example: Preventing leakage from error objects and logs

// Avoid logging full request or server config
const logger = {
  info: (msg) => console.log(`[INFO] ${msg}`),
  audit: (event, details) => {
    // Explicitly redact sensitive fields
    const safeDetails = { ...details };
    delete safeDetails.apiKey;
    delete safeDetails.userId;
    logger.info(`${event}: ${JSON.stringify(safeDetails)}`);
  }
};

// Register on the server created in init()
server.ext('onPreResponse', (request, h) => {
  const response = request.response;
  if (response.isBoom) {
    // Do not expose stack or internal config in responses or logs
    logger.audit('error', { statusCode: response.output.statusCode, path: request.path });
    return h.response({ error: 'Request failed' }).code(response.output.statusCode);
  }
  return h.continue;
});

Checklist for JavaScript Hapi services

  • Never concatenate user input directly into LLM prompt templates.
  • Validate and sanitize all payloads against a strict schema (Joi or similar).
  • Redact sensitive fields before logging or passing to downstream services.
  • Use environment variables for secrets; avoid attaching them to request context.
  • Apply rate limiting and authentication to LLM endpoints to reduce probing surface.
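The last checklist item can be prototyped without any plugin. Below is a minimal in-memory fixed-window limiter, a sketch only (production services should use a shared store such as Redis), which in a Hapi route would run in an extension point keyed by the client address:

```javascript
// Minimal fixed-window rate limiter (sketch; per-process memory only).
// In Hapi, call the returned function from an onPreHandler extension,
// keyed by request.info.remoteAddress, and return a 429 when it denies.
const createRateLimiter = ({ limit, windowMs }) => {
  const windows = new Map(); // key -> { count, resetAt }
  return (key, now = Date.now()) => {
    const entry = windows.get(key);
    if (!entry || now >= entry.resetAt) {
      // Start a fresh window for this key
      windows.set(key, { count: 1, resetAt: now + windowMs });
      return true; // allowed
    }
    entry.count += 1;
    return entry.count <= limit; // deny once the window's budget is spent
  };
};

// Example: at most 3 requests per client per minute
const allow = createRateLimiter({ limit: 3, windowMs: 60000 });
```

The fixed-window approach is the simplest option; sliding-window or token-bucket variants smooth out bursts at window boundaries.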

By combining disciplined prompt engineering with runtime scanning, teams can mitigate LLM Data Leakage while continuing to leverage LLM capabilities within Hapi services.

Related CWEs (category: llmSecurity)

CWE ID     Name                                                    Severity
CWE-754    Improper Check for Unusual or Exceptional Conditions    MEDIUM

Frequently Asked Questions

How does middleBrick detect LLM Data Leakage in Hapi JavaScript services?
middleBrick runs 27 regex patterns for ChatML, Llama 2, Mistral, and Alpaca formats alongside active prompt injection probes against your LLM endpoints. It correlates OpenAPI/Swagger definitions with runtime findings to highlight where server-side JavaScript context leaks into prompts or responses.
Can the free plan be used to assess LLM Data Leakage on a Hapi endpoint?
Yes, the free plan provides 3 scans per month, which is sufficient for initial assessments of LLM Data Leakage. For continuous monitoring of Hapi services, the Pro plan offers scheduled scans and GitHub Action integrations.