LLM Data Leakage in Hapi with JWT Tokens
LLM Data Leakage in Hapi with JWT Tokens — how this specific combination creates or exposes the vulnerability
LLM data leakage in a Hapi application that uses JWT tokens occurs when an endpoint authenticated via JWT inadvertently exposes sensitive token contents or related runtime data to an LLM service or through LLM-related functionality. Because JWTs often carry identity, roles, scopes, or session metadata in their payload, combining them with LLM integrations can create unintended channels for data exfiltration.
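For illustration, a decoded payload often carries exactly the fields an attacker would want to surface; the values below are hypothetical:
// Hypothetical decoded JWT payload; every field here is sensitive context
// that should be filtered before any of it is used near an LLM.
const decodedPayload = {
  sub: 'user-8421',                      // subject / user identifier
  role: 'admin',                         // authorization role
  scope: 'billing:read billing:write',   // granted scopes
  tenant: 'acme-corp',                   // tenant / organization
  email: 'jane.doe@example.com',         // PII
  exp: 1735689600                        // expiry timestamp
};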
Consider a Hapi route that decodes a JWT to attach user information to the request and then passes user-specific data to an LLM for natural language processing. If the server-side code includes the full token or sensitive claims in prompts, system messages, or tool call parameters, an attacker who can influence the LLM input (through prompt injection or crafted inputs) may cause the LLM to reveal those details in its output. For example, embedding the JWT payload directly into a prompt such as "User {userId} with scopes {scopes} requests action X" and then asking the LLM to summarize the request can lead to leakage if the LLM echoes back the supplied context.
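A minimal sketch of this anti-pattern, assuming the same hypothetical callLlm helper used later on this page and an illustrative route path and payload shape:
// VULNERABLE: the entire decoded JWT payload is serialized into the prompt,
// so a prompt-injection payload can ask the model to repeat it verbatim.
server.route({
  method: 'POST',
  path: '/summarize',
  options: { auth: { strategy: 'jwt', mode: 'required' } },
  handler: async (request, h) => {
    const decoded = request.auth.credentials.decoded;
    const llmResponse = await callLlm({
      prompt: `User ${JSON.stringify(decoded)} requests: ${request.payload.text}`
    });
    return h.response(llmResponse);
  }
});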
Additionally, if the LLM endpoint is unauthenticated or weakly guarded, and the Hapi service uses JWTs only for its own API authentication (not for the LLM call), there is a risk of insecure consumption: the LLM service might receive tokens or user context through indirect parameters, logs, or error messages. System prompt leakage detection is designed to catch patterns where JWT-like structures or authorization headers appear in prompts that should remain internal. Active prompt injection testing can reveal whether crafted inputs cause the LLM to output JWT contents, session identifiers, or role information that should never be exposed.
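As a rough illustration of that kind of screening (the patterns and helper name are assumptions, not a middleBrick API), outbound prompts can be checked for JWT-like material before they leave the service:
// Screen prompt text for JWT-like structures and authorization header fragments
// before it is sent to the LLM endpoint.
const JWT_LIKE = /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/;
const AUTH_FRAGMENT = /\b(bearer|authorization)\b/i;

function promptContainsTokenMaterial(prompt) {
  return JWT_LIKE.test(prompt) || AUTH_FRAGMENT.test(prompt);
}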
Output scanning for PII, API keys, and executable code is essential in this context because an LLM response that includes a JWT or a derived token represents a data exfiltration event. Excessive agency detection is also relevant: if the Hapi integration uses tool calls or function calling patterns that include JWT-bearing headers as tool arguments, an agentic LLM might retain or repeat those values in multi-step interactions. Unauthenticated LLM endpoint detection helps identify scenarios where the LLM service is reachable without proper controls, increasing the chance that leaked JWT data can be harvested by an external party.
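One way to reduce that agency-related risk is to scrub credential-bearing fields out of tool-call arguments before they are handed to the model; the argument shape below is illustrative:
// Remove header- and token-like fields from tool-call arguments so an agentic
// LLM never sees (and cannot repeat) credential material in later steps.
function scrubToolCallArguments(args) {
  const cleaned = { ...args };
  for (const key of Object.keys(cleaned)) {
    if (/authorization|token|cookie|secret/i.test(key)) {
      delete cleaned[key];
    }
  }
  return cleaned;
}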
In practice, this specific combination is risky when developers propagate JWT claims into LLM inputs without sanitization, when logs or error messages echo LLM outputs that contain tokens, or when the LLM is used to generate or transform authorization headers. Because middleBrick tests for system prompt leakage, prompt injection chains, and output exposure, it can surface these LLM data leakage vectors that arise from poor handling of JWT tokens in Hapi integrations.
JWT Token-Specific Remediation in Hapi — concrete code fixes
Remediation focuses on ensuring JWT tokens and their claims never flow into LLM prompts, tool calls, or logs, and that the LLM integration follows the principle of least privilege with respect to user data.
1. Never include raw JWTs or full payload claims in prompts. Instead, extract only the minimal required attributes and validate them before use.
// Safe extraction in Hapi handler: keep only the minimal claims needed downstream
function extractSafeUserContext(decoded) {
  return {
    userId: decoded.sub,
    role: decoded.role,
    tenant: decoded.tenant
  };
}

// Usage in route handler
server.route({
  method: 'POST',
  path: '/process',
  options: {
    auth: {
      strategy: 'jwt',
      mode: 'required'
    }
  },
  handler: async (request, h) => {
    const safeContext = extractSafeUserContext(request.auth.credentials.decoded);
    // Do NOT pass request.auth.credentials.token or the full decoded payload to the LLM
    const llmResponse = await callLlm({
      prompt: `Process request for user ${safeContext.userId} with role ${safeContext.role}`
    });
    return h.response(llmResponse);
  }
});
2. Sanitize and scope data before LLM consumption. Use strict allowlists for claims and avoid concatenating tokens into messages that become part of the prompt or tool arguments.
// Sanitization and strict schema validation
const Joi = require('joi');
const Boom = require('@hapi/boom');

const userContextSchema = Joi.object({
  userId: Joi.string().required(),
  role: Joi.string().valid('admin', 'user', 'guest').required(),
  tenant: Joi.string().alphanum().required()
});

function sanitizeForLlm(decoded) {
  const { error, value } = userContextSchema.validate({
    userId: decoded.sub,
    role: decoded.role,
    tenant: decoded.tenant
  });
  if (error) throw Boom.badRequest('Invalid token claims');
  return value;
}
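A brief usage sketch inside a route handler, reusing the hypothetical callLlm helper; only the validated value ever reaches the prompt:
handler: async (request, h) => {
  // Validation happens before any prompt is built; invalid claims fail fast with a 400.
  const safeContext = sanitizeForLlm(request.auth.credentials.decoded);
  const llmResponse = await callLlm({
    prompt: `Summarize the request for user ${safeContext.userId} (role: ${safeContext.role})`
  });
  return h.response(llmResponse);
}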
3. Isolate LLM calls from authentication state. Configure your Hapi server so that the LLM integration uses a separate outbound context that does not automatically forward authorization headers or tokens.
// Example: explicit no-token fetch for the LLM endpoint
const fetch = require('node-fetch');

async function callLlm(payload) {
  const response = await fetch('https://api.example.com/llm/complete', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
      // Do NOT include an Authorization header derived from the caller's JWT
      // unless explicitly required and scoped
    },
    body: JSON.stringify(payload)
  });
  if (!response.ok) {
    throw new Error(`LLM endpoint returned ${response.status}`);
  }
  return response.json();
}
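If the LLM provider does require authentication, keep that credential separate from the user's token flow; the environment variable name here is an assumption:
// A dedicated service credential for the LLM provider, sourced from configuration
// rather than from the caller's JWT (LLM_API_KEY is a hypothetical variable name).
const llmHeaders = {
  'Content-Type': 'application/json',
  Authorization: `Bearer ${process.env.LLM_API_KEY}`
};
This keeps a leaked LLM credential from implying anything about user sessions, and vice versa.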
4. Implement output scanning and logging hygiene. Ensure LLM outputs are inspected for accidental token echoes before returning responses to the client, and avoid logging full JWTs or decoded payloads.
// Basic guard before logging or returning LLM output; h (the Hapi response toolkit)
// is passed in explicitly so the helper can build the response.
function safeLogAndReturn(h, llmOutput) {
  if (typeof llmOutput === 'string' && (llmOutput.includes('eyJ') || llmOutput.includes('Bearer'))) {
    // Potential token leakage; redact before logging
    console.warn('LLM output may contain sensitive token data');
    return h.response({ error: 'Internal error' }).code(500);
  }
  return h.response(llmOutput);
}
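Inside a route handler the guard sits between the LLM call and anything the client or the logs can see (a brief sketch):
handler: async (request, h) => {
  const llmResponse = await callLlm({ prompt: 'Summarize the latest activity' });
  // Inspect the output before it is returned or logged anywhere
  return safeLogAndReturn(h, llmResponse);
}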
5. Use a server extension (Hapi's equivalent of middleware) to enforce that JWT claims are not inadvertently attached to LLM-related request properties or echoed in responses. This reduces the surface for accidental propagation in tool calls or generated function arguments.
// Hapi extension point to block responses that echo a JWT back to the client
server.ext('onPreResponse', (request, h) => {
  const response = request.response;
  // Skip Boom errors; inspect plain responses only
  if (!response.isBoom && response.variety === 'plain' && response.source) {
    const body = typeof response.source === 'string'
      ? response.source
      : JSON.stringify(response.source);
    // Ensure no JWT token is echoed in responses that reach the client
    if (/eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+/.test(body)) {
      return h.response({ error: 'Data leakage prevented' }).code(500);
    }
  }
  return h.continue;
});
Apply these practices alongside middleBrick’s checks for system prompt leakage, active prompt injection testing, and output scanning to reduce the risk of LLM data leakage when JWT tokens are present in the application flow.
Related CWEs
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |