Severity: HIGH | Topics: LLM data leakage, AdonisJS, MongoDB

LLM Data Leakage in AdonisJS with MongoDB

LLM Data Leakage in AdonisJS with MongoDB — how this specific combination creates or exposes the vulnerability

When an AdonisJS application uses MongoDB as its primary datastore, large language model (LLM) integrations can inadvertently expose sensitive data through prompts, logs, or generated responses. This risk arises because developers may pass raw user input, database records, or internal context directly into LLM calls without sanitization or strict schema enforcement.

In AdonisJS, controllers often construct rich context objects from MongoDB documents (e.g., user profiles, transaction histories) and forward them to an LLM endpoint. If these documents contain fields such as internal IDs, emails, phone numbers, or personally identifiable information (PII), and those fields are included in the prompt or in tool call arguments, the LLM service may leak them in responses or logs. For example, concatenating user.email or user.ssn into a system prompt creates a direct path for data leakage if the LLM output is returned to an attacker or logged insecurely.
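To make the risky pattern concrete, here is a minimal sketch; the `user` document shape and field names are illustrative, not taken from a specific codebase:

```javascript
// UNSAFE (illustrative): raw MongoDB fields are interpolated into the prompt,
// so the email and SSN can surface in LLM output or provider-side logs
function unsafePrompt(user) {
  return `You are a support assistant for ${user.email} (SSN: ${user.ssn}).`;
}

// SAFER: reference the user only by a non-sensitive identifier
function saferPrompt(user) {
  return `You are a support assistant for user "${user.username}".`;
}

const user = { username: 'alice', email: 'alice@example.com', ssn: '123-45-6789' };
// With unsafePrompt, the PII travels with every LLM request;
// with saferPrompt, nothing sensitive leaves the application.
```

Every value interpolated into a prompt should be treated as leaving your trust boundary, since it may be retained by the LLM provider or echoed back in a response.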

Moreover, AdonisJS routes that expose an unauthenticated or weakly authenticated endpoint for LLM inference can become targets for enumeration. An attacker may probe the endpoint with crafted inputs designed to trigger verbose error messages or data dumping from MongoDB. Because MongoDB documents often embed nested arrays and objects, developers might inadvertently serialize entire documents into JSON payloads sent to the LLM. This can result in overprivileged tool calls or function definitions that expose internal data structures.

The LLM/AI security checks unique to middleBrick detect these patterns by scanning for system prompt leakage across 27 regex patterns tailored to ChatML, Llama 2, Mistral, and Alpaca formats. When combined with AdonisJS routes that directly embed MongoDB document fields into prompts, the scanner can identify high-risk scenarios such as unauthenticated LLM endpoints and excessive agency patterns (e.g., tool_calls or function_call usage that permits arbitrary data exfiltration). These findings highlight the need to treat LLM integrations as part of the application attack surface, not as isolated services.

Compliance mappings are relevant here: OWASP API Security Top 10 API9:2023 'Improper Inventory Management' and API1:2023 'Broken Object Level Authorization' align with risks where MongoDB data is exposed through LLM prompts. Similarly, frameworks such as PCI DSS and GDPR emphasize protecting PII, which can be inadvertently surfaced through poorly constructed LLM inputs derived from MongoDB collections.

MongoDB-Specific Remediation in AdonisJS — concrete code fixes

To prevent LLM data leakage in AdonisJS applications using MongoDB, apply strict input filtering, output encoding, and schema-based field selection before constructing LLM prompts or tool calls. The following patterns demonstrate secure practices.

1. Select only necessary fields from MongoDB documents

Avoid passing entire MongoDB documents to the LLM. Instead, project only the required fields using MongoDB queries. In AdonisJS with the MongoDB driver, use project() to limit the returned document shape.

import { ObjectId } from 'mongodb';

// Safe: an inclusion projection returns only _id and username; sensitive
// fields such as email, ssn, and tokens are never loaded. Note that MongoDB
// projections cannot mix inclusion (1) and exclusion (0) except for _id,
// so list only the fields you want rather than excluding the rest.
const user = await User.collection
  .find({ _id: new ObjectId(userId) })
  .project({ _id: 1, username: 1 })
  .next();
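When a full document has already been loaded (for example from a cache or a populated relation), the same allowlist idea can be applied in application code. This `pickFields` helper is a hypothetical sketch, not part of AdonisJS or the MongoDB driver:

```javascript
// Hypothetical helper: copy only explicitly allowlisted fields from a document
function pickFields(doc, allowed) {
  return Object.fromEntries(
    allowed.filter((key) => key in doc).map((key) => [key, doc[key]])
  );
}

const fullDoc = { _id: 'u_123', username: 'alice', email: 'alice@example.com', ssn: '123-45-6789' };
const safeDoc = pickFields(fullDoc, ['_id', 'username']);
// safeDoc contains only _id and username; email and ssn are dropped
```

Allowlisting (naming what may pass) is safer than denylisting here: a new sensitive field added to the collection later is excluded by default instead of silently leaking.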

2. Sanitize inputs used in LLM prompts

Strip or mask PII from user-controlled data before including it in system or user messages. Use a dedicated sanitizer that removes or hashes sensitive fields.

function sanitizeForLLM(input) {
  // Remove known sensitive top-level keys; note that destructuring
  // does not traverse nested objects or arrays
  const { email, ssn, tokens, ...safe } = input;
  return safe;
}

const safeContext = sanitizeForLLM(user.toJSON());
const prompt = `Analyze the following safe context: ${JSON.stringify(safeContext)}`;
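Because MongoDB documents commonly nest objects and arrays, a top-level destructure like `sanitizeForLLM` above will miss PII buried deeper in the document. A recursive variant can redact sensitive keys at any depth; this is a sketch, and the key list is illustrative:

```javascript
// Keys treated as sensitive wherever they appear in the document tree
const SENSITIVE_KEYS = new Set(['email', 'ssn', 'tokens', 'password', 'phone']);

function redactDeep(value) {
  if (Array.isArray(value)) return value.map(redactDeep);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key) ? '[REDACTED]' : redactDeep(child);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Running `redactDeep(user.toJSON())` before building the prompt masks sensitive keys even inside embedded subdocuments and arrays, which `sanitizeForLLM` alone cannot reach.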

3. Validate and restrict tool call schemas

Define strict JSON schemas for tool calls and reject any properties not explicitly allowed. This prevents overprivileged calls that could return sensitive MongoDB documents.

const toolSchema = {
  type: 'object',
  properties: {
    query: { type: 'string', minLength: 1, maxLength: 200 },
    limit: { type: 'integer', minimum: 1, maximum: 50 }
  },
  required: ['query'],
  additionalProperties: false
};

import Ajv from 'ajv';

// Validate before constructing the LLM tool call. Ajv consumes the JSON
// Schema above directly; additionalProperties: false makes it reject any
// keys that were not explicitly declared.
const ajv = new Ajv();
const validate = ajv.compile(toolSchema);

const validateToolInput = (data) => {
  if (!validate(data)) throw new Error('Invalid tool input');
  return data;
};
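The same "reject unknown properties" rule can also be enforced without a validation library. This dependency-free helper (a hypothetical sketch) fails closed whenever a tool call carries properties that were never declared:

```javascript
// Throw if args contain any key outside the declared allowlist (fail closed)
function enforceAllowlist(args, allowedKeys) {
  const unexpected = Object.keys(args).filter((key) => !allowedKeys.includes(key));
  if (unexpected.length > 0) {
    throw new Error(`Unexpected tool properties: ${unexpected.join(', ')}`);
  }
  return args;
}

enforceAllowlist({ query: 'refund status', limit: 10 }, ['query', 'limit']); // passes
// enforceAllowlist({ query: 'x', dumpUsers: true }, ['query', 'limit']);    // throws
```

Failing closed matters: an LLM that hallucinates an extra argument (or an attacker who injects one) gets an error rather than an overprivileged query against MongoDB.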

4. Use environment-based masking for logs and errors

Ensure that LLM request and response logging does not include raw MongoDB fields. Configure log transport to mask or omit sensitive keys.

// Example logger setup in AdonisJS (v5); Logger wraps pino, so a merging
// object or printf-style interpolation can be used
import Logger from '@ioc:Adonis/Core/Logger';

const maskedLog = (label, data) => {
  const safe = { ...data };
  // Mask anything that looks like an email address before logging
  if (safe.body) safe.body = String(safe.body).replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, '[EMAIL]');
  Logger.info('%s %o', label, safe);
};

5. Enforce authentication and rate limits on LLM endpoints

Do not expose LLM inference routes as public. Apply route-level middleware in start/routes.ts to require authentication and apply rate limiting to reduce enumeration risk.

Route.post('/llm/query', 'LLMController.ask')
  .middleware(['auth', 'rateLimiter']);

By combining these MongoDB-aware practices with continuous scanning via middleBrick — using the CLI (middlebrick scan <url>), the Web Dashboard, or the GitHub Action to fail builds on low scores — teams can reduce the likelihood of LLM data leakage and maintain compliance mappings to OWASP API Top 10 and GDPR.

Related CWEs: llmSecurity

CWE-754: Improper Check for Unusual or Exceptional Conditions (Severity: MEDIUM)

Frequently Asked Questions

How can I verify that my AdonisJS endpoints do not leak PII to LLMs?
Run a middleBrick scan against your API endpoint using the CLI (middlebrick scan <url>) or Web Dashboard. Review the LLM/AI Security section for findings on system prompt leakage, output PII, and unauthenticated endpoints, and remediate by projecting only safe fields from MongoDB.
Does middleBrick fix LLM data leakage in my AdonisJS app?
middleBrick detects and reports LLM data leakage with severity, findings, and remediation guidance. It does not automatically fix or block data exposure; you must apply the suggested schema filtering, sanitization, and route protections in your AdonisJS code.