Severity: HIGH

LLM Data Leakage in Express with MongoDB

LLM Data Leakage in Express with MongoDB — how this specific combination creates or exposes the vulnerability

When an Express application uses MongoDB and exposes endpoints that interact with language model (LLM) services, there is a risk of LLM data leakage through improper handling of prompts, responses, and data flow. In this stack, sensitive information can be inadvertently exposed to the LLM or returned to the client via LLM interactions.

Express routes often construct dynamic inputs for LLM calls using data from request parameters, headers, cookies, or parsed bodies. If this data includes personally identifiable information (PII), authentication tokens, or internal identifiers and is passed into a prompt without validation or sanitization, the LLM service may echo it back in responses. For example, a route that builds a user query by concatenating raw user input into a system or user message can leak credentials or session tokens if the LLM reflects that content in its output.
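A minimal sketch of that vulnerable pattern is shown below. The /assist route and the callLlm helper are hypothetical stand-ins for any Express route and any LLM client wrapper; the stub simply echoes its input to make the reflection obvious.

const express = require('express');
const app = express();
app.use(express.json());

// Hypothetical stand-in for a real LLM client call; an echoing model makes
// the leak easy to see.
async function callLlm(prompt) {
  return `echo: ${prompt}`;
}

// VULNERABLE: the raw session cookie and unvalidated body text are
// concatenated straight into the prompt. If the model reflects its input,
// the session token can appear verbatim in the response.
app.post('/assist', async (req, res) => {
  const prompt = `Session: ${req.headers.cookie}. Question: ${req.body.question}`;
  const reply = await callLlm(prompt);
  res.json({ reply });
});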

MongoDB operations in Express typically involve querying a database using user-supplied filters or constructing aggregation pipelines that may reference fields such as userId, email, or role. If an LLM is used to generate or transform queries, or if LLM responses are used to influence database operations, sensitive data can travel across system boundaries. For instance, an LLM might return a JSON structure that inadvertently contains database field names or values that should not be exposed. Similarly, error messages from MongoDB (e.g., validation or connection errors) can be passed into LLM prompts, leading to indirect data leakage through the LLM’s output.
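When LLM output is allowed to influence a query at all, it should pass through an allowlist first. The sketch below assumes the model returns a JSON filter object; the permitted field names are illustrative, not prescribed by any particular library.

// Only these fields may appear in a model-generated filter; operators and
// internal fields (role, passwordHash, _id) are rejected outright.
const ALLOWED_FILTER_FIELDS = new Set(['name', 'email']);

function validateLlmFilter(rawFilter) {
  if (typeof rawFilter !== 'object' || rawFilter === null || Array.isArray(rawFilter)) {
    throw new Error('Filter must be a plain object');
  }
  for (const [key, value] of Object.entries(rawFilter)) {
    if (key.startsWith('$') || !ALLOWED_FILTER_FIELDS.has(key)) {
      throw new Error(`Disallowed filter field: ${key}`);
    }
    if (typeof value !== 'string') {
      throw new Error(`Filter value for ${key} must be a string`);
    }
  }
  return rawFilter;
}

Rejecting keys that start with $ also blocks operator injection ($where, $regex, and friends), which matters when the "user input" is itself model output.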

The LLM/AI Security checks in middleBrick specifically test for system prompt leakage and output scanning for PII, API keys, and executable code. In an Express + MongoDB context, this means validating that prompts built from database results do not contain secrets, and ensuring LLM responses do not echo sensitive fields stored in MongoDB documents. Without these safeguards, an API could expose internal data model structures or confidential information through conversational behavior.
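A minimal output scan along those lines might look like the following. The patterns are illustrative and deliberately narrow; they are not a replacement for middleBrick's scanning.

// Illustrative patterns only; real PII and secret detection needs far
// broader coverage than a handful of regexes.
const SENSITIVE_PATTERNS = [
  /\bsk-[A-Za-z0-9]{20,}\b/,                        // OpenAI-style API keys
  /\b\d{3}-\d{2}-\d{4}\b/,                          // US SSN-shaped numbers
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, // email addresses
  /\b(passwordHash|sessionToken|apiKey)\b/,         // internal field names
];

function outputLooksSafe(text) {
  return !SENSITIVE_PATTERNS.some((pattern) => pattern.test(text));
}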

Additionally, improper configuration of the LLM client in Express can lead to unauthenticated endpoint exposure or excessive agency, where tool-calling patterns or function schemas reference MongoDB collections or internal routes. middleBrick’s detection of unauthenticated LLM endpoints and excessive agency helps identify scenarios where an Express API might unintentionally allow LLMs to interact with MongoDB-derived data models without appropriate access controls.
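One way to sketch that control, assuming OpenAI-style tool calls (the tool name and the allowlist itself are assumptions for illustration):

// Hypothetical allowlist: the model may only invoke tools that have been
// explicitly approved, so a tool call referencing an internal collection
// or admin route is rejected before it executes.
const ALLOWED_TOOLS = new Set(['lookup_public_profile']);

function assertToolCallAllowed(toolCall) {
  const name = toolCall.function && toolCall.function.name;
  if (!ALLOWED_TOOLS.has(name)) {
    throw new Error(`Tool call rejected: ${String(name)} is not allowlisted`);
  }
}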

MongoDB-Specific Remediation in Express — concrete code fixes

To prevent LLM data leakage in an Express application using MongoDB, implement strict input validation, output filtering, and separation between data retrieval and LLM interaction. The following examples demonstrate secure patterns.

First, ensure that user input used to build prompts is sanitized and that sensitive fields are removed before sending data to the LLM. Use explicit projection to limit the fields returned from MongoDB.

const { MongoClient, ObjectId } = require('mongodb');
const express = require('express');
const router = express.Router();

const uri = process.env.MONGODB_URI || 'mongodb://localhost:27017';
const client = new MongoClient(uri);

// Connect once at startup; the driver pools connections, so the client is
// reused across requests instead of being opened and closed per request.
client.connect().catch((err) => console.error('MongoDB connection failed', err));

router.get('/user/:id', async (req, res) => {
  try {
    // Reject malformed ids before they reach the database
    if (!ObjectId.isValid(req.params.id)) {
      return res.status(400).json({ error: 'Invalid user id' });
    }

    const users = client.db('appdb').collection('users');

    // Explicitly limit returned fields to avoid exposing sensitive data to downstream processing
    const user = await users.findOne(
      { _id: new ObjectId(req.params.id) },
      { projection: { name: 1, email: 1, role: 1, _id: 0 } }
    );

    if (!user) {
      return res.status(404).json({ error: 'User not found' });
    }

    // Safe: user contains only intended fields
    res.json(user);
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Internal server error' });
  }
});

module.exports = router;

Second, when integrating with LLMs, construct prompts using sanitized values and avoid passing raw database documents. Use environment variables for LLM configuration and validate that responses do not contain unexpected data patterns.

const express = require('express');
const { OpenAI } = require('openai');
const { MongoClient, ObjectId } = require('mongodb');
const router = express.Router();

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Shared MongoDB client, connected once and reused across requests
const mongoClient = new MongoClient(process.env.MONGODB_URI || 'mongodb://localhost:27017');
mongoClient.connect().catch((err) => console.error('MongoDB connection failed', err));

router.post('/chat', async (req, res) => {
  const { userId, message } = req.body;

  // Validate inputs before they can influence a prompt or a query
  if (typeof message !== 'string' || message.length > 2000 || !ObjectId.isValid(userId)) {
    return res.status(400).json({ error: 'Invalid request' });
  }

  // Retrieve only necessary, non-sensitive user data
  const userData = await getUserPublicData(userId);

  // Build prompt using sanitized data; never pass raw documents or headers
  const prompt = `You are a helpful assistant for user ${userData.displayName}. Message: ${message}`;

  try {
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
    });

    // Basic output scan: reject responses containing key- or secret-like patterns
    const response = completion.choices[0]?.message?.content || '';
    if (/(api[_-]?key|token|secret)/i.test(response)) {
      return res.status(400).json({ error: 'Response contains sensitive patterns' });
    }

    res.json({ reply: response });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'LLM request failed' });
  }
});

async function getUserPublicData(userId) {
  const user = await mongoClient
    .db('appdb')
    .collection('users')
    .findOne(
      { _id: new ObjectId(userId) },
      { projection: { displayName: 1, preferences: 1, _id: 0 } }
    );
  return user || { displayName: 'User' };
}

module.exports = router;

Finally, apply consistent error handling that avoids exposing stack traces or database details to LLM-related operations, and configure the LLM client to reject tool calls that reference internal collections unless explicitly allowed.
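As a sketch of the error-handling half, a centralized Express error handler can keep raw MongoDB errors (which may carry connection strings, collection names, or document contents) out of both client responses and any prompt-building code:

const express = require('express');
const app = express();

// ...routes registered above...

// Centralized error handler: the full error stays in server logs; clients
// and any downstream prompt construction only ever see a generic message.
app.use((err, req, res, next) => {
  console.error(err);
  res.status(500).json({ error: 'Internal server error' });
});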

Related CWEs

CWE ID  | Name                                                 | Severity
CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM

Frequently Asked Questions

How can I verify that my Express routes are not leaking sensitive fields to the LLM?
Use middleBrick’s output scanning to check LLM responses for PII, API keys, and database field names. Combine with code reviews that ensure MongoDB projections limit returned fields and that prompts are built from sanitized data only.
What should I do if an LLM response contains an API key that was stored in MongoDB?
Immediately rotate the compromised key, audit access logs for the MongoDB collection and LLM usage, and implement output validation rules to reject or sanitize responses containing key-like patterns before they reach users or downstream systems.