Severity: HIGH | Tags: llm-data-leakage, adonisjs, cockroachdb

LLM Data Leakage in AdonisJS with CockroachDB

LLM Data Leakage in AdonisJS with CockroachDB — how this specific combination creates or exposes the vulnerability

AdonisJS, a Node.js web framework, is often paired with CockroachDB as a distributed SQL datastore. When building LLM-facing endpoints (e.g., AI chat completions or tool-use routes), developers may inadvertently expose sensitive data through LLM responses. LLM data leakage occurs when an AdonisJS application queries CockroachDB and returns raw or minimally processed data that includes secrets, PII, or internal schema details, and that data is then reflected in LLM outputs without appropriate safeguards.

In this stack, risk arises at two integration points: (1) database access patterns and (2) LLM response generation. CockroachDB’s SQL interface does not inherently redact information; if AdonisJS models or queries expose columns such as emails, API keys, or internal IDs, those values can be surfaced. For example, a route like /api/chat that retrieves a user document from CockroachDB and passes the full record to an LLM prompt can leak credentials or personal data if the prompt does not explicitly exclude sensitive fields.

Additionally, metadata about the database schema can leak through error messages or verbose responses. AdonisJS may surface query validation errors or ORM trace details that reveal table structures, index names, or constraint violations. When LLM tooling logs or echoes these errors, an attacker can infer schema details that facilitate further exploitation, such as BOLA/IDOR or injection-based techniques. middleBrick's LLM/AI Security checks specifically test for system prompt leakage and scan LLM outputs for PII and API keys, which matters for this combination because the LLM channel becomes an unintended data exfiltration path.

Consider an endpoint that builds a dynamic prompt from a CockroachDB row:

const Conversation = use('App/Models/Conversation')

// Controller method for an LLM-facing chat route
async chat ({ params, response }) {
  const { id } = params
  const conversation = await Conversation.query()
    .where('id', id)
    .with('user', userQuery => userQuery.select(['id', 'email']))
    .first()

  // Risky: the serialized conversation (and its related user) may contain
  // sensitive fields such as api_key, ssn, or internal identifiers
  const prompt = JSON.stringify(conversation.toJSON())
  const aiResponse = await callOpenAI(prompt)
  return response.send({ reply: aiResponse })
}

If the Conversation model or related User model contains fields like api_key, ssn, or internal identifiers, they will be included in the prompt. The LLM may then echo or regurgitate these values in its responses, leading to LLM data leakage. middleBrick's LLM/AI Security checks would detect such leakage by scanning LLM outputs for PII, API keys, and executable code, emphasizing the need to sanitize what enters the LLM pipeline.
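
To make that last line of defense concrete, the sketch below shows what output scanning can look like inside the AdonisJS handler before a reply is returned. It is an illustrative assumption, not middleBrick's implementation: the containsSensitiveData helper and its regex patterns are hypothetical and intentionally simplistic.

// Illustrative output scanner: checks the LLM reply for obvious secrets
// before it is returned to the client
const SENSITIVE_PATTERNS = [
  /sk-[A-Za-z0-9]{20,}/,                              // OpenAI-style API keys
  /\b\d{3}-\d{2}-\d{4}\b/,                            // US SSN format
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/    // email addresses
]

// Returns true if the LLM reply appears to contain secrets or PII
function containsSensitiveData (text) {
  return SENSITIVE_PATTERNS.some(pattern => pattern.test(text))
}

// Usage inside the chat handler, after callOpenAI() resolves:
// if (containsSensitiveData(aiResponse)) {
//   return response.send({ reply: 'The response was withheld for safety reasons.' })
// }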

Moreover, improper error handling in AdonisJS can amplify leakage. Unhandled rejections or ORM validation failures might return stack traces that expose CockroachDB table and column names. In a CI/CD setup monitored by middleBrick’s GitHub Action, such findings would fail the build if the security score drops below the configured threshold, prompting remediation before deployment.

CockroachDB-Specific Remediation in AdonisJS — concrete code fixes

To mitigate LLM data leakage when using AdonisJS with CockroachDB, adopt strict data selection, output encoding, and schema-hiding practices. The goal is to ensure that only necessary, sanitized data reaches the LLM layer.

1) Select only required fields in queries

Avoid selecting entire rows. Explicitly choose columns that are necessary for the LLM task and omit sensitive attributes.

const Conversation = use('App/Models/Conversation')

async chat ({ params, response }) {
  const { id } = params
  const conversation = await Conversation.query()
    .where('id', id)
    .with('user', userQuery => userQuery.select(['id', 'username'])) // exclude email, api_key
    .first()

  // Build a minimal, explicit payload instead of serializing the whole row
  const safeData = {
    id: conversation.id,
    text: conversation.text,
    userId: conversation.user_id
  }
  const prompt = JSON.stringify(safeData)
  const aiResponse = await callOpenAI(prompt)
  return response.send({ reply: aiResponse })
}
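
As defense in depth, sensitive columns can also be hidden at the model layer so that even an accidental toJSON() call omits them. A minimal sketch, assuming an AdonisJS v4 Lucid model; the column names are illustrative.

'use strict'

// app/Models/User.js: fields listed in `hidden` are omitted whenever the
// model is serialized (e.g. via toJSON())
const Model = use('Model')

class User extends Model {
  static get hidden () {
    return ['api_key', 'password_hash', 'ssn'] // illustrative sensitive columns
  }
}

module.exports = User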

2) Redact sensitive fields before prompting

Create a sanitization layer that removes or masks confidential data. For CockroachDB-backed models, explicitly filter out known sensitive columns.

const Note = use('App/Models/Note')

// Strip known sensitive columns before a model instance reaches the prompt
function sanitizeForLlm (modelInstance) {
  const obj = modelInstance && modelInstance.toJSON ? modelInstance.toJSON() : {}
  const sensitive = ['api_key', 'password_hash', 'ssn', 'credit_card', 'internal_notes']
  sensitive.forEach(key => { delete obj[key] })
  return obj
}

async summarize ({ params, response }) {
  const note = await Note.find(params.id)
  const clean = sanitizeForLlm(note)
  const summary = await callOpenAI(`Summarize: ${JSON.stringify(clean)}`)
  return response.send({ summary })
}
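
A blocklist only removes fields it already knows about; an allowlist is the safer default because new columns added to the CockroachDB schema stay hidden until explicitly exposed. A minimal sketch; pickForLlm is a hypothetical helper, not part of AdonisJS.

// Allowlist variant: copy only the fields the LLM task actually needs
function pickForLlm (modelInstance, allowedFields) {
  const obj = modelInstance && modelInstance.toJSON ? modelInstance.toJSON() : {}
  const safe = {}
  for (const field of allowedFields) {
    if (field in obj) safe[field] = obj[field]
  }
  return safe
}

// Usage: const clean = pickForLlm(note, ['id', 'title', 'body'])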

3) Control LLM prompt construction

Design prompts that instruct the LLM not to request or echo sensitive data. Include guardrails in the system prompt when using chat completions.

// Assumes an initialized OpenAI client (e.g. const openai = new OpenAI())
const systemPrompt = [
  { role: 'system', content: 'You are a helpful assistant. Never request or reveal API keys, emails, or internal IDs. If the user asks for such data, respond with a refusal.' }
]
// Keep untrusted data in the user turn rather than a fabricated assistant turn
const userPrompt = [
  { role: 'user', content: `Summarize the following data safely: ${JSON.stringify(safeData)}` }
]
const result = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...systemPrompt, ...userPrompt]
})

4) Harden error handling

Ensure AdonisJS does not leak schema details through error responses. Use generic messages and avoid exposing query builder or ORM internals.

try {
  const record = await Model.query().where('id', id).firstOrFail()
  // ... use `record` ...
} catch (error) {
  // Log server-side only; never forward the raw error (stack trace, SQL text,
  // table or constraint names) to the client or into an LLM prompt
  logger.error('Data access failed', { scope: 'conversation' })
  return response.status(404).send({ error: 'Not found' })
}
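
Per-route try/catch blocks can be complemented by the framework's global exception handler, so that nothing falling through to the default error path leaks driver output. A minimal sketch, assuming the AdonisJS v4 app/Exceptions/Handler.js layout; adapt the shape to your AdonisJS version.

'use strict'

// app/Exceptions/Handler.js: keep CockroachDB/driver details out of HTTP responses
const BaseExceptionHandler = use('BaseExceptionHandler')
const Logger = use('Logger')

class ExceptionHandler extends BaseExceptionHandler {
  async handle (error, { response }) {
    // Return a generic message; never echo SQL, table names, or stack traces
    return response.status(error.status || 500).send({ error: 'Internal error' })
  }

  async report (error, { request }) {
    // Log full details server-side only
    Logger.error(`Request failed: ${request.method()} ${request.url()}`)
  }
}

module.exports = ExceptionHandler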

These CockroachDB-aware practices reduce the likelihood that sensitive data will propagate into LLM interactions. By integrating middleBrick’s CLI or GitHub Action, teams can automatically validate that such mitigations are in place and that the API’s security score remains within acceptable risk thresholds.

Related CWEs: llmSecurity

CWE ID  | Name                                                 | Severity
CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM

Frequently Asked Questions

How can I verify that my AdonisJS endpoints are not leaking data to LLMs?
Run middleBrick scans against your API endpoints. The LLM/AI Security checks test for system prompt leakage and scan LLM outputs for PII, API keys, and executable code, providing findings and remediation guidance.
Does selecting fewer fields from CockroachDB fully prevent LLM data leakage?
Selecting only required fields reduces risk, but you must also sanitize inputs, control prompt construction, and harden error handling. Sensitive fields can still appear via related models or dynamic objects if not explicitly excluded.