Severity: HIGH · LLM Data Leakage · Chi · CockroachDB

LLM Data Leakage in Chi with CockroachDB

LLM Data Leakage in Chi with CockroachDB — how this specific combination creates or exposes the vulnerability

When an AI assistant in a Chi application interacts with a CockroachDB-backed service, the risk of LLM data leakage centers on how prompts, queries, and responses are handled before they reach the database. middleBrick’s LLM/AI Security checks specifically look for system prompt leakage, unsafe tool usage, and exposure of sensitive data in model outputs. In a Chi (Go) application that uses CockroachDB as the primary store, a typical flow involves constructing SQL from user input, passing contextual data to an LLM, and returning database results or generated text to the client.

If the application embeds sensitive values—such as tenant identifiers, personal data, or internal query logic—into prompts that are sent to an unauthenticated or poorly secured LLM endpoint, middleBrick’s system prompt leakage detection (27 regex patterns) and active prompt injection testing can reveal whether those values are echoed back in model responses. For example, consider a Chi handler that builds a prompt like this (using the community go-openai client, github.com/sashabaranov/go-openai):

// Tenant-scoped values embedded verbatim in the prompt — this is the risky pattern.
prompt := fmt.Sprintf("Tenant %s user preferences: %s. Generate a summary.", tenantID, userInput)
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
	Model:    openai.GPT3Dot5Turbo,
	Messages: []openai.ChatCompletionMessage{{Role: openai.ChatMessageRoleUser, Content: prompt}},
})

If tenantID or other row-level sensitive CockroachDB data is included verbatim in the prompt, an attacker who can inject into or influence the LLM may extract it via a jailbreak or data exfiltration probe. middleBrick’s active prompt injection testing runs sequential probes (system prompt extraction, instruction override, DAN jailbreak, data exfiltration, cost exploitation) to surface such weaknesses. Additionally, if the application returns raw CockroachDB query results directly to the LLM or to the user without filtering, PII, API keys, or executable code can appear in model outputs; output scanning in the LLM/AI Security checks detects these patterns.

The interaction with CockroachDB can also leak data through error messages or verbose logs. For instance, a malformed query or a failed authorization check might produce detailed errors that include SQL snippets or table structures. If those errors are passed to an LLM for debugging or included in responses, sensitive schema information can leak. middleBrick scans unauthenticated attack surfaces, so even endpoints that do not require credentials are evaluated for data exposure. By correlating OpenAPI/Swagger specs (with full $ref resolution) against runtime findings, middleBrick maps where LLM-related operations intersect with CockroachDB data paths, highlighting high-risk surfaces such as unauthenticated endpoints that feed database content into LLM calls.
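As a minimal sketch of scrubbing such errors inside a Chi handler, the detailed CockroachDB error can be logged server-side while only a generic message reaches the client (and any downstream prompt). This assumes the pgx driver (github.com/jackc/pgx/v5); q, tenantID, and userID stand in for the handler’s query and parameters.

// Scrub CockroachDB errors before anything reaches the response or an LLM.
rows, err := db.QueryContext(req.Context(), q, tenantID, userID)
if err != nil {
	var pgErr *pgconn.PgError
	if errors.As(err, &pgErr) {
		// Full detail (SQLSTATE code, message) stays in server-side logs only.
		log.Printf("user_prefs query failed: code=%s msg=%s", pgErr.Code, pgErr.Message)
	}
	http.Error(w, `{"error":"internal"}`, http.StatusInternalServerError) // no SQL or schema detail
	return
}
defer rows.Close()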

In Chi, a typical integration might look like a handler that queries CockroachDB and then asks an LLM to rephrase the results. If the handler does not sanitize inputs or enforce strict schema-based authorization, the LLM can be tricked into revealing underlying query logic or data. middleBrick’s BOLA/IDOR and Property Authorization checks help identify whether object-level permissions are enforced before data is exposed to the LLM. Without these controls, an attacker may manipulate identifiers to access other tenants’ records and have the LLM inadvertently echo back confidential rows; a sketch of an explicit object-level check follows.
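A minimal object-level check in a Chi handler might look like this. fetchPrefs, rec.TenantID, and the tenantKey context key are hypothetical names; the point is to compare the fetched row’s tenant against the caller’s tenant before the data can appear in a prompt.

// Object-level check before any value reaches a prompt: even when the query
// matched, confirm the row belongs to the caller's tenant (defense in depth
// against IDOR).
rec, err := fetchPrefs(req.Context(), db, requestedID) // hypothetical helper returning the row
if err != nil {
	http.Error(w, `{"error":"not_found"}`, http.StatusNotFound)
	return
}
if rec.TenantID != req.Context().Value(tenantKey).(string) {
	http.Error(w, `{"error":"forbidden"}`, http.StatusForbidden)
	return // never forward another tenant's row into an LLM prompt
}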

Finally, excessive agency is a concern when LLM tooling in Chi generates or passes function calls that interact with CockroachDB. middleBrick’s detection of tool_calls, function_call, and LangChain agent patterns aims to uncover scenarios where an LLM could trigger unintended database operations. Ensuring that only vetted, schema-compliant queries reach CockroachDB—and that LLM outputs are stripped of sensitive artifacts before being returned—is essential to prevent data leakage across the Chi application stack.

CockroachDB-Specific Remediation in Chi — concrete code fixes

To mitigate LLM data leakage when using CockroachDB in Chi, focus on strict input validation, schema-aware authorization, and safe handling of LLM inputs and outputs. Below are concrete Go examples tailored to Chi and CockroachDB that align with middleBrick’s findings and recommended remediation guidance.

1. Parameterized queries and strict schema validation

Always use parameterized statements to prevent SQL injection, and avoid embedding raw values in prompts. In a Chi service, use Go’s database/sql with a PostgreSQL-compatible driver such as pgx (CockroachDB speaks the PostgreSQL wire protocol), or an ORM that supports prepared statements.

// Chi route using a parameterized query (database/sql with the pgx stdlib driver)
r.Get("/api/users/{id}", func(w http.ResponseWriter, req *http.Request) {
	id := chi.URLParam(req, "id")
	tenantID := req.Context().Value(tenantKey).(string) // set by the tenant middleware in example 3

	const q = `SELECT display_name FROM user_prefs WHERE tenant_id = $1 AND user_id = $2`
	var displayName string
	err := db.QueryRowContext(req.Context(), q, tenantID, id).Scan(&displayName)
	switch {
	case errors.Is(err, sql.ErrNoRows):
		http.Error(w, `{"error":"not_found"}`, http.StatusNotFound)
		return
	case err != nil:
		http.Error(w, `{"error":"internal"}`, http.StatusInternalServerError)
		return
	}
	// Safe: only the non-sensitive field is forwarded to the client or an LLM.
	json.NewEncoder(w).Encode(map[string]string{"display_name": displayName})
})

2. Prompt sanitization and output filtering

Never include raw database values in LLM prompts. If you must provide context, use abstracted, non-sensitive representations. Also, filter LLM outputs to remove potential PII or secrets before sending responses.

// Build safe prompts without CockroachDB-sensitive fields: use abstracted
// context instead of raw column values.
func buildPrompt(userInput string) string {
	return fmt.Sprintf(
		"Summarize preferences based on the provided abstract context: %s. Do not include identifiers.",
		userInput)
}

// Filter LLM outputs for accidental leakage: redact emails and long
// token-like strings that may be keys or identifiers.
var (
	emailRe = regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`)
	tokenRe = regexp.MustCompile(`\b[a-zA-Z0-9_-]{20,}\b`)
)

func filterOutput(text string) string {
	text = emailRe.ReplaceAllString(text, "[email]")
	return tokenRe.ReplaceAllString(text, "[redacted]")
}

r.Post("/api/summarize", func(w http.ResponseWriter, req *http.Request) {
	var body struct {
		Context string `json:"context"`
	}
	if err := json.NewDecoder(req.Body).Decode(&body); err != nil {
		http.Error(w, `{"error":"bad_request"}`, http.StatusBadRequest)
		return
	}
	resp, err := client.CreateChatCompletion(req.Context(), openai.ChatCompletionRequest{
		Model:    openai.GPT3Dot5Turbo,
		Messages: []openai.ChatCompletionMessage{{Role: openai.ChatMessageRoleUser, Content: buildPrompt(body.Context)}},
	})
	if err != nil || len(resp.Choices) == 0 {
		http.Error(w, `{"error":"llm_unavailable"}`, http.StatusBadGateway)
		return
	}
	json.NewEncoder(w).Encode(map[string]string{"summary": filterOutput(resp.Choices[0].Message.Content)})
})

3. Enforce Row-Level Security (RLS) and tenant checks

Recent CockroachDB versions support PostgreSQL-style row-level security policies. Define policies that restrict rows by tenant_id and ensure every query respects the current tenant context. In Chi, validate tenant membership before any database interaction and avoid dynamic tenant identifiers in prompts.

-- CockroachDB RLS example (PostgreSQL-compatible syntax)
ALTER TABLE user_prefs ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON user_prefs
  USING (tenant_id = current_setting('app.current_tenant', true)::uuid);

// Chi middleware sets the tenant context for downstream handlers
r.Use(func(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		ctx := context.WithValue(req.Context(), tenantKey, tenantIDFromAuth(req)) // helper assumed; tenant validated earlier
		next.ServeHTTP(w, req.WithContext(ctx))
	})
})

// Safe query execution within the tenant context: set_config(..., true) scopes
// the setting to this transaction, so the RLS policy above filters every row.
r.Get("/api/prefs", func(w http.ResponseWriter, req *http.Request) {
	ctx := req.Context()
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		http.Error(w, `{"error":"internal"}`, http.StatusInternalServerError)
		return
	}
	defer tx.Rollback()
	tx.ExecContext(ctx, `SELECT set_config('app.current_tenant', $1, true)`, ctx.Value(tenantKey).(string))
	rows, err := tx.QueryContext(ctx, `SELECT display_name, preferences FROM user_prefs WHERE user_id = $1`, currentUserID(req))
	if err == nil {
		defer rows.Close()
		// ... scan rows and encode a filtered response as in example 1
	}
})

4. Limit LLM scope and disable dangerous tooling

Configure your Chi application to avoid exposing tool_calls or function_call patterns that directly execute CockroachDB operations unless absolutely necessary. When LLM integration is required, use read-only database roles and disable automatic function execution.

// Example: invoke the LLM without declaring any tools, so it cannot request
// autonomous function or tool calls against the database.
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
	Model:    openai.GPT3Dot5Turbo,
	Messages: []openai.ChatCompletionMessage{{Role: openai.ChatMessageRoleUser, Content: "Explain the user preferences."}},
	// Tools field deliberately omitted: no function/tool definitions are exposed.
})
// Further process resp safely (apply filterOutput from example 2).

5. Monitor and redact logs

Ensure that logs and error messages related to CockroachDB do not include sensitive data or SQL fragments that could aid an attacker. Use structured logging with field-level redaction in Chi and validate that LLM-related outputs are audited before being stored or displayed.
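A minimal sketch of field-level redaction with Go’s standard log/slog package; the attribute names ("query", "email", "api_key") and the sqlText variable are assumptions about your log schema, not a fixed convention.

// Drop sensitive attribute values before any log line is written.
redact := func(groups []string, a slog.Attr) slog.Attr {
	switch a.Key {
	case "query", "email", "api_key": // illustrative sensitive fields
		return slog.String(a.Key, "[redacted]")
	}
	return a
}
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{ReplaceAttr: redact}))
// The SQL text is redacted in the emitted line; the error code is kept for triage.
logger.Error("cockroachdb query failed", "code", "42P01", "query", sqlText)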

Related CWEs (LLM Security)

CWE ID | Name | Severity
CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM

Frequently Asked Questions

Can LLM output contain leaked CockroachDB values even if the database is not directly exposed?
Yes. If prompts or intermediate data include raw CockroachDB values and are sent to an LLM, the model may reproduce them in its output. Use output scanning and prompt sanitization to prevent this.
Does enabling RLS in CockroachDB fully prevent LLM data leakage?
RLS enforces row-level access control, but leakage can still occur if application logic passes unauthorized context to the LLM or returns unfiltered query results. Combine RLS with strict input validation and safe prompt handling.