LLM Data Leakage in Chi with CockroachDB
LLM Data Leakage in Chi with CockroachDB — how this specific combination creates or exposes the vulnerability
When an AI assistant in a Chi (go-chi) application interacts with a CockroachDB-backed service, the risk of LLM data leakage centers on how prompts, queries, and responses are handled before they reach the database. middleBrick’s LLM/AI Security checks specifically look for system prompt leakage, unsafe tool usage, and exposure of sensitive data in model outputs. In a Chi application that uses CockroachDB as the primary store, a typical flow involves constructing dynamic SQL from user input, passing contextual data to an LLM, and returning database results or generated text to the client.
If the application embeds sensitive values—such as tenant identifiers, personal data, or internal query logic—into prompts that are sent to an unauthenticated or poorly secured LLM endpoint, middleBrick’s system prompt leakage detection (27 regex patterns) and active prompt injection testing can reveal whether those values are echoed back in model responses. For example, consider a Chi route that builds a prompt like:
prompt := fmt.Sprintf("Tenant %s user preferences: %s. Generate a summary.", tenantID, userInput)
// e.g. via the sashabaranov/go-openai client
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
	Model:    openai.GPT3Dot5Turbo,
	Messages: []openai.ChatCompletionMessage{{Role: openai.ChatMessageRoleUser, Content: prompt}},
})
If the tenant identifier or other CockroachDB row-level sensitive data is included verbatim in the prompt, an attacker who can inject or influence the LLM may extract it via a jailbreak or data exfiltration probe. middleBrick’s active prompt injection testing runs sequential probes (system prompt extraction, instruction override, DAN jailbreak, data exfiltration, cost exploitation) to surface such weaknesses. Additionally, if the application returns raw CockroachDB query results directly to the LLM or to the user without filtering, PII, API keys, or executable code could appear in model outputs; output scanning in the LLM/AI Security checks detects these patterns.
The interaction with CockroachDB can also expose leakage through error messages or verbose logs. For instance, a malformed query or a failed authorization check might produce detailed errors that include SQL snippets or table structures. If those errors are passed to an LLM for debugging or included in responses, sensitive schema information can leak. middleBrick scans unauthenticated attack surfaces, so even endpoints that do not require credentials are evaluated for data exposure. By correlating OpenAPI/Swagger specs (with full $ref resolution) against runtime findings, middleBrick maps where LLM-related operations intersect with CockroachDB data paths, highlighting high-risk surfaces such as unauthenticated endpoints that return database content to LLM calls.
In Chi, a typical integration might look like a handler that queries CockroachDB and then asks an LLM to rephrase results. If the handler does not sanitize inputs or enforce strict schema-based authorization, the LLM can be tricked into revealing underlying query logic or data. middleBrick’s BOLA/IDOR and Property Authorization checks help identify whether object-level permissions are enforced before data is exposed to the LLM. Without these controls, an attacker may manipulate identifiers to access other tenants’ records and have the LLM inadvertently echo back confidential rows.
Finally, excessive agency is a concern when LLM tooling in Chi generates or passes function calls that interact with CockroachDB. middleBrick’s detection of tool_calls, function_call, and LangChain agent patterns aims to uncover scenarios where an LLM could trigger unintended database operations. Ensuring that only vetted, schema-compliant queries reach CockroachDB—and that LLM outputs are stripped of sensitive artifacts before being returned—is essential to prevent data leakage across the Chi application stack.
CockroachDB-Specific Remediation in Chi — concrete code fixes
To mitigate LLM data leakage when using CockroachDB in Chi, focus on strict input validation, schema-aware authorization, and safe handling of LLM inputs and outputs. Below are concrete Go code examples tailored to Chi and CockroachDB that align with middleBrick’s findings and recommended remediation guidance.
1. Parameterized queries and strict schema validation
Always use parameterized statements to prevent SQL injection, and avoid embedding raw values in prompts. In Chi, use Go’s database/sql interface with a PostgreSQL-compatible driver such as pgx (CockroachDB speaks the PostgreSQL wire protocol), or an ORM that supports prepared statements.
// Chi route using a parameterized query
r.Get("/api/users/{id}", func(w http.ResponseWriter, r *http.Request) {
	id := chi.URLParam(r, "id")
	tenantID := r.Context().Value(tenantKey).(string) // set by auth middleware
	const q = `SELECT display_name, preferences FROM user_prefs WHERE tenant_id = $1 AND user_id = $2`
	var displayName, preferences string
	err := db.QueryRowContext(r.Context(), q, tenantID, id).Scan(&displayName, &preferences)
	if errors.Is(err, sql.ErrNoRows) {
		http.Error(w, `{"error":"not_found"}`, http.StatusNotFound)
		return
	}
	if err != nil {
		http.Error(w, `{"error":"internal"}`, http.StatusInternalServerError)
		return
	}
	// Safe: only the non-sensitive field is forwarded to the LLM/client
	json.NewEncoder(w).Encode(map[string]string{"display_name": displayName})
})
2. Prompt sanitization and output filtering
Never include raw database values in LLM prompts. If you must provide context, use abstracted, non-sensitive representations. Also, filter LLM outputs to remove potential PII or secrets before sending responses.
// Build safe prompts without CockroachDB-sensitive fields
func buildPrompt(userInput string) string {
	// Use abstracted context instead of raw data
	return fmt.Sprintf("Summarize preferences based on the provided abstract context: %s. Do not include identifiers.", userInput)
}

var (
	emailRe  = regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`)
	secretRe = regexp.MustCompile(`\b[a-zA-Z0-9_-]{20,}\b`)
)

// Filter LLM outputs for accidental leakage
func filterOutput(text string) string {
	// Basic example: redact potential keys and emails
	text = emailRe.ReplaceAllString(text, "[email]")
	return secretRe.ReplaceAllString(text, "[redacted]")
}

r.Post("/api/summarize", func(w http.ResponseWriter, r *http.Request) {
	var body struct {
		Context string `json:"context"`
	}
	if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
		http.Error(w, `{"error":"bad_request"}`, http.StatusBadRequest)
		return
	}
	// e.g. via the sashabaranov/go-openai client
	resp, err := client.CreateChatCompletion(r.Context(), openai.ChatCompletionRequest{
		Model:    openai.GPT3Dot5Turbo,
		Messages: []openai.ChatCompletionMessage{{Role: openai.ChatMessageRoleUser, Content: buildPrompt(body.Context)}},
	})
	if err != nil {
		http.Error(w, `{"error":"llm_unavailable"}`, http.StatusBadGateway)
		return
	}
	json.NewEncoder(w).Encode(map[string]string{"summary": filterOutput(resp.Choices[0].Message.Content)})
})
3. Enforce Row-Level Security (RLS) and tenant checks
Recent CockroachDB versions support PostgreSQL-style row-level security policies. Define policies that restrict rows by tenant_id and ensure every query respects the current tenant context. In Chi, validate tenant membership in middleware before any database interaction, and avoid dynamic tenant identifiers in prompts.
-- CockroachDB RLS example (PostgreSQL-compatible syntax)
ALTER TABLE user_prefs ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON user_prefs
USING (tenant_id = current_setting('app.current_tenant', true)::uuid);
// Chi middleware stores the validated tenant ID in the request context
// (tenantFromAuth, tenantKey, and userID are app-specific placeholders)
func tenantContext(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := context.WithValue(r.Context(), tenantKey, tenantFromAuth(r)) // validated earlier
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// Safe query execution within tenant context: the session variable the RLS
// policy reads is set inside the same transaction that runs the query
r.Get("/api/prefs", func(w http.ResponseWriter, r *http.Request) {
	tenantID := r.Context().Value(tenantKey).(string)
	tx, err := db.BeginTx(r.Context(), nil)
	if err != nil {
		http.Error(w, `{"error":"internal"}`, http.StatusInternalServerError)
		return
	}
	defer tx.Rollback()
	tx.ExecContext(r.Context(), `SELECT set_config('app.current_tenant', $1, true)`, tenantID)
	rows, err := tx.QueryContext(r.Context(), `SELECT user_id, preferences FROM user_prefs WHERE user_id = $1`, userID(r))
	// ... scan rows, check err, and write only the needed fields as JSON
})
4. Limit LLM scope and disable dangerous tooling
Configure your Chi application to avoid exposing tool_calls or function_call patterns that directly execute Cockroachdb operations unless absolutely necessary. When LLM integration is required, use read-only database roles and disable automatic function execution.
// Example: invoke the LLM without enabling autonomous tool calls
// (sketch using the sashabaranov/go-openai client)
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
	Model: openai.GPT3Dot5Turbo,
	Messages: []openai.ChatCompletionMessage{
		{Role: openai.ChatMessageRoleUser, Content: "Explain the user preferences."},
	},
	// Tools is deliberately left unset: the model cannot request function execution
})
if err != nil {
	// handle the error; never forward raw provider errors to clients
}
// Further process resp safely
5. Monitor and redact logs
Ensure that logs and error messages related to CockroachDB do not include sensitive data or SQL fragments that could aid an attacker. Use structured logging with field-level redaction in Chi and validate that LLM-related outputs are audited before being stored or displayed.
Related CWEs (LLM Security)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |