Hallucination Attacks in CockroachDB
How Hallucination Attacks Manifest in CockroachDB
Hallucination attacks occur when a language model produces fabricated or misleading output that is later used as part of a system’s logic. In a CockroachDB‑backed application, the most dangerous path is when the model’s output is concatenated directly into a SQL statement without sanitization. An attacker crafts a prompt that causes the LLM to emit a SQL fragment (e.g., SELECT * FROM users WHERE id = 1 OR 1=1;) that the application then executes against CockroachDB.
Because CockroachDB supports the PostgreSQL wire protocol, the vulnerable code often looks like this in Node.js:
// Vulnerable: LLM output is interpolated directly into the query string.
// `client` is a pg Client connected to CockroachDB; `llm.generate` returns the model's raw text.
const llmOutput = await llm.generate(prompt);
const sql = `SELECT * FROM accounts WHERE owner = '${llmOutput}'`;
const result = await client.query(sql);
If the LLM hallucinates a value such as ' OR '1'='1, the resulting query becomes SELECT * FROM accounts WHERE owner = '' OR '1'='1', returning every row. This is classic SQL injection (CWE‑89). The attack often requires no authentication, because the vulnerable endpoint is frequently exposed as part of a public API that forwards user text to an LLM.
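To make the failure concrete, the following standalone snippet (the values are illustrative) prints the query that the template literal above produces:

// Standalone illustration of the string interpolation shown above.
const llmOutput = "' OR '1'='1";
const sql = `SELECT * FROM accounts WHERE owner = '${llmOutput}'`;
console.log(sql);
// Prints: SELECT * FROM accounts WHERE owner = '' OR '1'='1'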
Beyond data exfiltration, hallucinated output can also contain data‑modification statements (UPDATE, DELETE) or administrative commands (DROP TABLE), leading to integrity loss or denial of service. The root cause is treating the LLM's output as trusted data rather than as untrusted input.
CockroachDB‑Specific Detection
Detecting hallucination‑driven SQL injection requires observing both the LLM interaction and the resulting database traffic. middleBrick’s LLM/AI security module performs active prompt‑injection probing (five sequential probes: system‑prompt extraction, instruction override, DAN jailbreak, data exfiltration, cost exploitation) and scans the model’s responses for dangerous patterns such as SQL keywords, semicolons, or comment sequences.
When you submit a public API URL to middleBrick (via the dashboard, CLI, or GitHub Action), it:
- Sends a series of crafted prompts designed to trigger hallucinations.
- Monitors the HTTP responses for evidence that the LLM output influenced downstream behavior (e.g., changes in status codes, response timing, or leaked data).
- Flags any response that contains SQL‑like strings or unexpected database errors, correlating them with the injected prompts.
For example, running the CLI:
middlebrick scan https://api.example.com/query
produces a JSON report that includes a finding under the "LLM/AI Security" category with severity "High" and a description like "Prompt injection caused LLM to emit SQL keyword ‘UNION’; potential SQL injection via CockroachDB."
In addition to middleBrick, you can enable CockroachDB's statement logging (SET CLUSTER SETTING sql.trace.log_statement_execute = true;) to capture the exact SQL sent by the application. Look for statements in which user‑ or LLM‑supplied text appears as raw string literals in the statement body rather than behind $1‑style placeholders; such literals are a strong indicator of a hallucination‑driven injection vector.
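At the application layer, a lightweight complementary check is to wrap the driver's query call and flag any statement whose text contains the LLM output verbatim, since properly parameterized values never appear in the statement text. This is a minimal sketch under that assumption; the function name is illustrative and not part of pg, middleBrick, or CockroachDB:

// Illustrative wrapper: warn when LLM output appears verbatim in the SQL text,
// which suggests string concatenation instead of parameter binding.
async function queryWithInjectionCheck(client, sql, params, llmOutput) {
  if (llmOutput && sql.includes(llmOutput)) {
    console.warn('Possible hallucination-driven injection: LLM output found in SQL text', sql);
  }
  return client.query(sql, params);
}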
CockroachDB‑Specific Remediation
The fix is to never treat LLM output as raw SQL. Instead, handle it as untrusted data and use parameterized queries or an ORM that binds values as parameters. Because CockroachDB is PostgreSQL wire compatible, standard PostgreSQL drivers such as node‑postgres (pg) support the usual $1, $2, … placeholders.
Revised Node.js example using the pg library:
// Safe: LLM output is passed as a query parameter, never interpolated into the SQL text.
const llmOutput = await llm.generate(prompt);
const sql = 'SELECT * FROM accounts WHERE owner = $1';
const result = await client.query(sql, [llmOutput]);
If your application uses an ORM such as Sequelize or TypeORM, pass the LLM‑generated value through the model's attribute or where‑clause binding, which the ORM sends as a prepared‑statement parameter rather than interpolating it into the SQL text.
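As an illustration, a minimal Sequelize sketch might look like the following; the Account model and the connection URL are assumptions for this example, not part of the application above:

const { Sequelize, DataTypes } = require('sequelize');

// Assumed connection URL; CockroachDB accepts standard PostgreSQL connection URLs.
const sequelize = new Sequelize('postgresql://app_user@localhost:26257/bank');

// Hypothetical model used only for this example.
const Account = sequelize.define('Account', {
  owner: DataTypes.STRING,
  balance: DataTypes.DECIMAL,
});

// Sequelize binds the where-clause value as a parameter rather than
// splicing it into the SQL text.
async function findAccountsByOwner(llmOutput) {
  return Account.findAll({ where: { owner: llmOutput } });
}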
For dynamic SQL where the LLM must influence the query structure (e.g., selecting columns), adopt an allow‑list approach:
// The LLM may only choose from a fixed set of column names.
const allowedColumns = ['id', 'owner', 'balance'];
const column = llmOutput.trim();
if (!allowedColumns.includes(column)) {
  throw new Error('Invalid column name');
}
// `ownerValue` comes from the authenticated request, not from the LLM,
// and is still passed as a parameter.
const sql = `SELECT ${column} FROM accounts WHERE owner = $1`;
const result = await client.query(sql, [ownerValue]);
CockroachDB also supports server‑side prepared statements (PREPARE/EXECUTE), and PostgreSQL drivers such as node‑postgres prepare and bind parameters over the wire protocol automatically whenever you pass values to query(), as shown above.
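For example, with node-postgres you can opt into a named prepared statement explicitly by passing a query config object (the statement name below is illustrative):

// Named prepared statement: prepared once per connection, with the LLM
// output always bound as a parameter.
const result = await client.query({
  name: 'find-accounts-by-owner',
  text: 'SELECT * FROM accounts WHERE owner = $1',
  values: [llmOutput],
});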
Finally, add runtime validation: reject any LLM output that contains SQL‑specific characters (;, --, /*, */) or that does not match an expected pattern (e.g., UUID for an identifier). Combine this with middleBrick’s continuous monitoring (available on the Pro plan) to get alerts if a previously safe endpoint begins to exhibit hallucination‑induced injection patterns.
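A minimal validation sketch along those lines (the regex and function name are illustrative, not a library API):

// Accept only values that look like a UUID and contain no SQL metacharacters.
// Adjust the pattern to whatever shape your identifiers actually have.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const SQL_METACHARS = /;|--|\/\*|\*\//;

function validateLlmIdentifier(value) {
  if (typeof value !== 'string' || SQL_METACHARS.test(value) || !UUID_RE.test(value)) {
    throw new Error('LLM output failed validation; refusing to query');
  }
  return value;
}

// Usage inside an async handler:
//   const ownerId = validateLlmIdentifier(llmOutput);
//   const result = await client.query('SELECT * FROM accounts WHERE id = $1', [ownerId]);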
Related CWEs (LLM/AI Security)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |