Hallucination Attacks in CockroachDB
How Hallucination Attacks Manifest in CockroachDB
Hallucination attacks occur when a language model produces fabricated or misleading output that is later used as part of a system’s logic. In a CockroachDB‑backed application, the most dangerous path is when the model’s output is concatenated directly into a SQL statement without sanitization. An attacker crafts a prompt that causes the LLM to emit a SQL fragment (e.g., SELECT * FROM users WHERE id = 1 OR 1=1;) that the application then executes against CockroachDB.
Because CockroachDB supports the PostgreSQL wire protocol, the vulnerable code often looks like this in Node.js:
// Vulnerable: LLM output is interpolated directly into the query string.
// `client` is a pg Client connected to CockroachDB; `llm.generate` returns the model's raw text.
const llmOutput = await llm.generate(prompt);
const sql = `SELECT * FROM accounts WHERE owner = '${llmOutput}'`;
const result = await client.query(sql);
If the LLM hallucinates a value such as ' OR '1'='1, the resulting query becomes SELECT * FROM accounts WHERE owner = '' OR '1'='1', returning every row. This is classic SQL injection (CWE‑89). The attack often requires no authentication, because the vulnerable endpoint is frequently exposed as part of a public API that forwards user text to an LLM.
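To make the failure concrete, the following standalone snippet (the values are illustrative) prints the query that the template literal above produces:

// Standalone illustration of the string interpolation shown above.
const llmOutput = "' OR '1'='1";
const sql = `SELECT * FROM accounts WHERE owner = '${llmOutput}'`;
console.log(sql);
// Prints: SELECT * FROM accounts WHERE owner = '' OR '1'='1'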
Beyond data exfiltration, hallucinated output can also contain data‑modification statements (UPDATE, DELETE) or administrative commands (DROP TABLE), leading to integrity loss or denial of service. The root cause is treating the LLM's output as trusted data rather than as untrusted input.
CockroachDB‑Specific Detection
Detecting hallucination‑driven SQL injection requires observing both the LLM interaction and the resulting database traffic. middleBrick’s LLM/AI security module performs active prompt‑injection probing (five sequential probes: system‑prompt extraction, instruction override, DAN jailbreak, data exfiltration, cost exploitation) and scans the model’s responses for dangerous patterns such as SQL keywords, semicolons, or comment sequences.
When you submit a public API URL to middleBrick (via the dashboard, CLI, or GitHub Action), it:
- Sends a series of crafted prompts designed to trigger hallucinations.
- Monitors the HTTP responses for evidence that the LLM output influenced downstream behavior (e.g., changes in status codes, response timing, or leaked data).
- Flags any response that contains SQL‑like strings or unexpected database errors, correlating them with the injected prompts.
For example, running the CLI:
middlebrick scan https://api.example.com/query
produces a JSON report that includes a finding under the "LLM/AI Security" category with severity "High" and a description like "Prompt injection caused LLM to emit SQL keyword ‘UNION’; potential SQL injection via CockroachDB."
In addition to middleBrick, you can enable CockroachDB's statement logging (SET CLUSTER SETTING sql.trace.log_statement_execute = true;) to capture the exact SQL sent by the application. Look for statements in which user‑ or LLM‑supplied text appears as raw string literals in the statement body rather than behind $1‑style placeholders; such literals are a strong indicator of a hallucination‑driven injection vector.
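At the application layer, a lightweight complementary check is to wrap the driver's query call and flag any statement whose text contains the LLM output verbatim, since properly parameterized values never appear in the statement text. This is a minimal sketch under that assumption; the function name is illustrative and not part of pg, middleBrick, or CockroachDB:

// Illustrative wrapper: warn when LLM output appears verbatim in the SQL text,
// which suggests string concatenation instead of parameter binding.
async function queryWithInjectionCheck(client, sql, params, llmOutput) {
  if (llmOutput && sql.includes(llmOutput)) {
    console.warn('Possible hallucination-driven injection: LLM output found in SQL text', sql);
  }
  return client.query(sql, params);
}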
CockroachDB‑Specific Remediation
The fix is to never treat LLM output as raw SQL. Instead, handle it as untrusted data and use parameterized queries or an ORM that binds values as parameters. Because CockroachDB is PostgreSQL wire compatible, standard PostgreSQL drivers such as node‑postgres (pg) support the usual $1, $2, … placeholders.
Revised Node.js example using the pg library:
// Safe: LLM output is passed as a query parameter, never interpolated into the SQL text.
const llmOutput = await llm.generate(prompt);
const sql = 'SELECT * FROM accounts WHERE owner = $1';
const result = await client.query(sql, [llmOutput]);
If your application uses an ORM such as Sequelize or TypeORM, pass the LLM‑generated value through the model's attribute or where‑clause binding, which the ORM sends as a prepared‑statement parameter rather than interpolating it into the SQL text.
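As an illustration, a minimal Sequelize sketch might look like the following; the Account model and the connection URL are assumptions for this example, not part of the application above:

const { Sequelize, DataTypes } = require('sequelize');

// Assumed connection URL; CockroachDB accepts standard PostgreSQL connection URLs.
const sequelize = new Sequelize('postgresql://app_user@localhost:26257/bank');

// Hypothetical model used only for this example.
const Account = sequelize.define('Account', {
  owner: DataTypes.STRING,
  balance: DataTypes.DECIMAL,
});

// Sequelize binds the where-clause value as a parameter rather than
// splicing it into the SQL text.
async function findAccountsByOwner(llmOutput) {
  return Account.findAll({ where: { owner: llmOutput } });
}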
For dynamic SQL where the LLM must influence the query structure (e.g., selecting columns), adopt an allow‑list approach:
// The LLM may only choose from a fixed set of column names.
const allowedColumns = ['id', 'owner', 'balance'];
const column = llmOutput.trim();
if (!allowedColumns.includes(column)) {
  throw new Error('Invalid column name');
}
// `ownerValue` comes from the authenticated request, not from the LLM,
// and is still passed as a parameter.
const sql = `SELECT ${column} FROM accounts WHERE owner = $1`;
const result = await client.query(sql, [ownerValue]);
CockroachDB also supports server‑side prepared statements (PREPARE/EXECUTE), and PostgreSQL drivers such as node‑postgres prepare and bind parameters over the wire protocol automatically whenever you pass values to query(), as shown above.
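For example, with node-postgres you can opt into a named prepared statement explicitly by passing a query config object (the statement name below is illustrative):

// Named prepared statement: prepared once per connection, with the LLM
// output always bound as a parameter.
const result = await client.query({
  name: 'find-accounts-by-owner',
  text: 'SELECT * FROM accounts WHERE owner = $1',
  values: [llmOutput],
});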
Finally, add runtime validation: reject any LLM output that contains SQL‑specific characters (;, --, /*, */) or that does not match an expected pattern (e.g., UUID for an identifier). Combine this with middleBrick’s continuous monitoring (available on the Pro plan) to get alerts if a previously safe endpoint begins to exhibit hallucination‑induced injection patterns.
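A minimal validation sketch along those lines (the regex and function name are illustrative, not a library API):

// Accept only values that look like a UUID and contain no SQL metacharacters.
// Adjust the pattern to whatever shape your identifiers actually have.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const SQL_METACHARS = /;|--|\/\*|\*\//;

function validateLlmIdentifier(value) {
  if (typeof value !== 'string' || SQL_METACHARS.test(value) || !UUID_RE.test(value)) {
    throw new Error('LLM output failed validation; refusing to query');
  }
  return value;
}

// Usage inside an async handler:
//   const ownerId = validateLlmIdentifier(llmOutput);
//   const result = await client.query('SELECT * FROM accounts WHERE id = $1', [ownerId]);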
Related CWEs (LLM/AI Security)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |