LLM Data Leakage in Axum with CockroachDB
LLM Data Leakage in Axum with CockroachDB — how this specific combination creates or exposes the vulnerability
When an Axum service that uses CockroachDB exposes an LLM endpoint or integrates LLM-generated responses without proper safeguards, the combination can lead to LLM data leakage. This occurs when prompts, database queries, or result sets containing sensitive information are forwarded to an LLM without redaction or access controls. An Axum application (Axum is a Rust web framework) may assemble runtime request data that includes database identifiers, PII from CockroachDB rows, or session tokens. If these values are embedded in system prompts or user messages sent to an LLM, the LLM or its logs may unintentionally reveal that data through its outputs.
LLM data leakage in this stack is not about the database itself being compromised, but about unintended exposure via LLM interactions. For example, a developer might log LLM requests for debugging and include the full user prompt, which contains a CockroachDB primary key or a JSON blob with personal data. middleBrick’s LLM/AI Security checks detect this by scanning for system prompt leakage patterns and by running active prompt injection tests that attempt to coax the LLM into revealing training data or internal instructions. The scanner specifically looks for PII, API keys, and executable code in LLM responses, which can surface when Axum applications embed raw database content into prompts.
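One mitigation for the logging scenario above is a redaction pass over prompts before they reach any logger or telemetry sink. The sketch below is illustrative only: the `redact_for_log` helper and its heuristics (masking email-like tokens and long numeric IDs such as primary keys) are assumptions for this example, not middleBrick functionality. Production code should rely on vetted PII-detection tooling rather than ad hoc rules.

```rust
// Minimal sketch: mask tokens that look like identifiers before logging a prompt.
// The heuristics here are deliberately simple and are assumptions for illustration.
fn redact_for_log(prompt: &str) -> String {
    let mut out = String::with_capacity(prompt.len());
    for token in prompt.split_whitespace() {
        // Mask email-like tokens and long all-digit tokens (e.g. primary keys).
        let looks_like_email = token.contains('@') && token.contains('.');
        let looks_like_id = token.len() >= 8 && token.chars().all(|c| c.is_ascii_digit());
        if looks_like_email || looks_like_id {
            out.push_str("[REDACTED]");
        } else {
            out.push_str(token);
        }
        out.push(' ');
    }
    out.trim_end().to_string()
}

fn main() {
    let prompt = "Help user alice@example.com with record 1234567890";
    // Log the redacted form, never the raw prompt.
    println!("{}", redact_for_log(prompt));
}
```

Running redaction at the logging boundary keeps the raw prompt available to the LLM call itself while ensuring debug logs never persist record-level data.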
Another vector arises from improper handling of tool calls or function-calling features in LLM integrations. If Axum builds tool call arguments from CockroachDB query results without constraining or sanitizing fields, the LLM may receive overly permissive agency (e.g., tool_calls or function_call patterns that enable data exfiltration). middleBrick’s LLM/AI Security module includes Excessive Agency detection to identify these patterns and Unauthenticated LLM Endpoint detection to flag publicly reachable LLM routes that could be abused to probe the integration. Because LLM data leakage often maps to OWASP API Top 10 and SOC 2 controls, continuous scanning helps ensure that runtime prompts do not leak sensitive schema or record-level data.
CockroachDB-Specific Remediation in Axum — concrete code fixes
To prevent LLM data leakage when using CockroachDB with Axum, sanitize and limit the data that flows into LLM prompts. Avoid passing entire database rows or raw query results directly into system or user messages. Instead, project only the necessary, non-sensitive fields and apply strict allowlists. The following Axum handler demonstrates a safe pattern: only a user ID and a non-sensitive display name are used, and sensitive columns are excluded before any LLM interaction.
use axum::{
    extract::{Extension, Path},
    http::StatusCode,
    routing::get,
    Json, Router,
};
use serde::{Deserialize, Serialize};
use std::sync::Arc;
// CockroachDB speaks the PostgreSQL wire protocol, so the standard
// tokio-postgres client is used here.
use tokio_postgres::NoTls;

#[derive(Serialize, Deserialize)]
struct PublicProfile {
    user_id: i64,
    display_name: String,
}

async fn handler(
    Path(user_id): Path<i64>,
    Extension(db): Extension<Arc<tokio_postgres::Client>>,
) -> Result<Json<PublicProfile>, (StatusCode, String)> {
    // Select only the non-sensitive column; never `SELECT *` on an LLM-facing path.
    let row = db
        .query_one(
            "SELECT display_name FROM profiles WHERE id = $1",
            &[&user_id],
        )
        .await
        .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;

    let profile = PublicProfile {
        user_id,
        display_name: row.get(0),
    };

    // Only pass non-sensitive fields to the LLM
    let _llm_input = format!("User {} needs assistance", profile.display_name);
    Ok(Json(profile))
}

#[tokio::main]
async fn main() {
    // Note: prefer sslmode=verify-full in production; disable is for local testing only.
    let (client, connection) = tokio_postgres::connect(
        "postgresql://user:pass@localhost:26257/db?sslmode=disable",
        NoTls,
    )
    .await
    .expect("connect failed");

    // tokio-postgres requires the connection future to be driven on its own task.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });

    let _app = Router::new()
        .route("/profile/:user_id", get(handler))
        .layer(Extension(Arc::new(client)));
    // run server omitted
}
Additionally, enforce field-level permissions and schema masking in CockroachDB queries so that columns containing PII or secrets are never selected in endpoints that may feed LLMs. Use parameterized queries to prevent injection, and avoid dynamic SQL construction that could inadvertently expose raw data. middleBrick’s OpenAPI/Swagger spec analysis can help verify that response schemas do not include sensitive properties by cross-referencing spec definitions with runtime findings.
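A column allowlist is one way to enforce the field-level restriction above at the application layer. The sketch below is a hedged illustration: the `project_safe_columns` helper and the specific allowed column names are assumptions for this example, not part of any library API. The key property is that sensitive columns can never appear in generated SQL, even if a caller requests them.

```rust
use std::collections::HashSet;

// Hypothetical allowlist of columns safe to expose on LLM-facing endpoints.
// Any requested column not in the allowlist (e.g. email, ssn) is dropped.
fn project_safe_columns<'a>(requested: &[&'a str]) -> Vec<&'a str> {
    let allowed: HashSet<&str> =
        ["id", "display_name", "created_at"].into_iter().collect();
    requested
        .iter()
        .copied()
        .filter(|c| allowed.contains(c))
        .collect()
}

fn main() {
    // Sensitive columns are filtered out before any SQL is built.
    let cols = project_safe_columns(&["display_name", "email", "ssn", "id"]);
    // Column names come only from the static allowlist; row values still use
    // parameterized placeholders, so no user input is interpolated into SQL.
    let sql = format!("SELECT {} FROM profiles WHERE id = $1", cols.join(", "));
    println!("{sql}");
}
```

Because the allowlist is a static set defined in code, adding a new exposed column becomes a reviewable code change rather than a runtime decision.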
For applications with LLM tool calls, constrain the tool call arguments to a safe subset and validate them against a strict schema before sending them to the LLM. This reduces the risk of excessive agency and ensures that CockroachDB identifiers or sensitive values are not forwarded. Combine these practices with middleBrick’s CI/CD integrations (GitHub Action) to fail builds if a scan detects LLM data leakage patterns, and use the Web Dashboard to track security scores over time.
Related CWEs (llmSecurity category)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |