LLM Data Leakage in Actix with CockroachDB
LLM Data Leakage in Actix with CockroachDB — how this specific combination creates or exposes the vulnerability
When an Actix web service uses CockroachDB as its primary data store and exposes endpoints that interact with language model (LLM) features, there is potential for LLM data leakage if runtime responses or database operations inadvertently surface sensitive information. middleBrick’s LLM/AI Security checks specifically test for system prompt leakage, injection attempts, and output exposure, which is relevant when Actix handlers pass database content into LLM prompts or stream database-driven context to models.
In this combination, risk arises when application code constructs prompts or messages using rows fetched from CockroachDB without sanitizing or controlling what data reaches the LLM. For example, if a handler queries tenant-specific records and embeds those records directly into a prompt, confidential or regulated data could appear in LLM outputs, which middleBrick’s output scanning would detect as PII or API key exposure. Another scenario occurs when error messages from CockroachDB are forwarded to an LLM endpoint; these messages may contain schema details or table names useful to an attacker and could trigger system prompt leakage patterns that middleBrick detects using its 27 regex patterns tailored for ChatML, Llama 2, Mistral, and Alpaca formats.
Additionally, if the Actix service calls an unauthenticated LLM endpoint and passes database-derived content, middleBrick’s unauthenticated LLM endpoint detection can flag the exposure path. Because CockroachDB often stores multi-tenant data, improper authorization checks (BOLA/IDOR) combined with LLM prompts can lead to cross-tenant data exposure through model outputs. middleBrick tests for BOLA/IDOR in parallel with LLM security probes to highlight scenarios where a user can manipulate identifiers to retrieve or influence LLM context that should be isolated per tenant.
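The tenant-isolation check described above can be sketched as a guard that runs before any row is fetched or placed into LLM context. This is a minimal illustration, not middleBrick or Actix API; the `Session` struct and `is_tenant_authorized` helper are hypothetical names standing in for whatever your authentication middleware extracts from the request.

```rust
/// Hypothetical session data extracted by authentication middleware
/// from an incoming Actix request.
struct Session {
    tenant_id: String,
}

/// Reject cross-tenant (BOLA/IDOR-style) access before any CockroachDB
/// row is fetched or forwarded into an LLM prompt. Returns true only
/// when the tenant named in the request matches the caller's own tenant.
fn is_tenant_authorized(session: &Session, requested_tenant: &str) -> bool {
    // The authenticated tenant, not a path or query parameter, is the
    // source of truth; a caller cannot hop tenants by editing the URL.
    session.tenant_id == requested_tenant
}
```

Placing this check ahead of the database query, rather than after it, means a rejected request never loads another tenant's rows into memory where later code could leak them.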
An illustrative risk pattern in Actix is constructing a chat completion request by concatenating user input with a row containing sensitive metadata, then forwarding the combined string to an LLM without validating or redacting. middleBrick’s LLM output scanning would search for API keys, PII, or executable code in the model’s response, and its Excessive Agency detection would inspect whether tool_calls or function_call patterns in the application allow overly broad data access. Since this scanner runs black-box against the unauthenticated attack surface, it can identify these leakage channels without needing credentials, emphasizing the importance of reviewing how CockroachDB rows are incorporated into LLM workflows.
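The safe counterpart to that concatenation pattern is to copy only explicitly allowlisted fields from the row into the prompt, so sensitive metadata never reaches the model even if the query returned it. This is a sketch under assumptions: the field names and the `build_prompt` helper are illustrative, not part of any real API.

```rust
use std::collections::BTreeMap;

/// Build an LLM prompt from a database row, copying only explicitly
/// allowlisted, non-sensitive fields. Anything else in the row
/// (internal_notes, ssn, api_key, ...) is silently dropped.
fn build_prompt(user_input: &str, row: &BTreeMap<String, String>) -> String {
    // Hypothetical allowlist; adjust per endpoint.
    const ALLOWED: [&str; 2] = ["display_name", "order_status"];
    let context: Vec<String> = ALLOWED
        .iter()
        .filter_map(|k| row.get(*k).map(|v| format!("{k}: {v}")))
        .collect();
    format!(
        "Context:\n{}\n\nUser question: {}",
        context.join("\n"),
        user_input
    )
}
```

The allowlist inverts the failure mode: forgetting to update it withholds a harmless field, whereas forgetting a denylist entry leaks a sensitive one.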
Compliance mapping is relevant here as well: findings related to LLM data leakage in Actix with CockroachDB can map to OWASP API Top 10, GDPR, and SOC2 controls around data exposure and confidentiality. By using middleBrick’s Pro plan continuous monitoring, teams can configure scans on a schedule to detect regressions in how database content reaches LLMs, and the GitHub Action can fail builds if risk scores degrade due to new prompt-injection or leakage paths introduced in code changes affecting CockroachDB queries or LLM integrations.
Cockroachdb-Specific Remediation in Actix — concrete code fixes
To reduce LLM data leakage risk when Actix interacts with CockroachDB, implement strict query scoping, output sanitization, and controlled prompt construction. Always enforce tenant isolation using tenant_id columns in WHERE clauses, and avoid selecting or logging raw rows that may contain sensitive fields before they are passed to any LLM-related code.
Below are concrete CockroachDB examples for Actix services in Rust, using the tokio-postgres client (CockroachDB speaks the PostgreSQL wire protocol, so standard PostgreSQL drivers apply). These snippets demonstrate parameterized queries, explicit column selection, and safe data handling that align with remediation guidance provided by middleBrick findings.
1. Tenant-isolated read with explicit columns
Select only necessary, non-sensitive columns and bind the tenant identifier to prevent cross-tenant reads.
```rust
use tokio_postgres::Client;

#[derive(Debug)]
struct PublicProfile {
    user_id: i64,
    display_name: String,
    // Do NOT include email, ssn, or internal_role here
}

async fn get_public_profile(
    client: &Client,
    tenant_id: &str,
    user_id: i64,
) -> Result<PublicProfile, tokio_postgres::Error> {
    let row = client
        .query_one(
            "SELECT user_id, display_name FROM profiles WHERE tenant_id = $1 AND user_id = $2",
            &[&tenant_id, &user_id],
        )
        .await?;
    Ok(PublicProfile {
        user_id: row.get(0),
        display_name: row.get(1),
    })
}
```
2. Redacting sensitive fields before LLM consumption
If you must query broader rows, redact or omit sensitive columns before constructing prompts.
```rust
async fn build_limited_context(
    client: &Client,
    tenant_id: &str,
    order_id: i64,
) -> Result<String, tokio_postgres::Error> {
    let row = client
        .query_one(
            "SELECT order_id, item_count, total_cents FROM orders WHERE tenant_id = $1 AND order_id = $2",
            &[&tenant_id, &order_id],
        )
        .await?;
    // Explicitly avoid including internal_notes or raw_pii from the row
    let context = format!(
        "Order {} has {} items, total {} cents",
        row.get::<_, i64>(0),
        row.get::<_, i64>(1),
        row.get::<_, i64>(2),
    );
    Ok(context)
}
```
3. Parameterized writes with no sensitive leakage in errors
Use placeholders and avoid string interpolation to prevent accidental exposure of values in logs or error messages that might be forwarded to an LLM endpoint.
```rust
async fn update_status_safe(
    client: &Client,
    tenant_id: &str,
    profile_id: i64,
    new_status: &str,
) -> Result<(), tokio_postgres::Error> {
    client
        .execute(
            "UPDATE profiles SET status = $1 WHERE tenant_id = $2 AND profile_id = $3",
            &[&new_status, &tenant_id, &profile_id],
        )
        .await?;
    // Avoid logging raw rows or full query strings that may contain PII
    Ok(())
}
```
4. Avoiding dynamic table or column names in LLM workflows
Do not construct SQL using user input for identifiers; if dynamic querying across tenant-specific tables is required, validate identifiers against a strict allowlist and never include raw table/column names in prompts sent to LLMs.
```rust
async fn safe_multi_tenant_summary(
    client: &Client,
    tenant_id: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    // Validate tenant_id against a strict allowlist before it is ever
    // used to choose a tenant-specific table. This validation is the
    // only reason the format! below is acceptable.
    let allowed = ["t1", "t2", "t3"];
    if !allowed.contains(&tenant_id) {
        return Err("invalid tenant".into());
    }
    // The interpolated table suffix comes from the allowlist above,
    // never from raw user input, and is never echoed into LLM prompts.
    let sql = format!(
        "SELECT summary FROM summaries_{} WHERE created >= now() - INTERVAL '7 days'",
        tenant_id
    );
    let rows = client.query(sql.as_str(), &[]).await?;
    // Aggregate and sanitize before any LLM call
    let summaries: Vec<String> = rows.iter().map(|r| r.get(0)).collect();
    // Redact or generalize before sending to LLM
    let context = format!("{} summaries", summaries.len());
    Ok(context)
}
```
By combining these patterns with middleBrick’s findings—especially BOLA/IDOR, Property Authorization, and LLM Security checks—you can ensure that CockroachDB-driven data stays properly scoped, that sensitive fields are not exposed to LLM endpoints, and that unauthenticated LLM endpoints are not inadvertently reachable. These practices reduce the likelihood of LLM data leakage while keeping your API risk score within acceptable ranges as tracked in the middleBrick Web Dashboard and reinforced by the GitHub Action or MCP Server integrations.
Related CWEs (category: LLM Security)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |