Severity: HIGH

Hallucination Attacks with API Keys

How Hallucination Attacks Manifest in API Keys

Hallucination attacks in API keys occur when an attacker manipulates an LLM's output to generate or expose sensitive API keys that weren't intended to be revealed. This manifests through several specific attack vectors:

System Prompt Leakage - When an LLM's system prompt contains hardcoded API keys or references to key management systems, attackers can extract these credentials through carefully crafted prompts. For example:

# Vulnerable system prompt might contain:
SYSTEM_PROMPT = """
You are a helpful assistant. Your API keys are stored in env variables: API_KEY=sk-1234, SECRET_KEY=sh-5678
"""

Prompt Injection for Key Generation - Attackers can trick the model into generating valid API key patterns by asking it to "create example credentials" or "generate test keys." The model might output something like:

API Key: sk-1234-5678-9012-3456
Secret: sh-7890-1234-5678-9012

Context Window Poisoning - If API keys appear in the conversation history or are mentioned in documentation the model was trained on, the LLM might "hallucinate" these keys in responses, believing they're part of the expected output.
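One way to reduce this risk is to scrub key-shaped strings from prior turns before they re-enter the context window. A minimal sketch, assuming a chat-style message list (the helper name and message format are illustrative):

import re

# Matches common key shapes: sk-, pk-, or ghp_ prefixes followed by long tokens
KEY_PATTERN = re.compile(r'\b(?:sk|pk|ghp)[-_][A-Za-z0-9_-]{20,}\b')

def scrub_history(messages):
    """Replace key-like substrings in each message with a placeholder."""
    return [
        {**msg, "content": KEY_PATTERN.sub("[REDACTED]", msg["content"])}
        for msg in messages
    ]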

Function Call Exploitation - When an LLM uses tools that handle API keys, a hallucination attack might cause it to construct malicious API calls with stolen or fabricated keys, potentially triggering rate limits or exposing service vulnerabilities.
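A defensive pattern for this vector is to validate any credential the model places in a tool call against the set of keys your application actually issued, rather than trusting whatever the model constructed. A sketch, assuming a hypothetical allowlist loaded from your key store:

import hashlib

# Hypothetical allowlist: SHA-256 digests of keys this application issued,
# loaded from a key store at startup.
ISSUED_KEY_HASHES = set()

def validate_tool_call(arguments):
    """Reject tool calls whose api_key was fabricated or stolen."""
    key = arguments.get("api_key", "")
    digest = hashlib.sha256(key.encode()).hexdigest()
    return digest in ISSUED_KEY_HASHES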

The most dangerous aspect is that these attacks often produce syntactically valid keys: leaked keys grant unauthorized access directly, while fabricated ones can still trigger rate limits, errors, or abuse in downstream services.

API Key-Specific Detection

Detecting hallucination attacks in API keys requires a multi-layered approach. Here's how to identify and scan for these vulnerabilities:

Pattern-Based Scanning - Use regex patterns to detect potential API key exposure in LLM outputs:

import re

def detect_api_keys(text):
    """Return all key-shaped substrings found in an LLM output."""
    patterns = [
        r'sk-[a-zA-Z0-9]{20,}',   # OpenAI-style secret keys
        r'pk-[a-zA-Z0-9]{20,}',   # public/publishable key prefixes
        r'AIza[0-9A-Za-z_-]{35}', # Google API keys
        r'ghp_[a-zA-Z0-9]{36}',   # GitHub personal access tokens
        r'sk-[0-9a-f]{32}'        # generic hex-encoded secret keys
    ]
    
    found = []
    for pattern in patterns:
        matches = re.findall(pattern, text)
        found.extend(matches)
    
    return found
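For example, running the detector over a response containing a fabricated key:

text = "Sure! Here is a test key: sk-abc123def456ghi789jkl012"
print(detect_api_keys(text))  # ['sk-abc123def456ghi789jkl012']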

Runtime Scanning with middleBrick - middleBrick's LLM/AI Security module specifically detects hallucination-related vulnerabilities:

# Scan an LLM endpoint for API key exposure
middlebrick scan https://api.example.com/chat

# Results include:
# - System prompt leakage detection
# - Active prompt injection testing
# - Output scanning for PII and API keys
# - Excessive agency detection

Input Validation Testing - Test how your LLM handles edge cases:

def test_prompt_injection(model, prompts):
    """Probe the model with adversarial prompts and flag risky responses."""
    for prompt in prompts:
        response = model(prompt)
        
        # Check the response text for key-shaped strings
        keys = detect_api_keys(response)
        if keys:
            print(f"Potential key exposure: {keys}")
            
        # Crude check for tool/function-call markers in the raw response,
        # which may indicate the model is taking actions it should not
        if 'function_call' in response or 'tool_calls' in response:
            print("Possible excessive agency: response contains tool-call markers")

Log Analysis - Monitor LLM outputs for repeated key patterns or suspicious generation requests that might indicate active exploitation attempts.
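A simple batch job can reuse detect_api_keys over stored outputs. The log format here, one JSON object per line with a "response" field, is an assumption:

import json

def scan_log_file(path):
    """Flag log entries whose LLM response contains key-shaped strings."""
    with open(path) as f:
        for line_no, line in enumerate(f, 1):
            entry = json.loads(line)
            keys = detect_api_keys(entry.get("response", ""))
            if keys:
                print(f"line {line_no}: possible key exposure: {keys}")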

API Key-Specific Remediation

Remediating hallucination attacks in API keys requires both architectural changes and runtime protections:

System Prompt Hardening - Never include actual API keys or sensitive configuration in system prompts:

# Vulnerable
SYSTEM_PROMPT = """
You are a helpful assistant. Your API keys are stored in env variables: API_KEY=sk-1234, SECRET_KEY=sh-5678
"""

# Secure
SYSTEM_PROMPT = """
You are a helpful assistant. Handle API keys securely by using environment variables and never expose credentials.
"""

Input Sanitization - Filter out key-generation requests before they reach the LLM:

from fastapi import FastAPI, Request

app = FastAPI()

# Phrases commonly used to coax key-like output from the model
BLOCKED_PHRASES = [
    "generate api key",
    "create credentials",
    "example key",
    "test token",
]

async def sanitize_input(request: Request) -> bool:
    """Return True if the request contains a blocked key-generation phrase."""
    data = await request.json()
    prompt = data.get('prompt', '').lower()
    return any(phrase in prompt for phrase in BLOCKED_PHRASES)
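Wiring the check into an endpoint might look like this; the route path and response shape are illustrative (Starlette caches the request body, so reading the JSON twice is safe):

from fastapi import HTTPException

@app.post("/chat")
async def chat(request: Request):
    if await sanitize_input(request):
        raise HTTPException(status_code=400, detail="Blocked prompt")
    data = await request.json()
    # ... forward data["prompt"] to the LLM ...
    return {"status": "ok"}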

Output Filtering - Scan LLM responses before returning them to users:

import re

def filter_response(response):
    """Redact any detected key patterns before the response reaches the user."""
    patterns = [
        r'sk-[a-zA-Z0-9]{20,}',
        r'AIza[0-9A-Za-z_-]{35}',
        r'ghp_[a-zA-Z0-9]{36}'
    ]
    
    for pattern in patterns:
        response = re.sub(pattern, "[REDACTED]", response)
    
    return response
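For example:

raw = "Your key is ghp_" + "a" * 36
print(filter_response(raw))  # Your key is [REDACTED]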

Rate Limiting and Monitoring - Implement strict rate limits on LLM endpoints and monitor for suspicious patterns:

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from slowapi import Limiter
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter

@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
    return JSONResponse(
        status_code=429,
        content={"detail": "Rate limit exceeded"}
    )

@app.post("/chat")
@limiter.limit("100/minute")  # slowapi requires the request parameter on the route
async def chat(request: Request):
    # ... handle the LLM call here ...
    return {"status": "ok"}

Key Rotation and Scope Limitation - Use short-lived API keys with minimal permissions:

# Generate time-limited keys
import time
from cryptography.fernet import Fernet

def generate_temporary_key():
    """Return an opaque key with an appended expiry timestamp."""
    key = Fernet.generate_key()  # 32 random bytes, urlsafe-base64 encoded
    expiry = int(time.time()) + 3600  # 1 hour
    return f"{key.decode()}:{expiry}"

Related CWEs

CWE ID  | Name                                                 | Severity
CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM

Frequently Asked Questions

How can I tell if my LLM endpoint has already been compromised by hallucination attacks?
Scan your endpoint with middleBrick's LLM/AI Security module. It tests for system prompt leakage, prompt injection, and outputs containing API keys or PII. Look for repeated key patterns in your logs or unexpected API usage that might indicate key exposure.
Are hallucination attacks only a concern for public LLM endpoints?
No. Even internal LLM endpoints can be vulnerable if they process external inputs or have access to API keys. Any system that generates or processes API keys through an LLM is at risk, regardless of whether it's publicly accessible.