Hallucination Attacks in FastAPI with API Keys
Hallucination Attacks in FastAPI with API Keys — how this specific combination creates or exposes the vulnerability
A hallucination attack in a FastAPI service that uses API keys occurs when an attacker manipulates inputs or request patterns to produce unreliable, fabricated, or overconfident responses from the application or an integrated LLM. In a FastAPI backend, API keys are typically passed as HTTP headers (e.g., X-API-Key) and used for authentication, rate limiting, or usage tracking. If the API key is accepted but the endpoint also consumes untrusted user input to drive LLM calls or dynamic behavior, the combination can expose logic that is sensitive to prompt injection, data exfiltration, or model misuse.
Consider a FastAPI route that accepts an API key for authorization and forwards user-supplied text to an LLM:
```python
from fastapi import Body, FastAPI, Header
import httpx

app = FastAPI()

@app.post("/query")
async def query_endpoint(prompt: str = Body(..., embed=True), x_api_key: str = Header(...)):
    # Forward the untrusted prompt to an LLM endpoint
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://llm.example.com/v1/completions",
            headers={"Authorization": f"Bearer {x_api_key}"},
            json={"prompt": prompt, "max_tokens": 128},
        )
    return {"response": resp.text}
```
If input validation is weak, an attacker can supply crafted prompts that cause the LLM to reveal its system instructions, training data characteristics, or internal tool usage (e.g., function calls). This is a hallucination attack vector: the API key ensures the request is authorized, but the payload exploits the model's behavior rather than the key itself. Attackers may also probe rate limits tied to the key, observe inconsistent responses across similar inputs, or attempt to infer metadata about the model through differential outputs. Because FastAPI deserializes JSON bodies and headers directly into handler parameters, missing schema constraints or relaxed types increase the risk of unexpected or malicious payloads reaching the LLM integration.
Another scenario involves overconfident or fabricated responses when the FastAPI service synthesizes answers from multiple data sources. If the service does not properly validate or cite sources, an API-key-authenticated client might receive convincingly wrong information that appears authoritative. This can be especially damaging when the API key is tied to privileged scopes and the hallucinated content influences downstream decisions. The presence of an API key does not mitigate risks related to input sanitization, output validation, or model behavior controls; it only identifies the caller.
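One lightweight guard against this scenario is to refuse to return synthesized answers that do not cite the sources the service actually consulted. The sketch below is an assumption-laden illustration, not part of any particular framework: the citation format ([doc-NNN]) and the KNOWN_SOURCES set are both hypothetical.

```python
import re

# Hypothetical IDs of the documents the service actually retrieved
KNOWN_SOURCES = {"doc-101", "doc-102", "doc-203"}
CITATION_RE = re.compile(r"\[(doc-\d+)\]")  # matches citations like [doc-101]

def response_is_grounded(text: str) -> bool:
    """Accept a synthesized answer only if it contains at least one citation
    and every citation refers to a source the service actually consulted."""
    cited = CITATION_RE.findall(text)
    return bool(cited) and all(c in KNOWN_SOURCES for c in cited)
```

A check like this does not prove the answer is correct, but it rejects the most dangerous failure mode: a fluent, authoritative-sounding response with no traceable provenance.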
middleBrick identifies such risks by correlating authentication mechanisms with input validation and LLM security checks. In a scan that includes OpenAPI/Swagger analysis with full $ref resolution alongside runtime testing, findings may highlight missing schema enforcement, lack of output filtering, or unauthenticated LLM endpoints that could be abused alongside API key usage. These insights help teams understand how authentication boundaries interact with model-driven functionality.
API Key-Specific Remediation in FastAPI — concrete code fixes
Remediation focuses on tightening input validation, constraining LLM prompts, and ensuring API keys are handled securely without overtrusting authorized callers. Below are concrete code examples for a FastAPI service that uses API keys and integrates with an LLM.
1. Strict header validation and key format checks
Validate the API key format before using it. Use Pydantic settings or constants to avoid accepting malformed keys that could bypass checks or be used in injection attempts.
```python
from fastapi import Body, FastAPI, Header, HTTPException
import re

app = FastAPI()

# Example format: exactly 32 uppercase alphanumeric characters
API_KEY_PATTERN = re.compile(r"^[A-Z0-9]{32}$")

def validate_api_key(key: str) -> bool:
    return bool(API_KEY_PATTERN.match(key))

@app.post("/query")
async def query_endpoint(prompt: str = Body(..., embed=True), x_api_key: str = Header(...)):
    if not validate_api_key(x_api_key):
        raise HTTPException(status_code=400, detail="Invalid API key format")
    # Continue with safe processing
    return {"status": "accepted"}
```
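Format validation only filters malformed keys; when a presented key is checked against a stored value, the comparison should be timing-safe so an attacker cannot recover key prefixes by measuring response latency. A minimal sketch using the standard library's hmac.compare_digest; the in-memory key store is illustrative, and real deployments should store hashed keys rather than plaintext:

```python
import hmac

# Illustrative per-client key store; never keep real keys in source code
VALID_KEYS = {"client-a": "A" * 32}

def key_matches(presented: str, client_id: str) -> bool:
    """Compare the presented key against the stored key in constant time,
    so timing differences do not leak how many leading characters match."""
    stored = VALID_KEYS.get(client_id, "")
    return hmac.compare_digest(presented, stored)
```

hmac.compare_digest takes time proportional only to the input length, not to the position of the first mismatch, which is what defeats byte-by-byte timing probes.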
2. Constrain and sanitize LLM prompts
Apply allowlists, length limits, and content filtering to user prompts. Avoid directly concatenating user input into system or assistant messages that shape model behavior.
```python
import html
import re

from fastapi import Body, FastAPI, Header, HTTPException
import httpx

app = FastAPI()

MAX_PROMPT_LENGTH = 500
API_KEY_PATTERN = re.compile(r"^[A-Z0-9]{32}$")

def validate_api_key(key: str) -> bool:
    return bool(API_KEY_PATTERN.match(key))

@app.post("/query")
async def query_endpoint(prompt: str = Body(..., embed=True), x_api_key: str = Header(...)):
    if not validate_api_key(x_api_key):
        raise HTTPException(status_code=400, detail="Invalid API key format")
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise HTTPException(status_code=400, detail="Prompt too long")
    # Strip and HTML-escape the prompt before it reaches the model
    safe_prompt = html.escape(prompt.strip())
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://llm.example.com/v1/completions",
            headers={"Authorization": f"Bearer {x_api_key}"},
            json={"prompt": safe_prompt, "max_tokens": 128},
        )
    return {"response": resp.text}
```
3. Enforce output validation and response handling
Inspect LLM outputs for PII, code blocks, or unexpected formats before returning them to the client. Use regex or libraries designed for PII detection rather than trusting model assurances.
```python
import html
import re

from fastapi import Body, FastAPI, Header, HTTPException
import httpx

app = FastAPI()

API_KEY_PATTERN = re.compile(r"^[A-Z0-9]{32}$")
PII_REGEX = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # Example SSN pattern

def validate_api_key(key: str) -> bool:
    return bool(API_KEY_PATTERN.match(key))

@app.post("/query")
async def query_endpoint(prompt: str = Body(..., embed=True), x_api_key: str = Header(...)):
    if not validate_api_key(x_api_key):
        raise HTTPException(status_code=400, detail="Invalid API key format")
    safe_prompt = html.escape(prompt.strip())
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://llm.example.com/v1/completions",
            headers={"Authorization": f"Bearer {x_api_key}"},
            json={"prompt": safe_prompt, "max_tokens": 128},
        )
    output = resp.text
    # Block responses that appear to contain PII before returning them;
    # 502 signals an upstream (model) problem, not a client error
    if PII_REGEX.search(output):
        raise HTTPException(status_code=502, detail="Response contains potential PII")
    return {"response": output}
```
4. Rate limiting tied to identity, not just keys
Use a robust rate limiter that keys on the API key identity and request patterns to mitigate abuse that could amplify hallucination impact. FastAPI applications can integrate with Redis-backed or in-memory strategies for this purpose.
```python
import html
import re

from fastapi import Body, FastAPI, Header, HTTPException, Request
import httpx
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded

app = FastAPI()

# Rate-limit per API key rather than per client IP
limiter = Limiter(key_func=lambda request: request.headers.get("X-API-Key", "anonymous"))
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

API_KEY_PATTERN = re.compile(r"^[A-Z0-9]{32}$")

def validate_api_key(key: str) -> bool:
    return bool(API_KEY_PATTERN.match(key))

@app.post("/query")
@limiter.limit("10/minute")  # slowapi requires the Request parameter below
async def query_endpoint(request: Request, prompt: str = Body(..., embed=True), x_api_key: str = Header(...)):
    if not validate_api_key(x_api_key):
        raise HTTPException(status_code=400, detail="Invalid API key format")
    safe_prompt = html.escape(prompt.strip())
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://llm.example.com/v1/completions",
            headers={"Authorization": f"Bearer {x_api_key}"},
            json={"prompt": safe_prompt, "max_tokens": 128},
        )
    return {"response": resp.text}
```
These measures reduce the surface for hallucination attacks by ensuring that authorized requests still undergo strict validation, safe prompt handling, and output scrutiny. They complement broader scanning practices that map findings to frameworks like OWASP API Top 10 and can be integrated into CI/CD via tools such as the middleBrick GitHub Action to fail builds when risk thresholds are exceeded.
Related CWEs: llmSecurity
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |