Hallucination Attacks in FastAPI
How Hallucination Attacks Manifest in FastAPI
Hallucination attacks in FastAPI applications occur when AI/ML models integrated into API endpoints generate fabricated or misleading outputs. These attacks exploit the inherent unpredictability of large language models (LLMs) and can manifest through several FastAPI-specific patterns.
The most common manifestation involves FastAPI endpoints that directly return LLM responses without validation. Consider this vulnerable pattern:
```python
from fastapi import FastAPI
from langchain.chat_models import ChatOpenAI
import os

app = FastAPI()

@app.post("/generate")
async def generate_response(prompt: str):
    model = ChatOpenAI(temperature=0.8, openai_api_key=os.getenv("OPENAI_API_KEY"))
    # The raw model output is returned to the caller with no validation
    response = model.predict(prompt)
    return {"response": response}
```

This endpoint allows attackers to craft prompts that cause the model to hallucinate sensitive information. For example, an attacker might use prompt injection to extract training data or generate false security advisories that could mislead users.
Another FastAPI-specific vulnerability arises from improper Pydantic model validation of AI-generated content. When AI responses are deserialized into Pydantic models without sanitization:
```python
from pydantic import BaseModel

class UserProfile(BaseModel):
    username: str
    email: str
    bio: str

@app.post("/process-ai-response")
async def process_ai_response(response: str):
    # The AI-generated string is parsed without any sanitization
    profile = UserProfile.parse_raw(response)
    return profile
```

The AI could generate a response containing unexpected fields or malformed data structures; by default, Pydantic v1 silently drops unknown fields rather than rejecting the payload, so such output is accepted without warning, potentially enabling validation bypasses or information disclosure.
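One way to harden this pattern (a minimal sketch using Pydantic v1 conventions, to match the `parse_raw` call above) is to forbid unknown fields so unexpected AI output fails loudly:

```python
from pydantic import BaseModel, Extra, ValidationError

class StrictUserProfile(BaseModel):
    username: str
    email: str
    bio: str

    class Config:
        extra = Extra.forbid  # reject AI-injected fields instead of silently dropping them

# AI output smuggling an extra "is_admin" field now fails validation
try:
    StrictUserProfile.parse_raw(
        '{"username": "a", "email": "a@b.c", "bio": "", "is_admin": true}'
    )
except ValidationError as exc:
    print(exc)  # "is_admin: extra fields not permitted"
```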
Hallucination attacks also manifest through FastAPI's dependency injection system. When AI-generated content is used to dynamically construct dependency parameters:
```python
from fastapi import Depends

async def get_user_id_from_ai():
    # AI might return unexpected values
    return "malicious-user-id"

async def get_user_info(user_id: str = Depends(get_user_id_from_ai)):
    # get_user is assumed to be defined elsewhere in the application
    return await get_user(user_id)
```

Here, the AI could generate user IDs that bypass authorization checks or access unauthorized resources.
FastAPI-Specific Detection
Detecting hallucination attacks in FastAPI requires both runtime monitoring and specialized scanning. The most effective approach combines application-level logging with automated security scanning.
For runtime detection, implement structured logging of all AI model interactions:
```python
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class AIInteractionLogger:
    def log_request(self, endpoint: str, prompt: str):
        logger.info(f"AI_REQUEST: {endpoint} - Prompt length: {len(prompt)}")

    def log_response(self, endpoint: str, response: str, confidence: Optional[float] = None):
        logger.info(f"AI_RESPONSE: {endpoint} - Response length: {len(response)}")
        if confidence is not None:
            logger.info(f"Confidence score: {confidence}")
```

Integrate this logging with your FastAPI endpoints to track suspicious patterns such as unusually long responses or repeated requests with similar prompts.
For automated detection, middleBrick's LLM/AI Security scanner specifically identifies hallucination vulnerabilities in FastAPI applications. The scanner tests for:
- System prompt leakage through 27 regex patterns that detect common AI model formats
- Active prompt injection attempts that try to extract training data or generate false information
- Output validation failures where AI responses contain executable code or PII (a toy illustration follows this list)
- Excessive agency detection where AI models attempt to call external APIs or execute system commands
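To make the output-validation check concrete, here is a toy regex-based sketch of the general technique; it is purely illustrative and says nothing about middleBrick's actual rule set:

```python
import re

# Illustrative patterns only; real scanners use far larger, tuned rule sets
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-like format
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email addresses
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),     # card-number-like digit runs
]

def response_leaks_pii(ai_response: str) -> bool:
    """Flag AI output that appears to contain personally identifiable information."""
    return any(p.search(ai_response) for p in PII_PATTERNS)

print(response_leaks_pii("Contact me at alice@example.com"))  # True
```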
The scanning process requires no credentials or configuration—simply provide your FastAPI endpoint URL:
```bash
middlebrick scan https://api.example.com/generate
```
The scanner tests unauthenticated attack surfaces, identifying endpoints vulnerable to hallucination attacks within 5-15 seconds. It evaluates the complete attack surface including authentication bypasses, input validation weaknesses, and data exposure risks specific to AI-integrated FastAPI applications.
For continuous monitoring, the middleBrick GitHub Action can be integrated into your FastAPI CI/CD pipeline to automatically scan new endpoints before deployment:
```yaml
- name: Scan API Security
  uses: middleBrick/middlebrick-action@v1
  with:
    api_url: http://localhost:8000
    fail_below_score: 80
```
FastAPI-Specific Remediation
Remediating hallucination attacks in FastAPI applications requires a multi-layered approach combining input validation, output sanitization, and architectural controls.
First, implement strict input validation using Pydantic models with custom validators:
```python
import re

from pydantic import BaseModel, validator

class SafePrompt(BaseModel):
    prompt: str

    @validator('prompt')
    def prevent_prompt_injection(cls, v):
        # Block common injection patterns
        if re.search(r'(system|role|content)', v, re.IGNORECASE):
            raise ValueError("Potential prompt injection detected")
        if len(v) > 1000:  # Limit prompt size
            raise ValueError("Prompt too long")
        return v

@app.post("/generate-safe")
async def generate_safe(prompt_data: SafePrompt):
    model = ChatOpenAI(temperature=0.2)
    response = model.predict(prompt_data.prompt)
    return {"response": response}
```

This validation layer blocks many common prompt-injection techniques by rejecting suspicious keywords and limiting input size.
Second, implement output sanitization and validation:
```python
import html
import re

def sanitize_ai_output(response: str) -> str:
    # Remove HTML/script tags
    sanitized = re.sub(r'<.*?>', '', response)
    # Encode special characters
    sanitized = html.escape(sanitized)
    # Check for suspicious patterns
    if re.search(r'(password|secret|api_key|token)', sanitized, re.IGNORECASE):
        raise ValueError("Potential sensitive information in response")
    return sanitized

@app.post("/generate-sanitized")
async def generate_sanitized(prompt: str):
    model = ChatOpenAI(temperature=0.2)
    raw_response = model.predict(prompt)
    sanitized_response = sanitize_ai_output(raw_response)
    return {"response": sanitized_response}
```

This approach ensures that AI-generated content cannot carry executable markup or obviously sensitive keywords before being returned to users.
Third, implement confidence scoring and response filtering:
```python
import re
from typing import Any, Dict

from fastapi import HTTPException

async def get_confidence_scored_response(prompt: str) -> Dict[str, Any]:
    model = ChatOpenAI(temperature=0.2)
    try:
        response = model.predict(prompt)
    except Exception:
        raise HTTPException(status_code=500, detail="AI processing failed")
    # Simple confidence scoring based on response characteristics
    confidence = 1.0
    if len(response) > 500:  # Long responses may be less reliable
        confidence *= 0.8
    if re.search(r'(maybe|could|might|perhaps)', response, re.IGNORECASE):
        confidence *= 0.7
    return {"response": response, "confidence": confidence}

@app.post("/generate-with-confidence")
async def generate_with_confidence(prompt: str):
    result = await get_confidence_scored_response(prompt)
    if result["confidence"] < 0.6:
        raise HTTPException(status_code=400, detail="Low confidence in AI response")
    return result
```

This pattern allows you to reject responses that appear uncertain or potentially hallucinated.
Finally, implement architectural controls, such as a decorator that enforces a security policy on every AI endpoint:
```python
import re
from functools import wraps

from fastapi import HTTPException, Request

def has_suspicious_patterns(request: Request) -> bool:
    # Placeholder heuristic; replace with your own policy checks
    return bool(re.search(r'(ignore previous|system prompt)', str(request.url), re.IGNORECASE))

def ai_security_dependency():
    def decorator(endpoint_func):
        @wraps(endpoint_func)
        async def secure_wrapper(*args, **kwargs):
            # Check for suspicious patterns in the request context
            request = kwargs.get('request')
            if request and has_suspicious_patterns(request):
                raise HTTPException(
                    status_code=403,
                    detail="Suspicious AI interaction detected"
                )
            return await endpoint_func(*args, **kwargs)
        return secure_wrapper
    return decorator

@app.post("/generate-secured")
@ai_security_dependency()
async def generate_secured(prompt: str, request: Request):
    # The endpoint declares `request` so the wrapper can inspect it
    model = ChatOpenAI(temperature=0.2)
    response = model.predict(prompt)
    return {"response": response}
```

This wrapper intercepts requests before they reach the AI model, blocking potentially malicious interactions.
Related CWEs (LLM Security)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |