HIGH | excessive data exposure | fastapi

Excessive Data Exposure in FastAPI

How Excessive Data Exposure Manifests in FastAPI

Excessive Data Exposure in FastAPI applications usually arises in model serialization and response handling. FastAPI serializes whatever Pydantic model an endpoint declares, so reusing one model for both input validation and response output can inadvertently expose sensitive fields.

A common pattern is to use a single Pydantic model for both database operations and API responses. Consider this problematic FastAPI endpoint:

from fastapi import FastAPI
from pydantic import BaseModel
from typing import Optional

app = FastAPI()

class User(BaseModel):
    id: int
    username: str
    email: str
    password_hash: str  # Sensitive field exposed!
    is_admin: bool

@app.get("/users/{user_id}", response_model=User)
async def get_user(user_id: int):
    user = await get_user_from_db(user_id)  # Returns User object
    return user  # password_hash and is_admin exposed in response

This pattern exposes password hashes and admin status to any authenticated user who can access user details. The issue compounds when dependencies or background tasks hand complete objects to code that serializes them without filtering.

Another FastAPI-specific manifestation involves response_model inheritance and nested models. Even when response_model_include or response_model_exclude is applied, developers can miss sensitive fields nested inside another model or an untyped dict:

class Order(BaseModel):
    id: int
    user_id: int
    total: float
    payment_details: dict  # Contains full credit card info

@app.get("/orders/{order_id}", response_model=Order)
async def get_order(order_id: int):
    order = await get_order_from_db(order_id)
    return order  # Full payment details exposed

FastAPI's automatic JSON serialization is particularly dangerous when endpoints return ORM instances (for example from SQLAlchemy or Tortoise ORM) without proper field filtering, because those instances carry every database field by default.
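
One way to close the nested-field gap described above is to replace the untyped dict with a narrow nested model. The following is a minimal sketch, assuming Pydantic v2; PaymentSummary, OrderRead, and the sample record are hypothetical names introduced for illustration, not part of the original application.

```python
from pydantic import BaseModel


class PaymentSummary(BaseModel):
    # Only what a client needs to recognise the payment method.
    card_brand: str
    last4: str


class OrderRead(BaseModel):
    id: int
    user_id: int
    total: float
    payment_details: PaymentSummary  # typed model instead of a raw dict


# Hypothetical raw record, standing in for what a database layer might return.
raw_order = {
    "id": 7,
    "user_id": 42,
    "total": 99.99,
    "payment_details": {
        "card_brand": "visa",
        "last4": "4242",
        "card_number": "4242424242424242",
        "cvv": "123",
    },
}

# Pydantic ignores undeclared keys by default, so the full card number and
# CVV are dropped during validation and never reach the serialized output.
safe_order = OrderRead.model_validate(raw_order).model_dump()
```

Declaring OrderRead as the endpoint's response_model should apply the same filtering to whatever the handler returns.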

FastAPI-Specific Detection

Detecting Excessive Data Exposure in FastAPI requires examining both the OpenAPI schema and actual runtime responses. middleBrick's scanner targets FastAPI applications by analyzing the generated OpenAPI spec and testing endpoints for sensitive data exposure.

The scanner examines FastAPI's response_model declarations to identify models that contain sensitive fields. For example, it flags patterns like:

# middleBrick detects this as risky:
class UserProfile(BaseModel):
    id: int
    username: str
    email: str
    ssn: str  # Social Security Number - excessive exposure
    date_of_birth: str
    address: str
    credit_score: int

middleBrick also tests FastAPI endpoints by making authenticated requests and analyzing responses for patterns such as:

  • Password hashes, API keys, or authentication tokens in responses
  • Internal database IDs or system metadata
  • Business logic data not needed by API consumers
  • Geolocation data, SSNs, or other PII beyond what's necessary
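
The response checks listed above can be approximated in a few lines of standard-library Python. This is only an illustrative sketch, not middleBrick's implementation; the key patterns, the find_sensitive_keys helper, and the sample body are all assumptions.

```python
import re
from typing import Any, Iterator

# Illustrative key patterns only; a real scanner would use a richer rule set.
SENSITIVE_KEY = re.compile(
    r"(password|passwd|secret|token|api_?key|ssn|credit_?card|cvv)",
    re.IGNORECASE,
)


def find_sensitive_keys(payload: Any, path: str = "$") -> Iterator[str]:
    """Yield JSON paths whose key name looks sensitive."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if SENSITIVE_KEY.search(str(key)):
                yield child
            # Recurse so nested objects are inspected as well.
            yield from find_sensitive_keys(value, child)
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            yield from find_sensitive_keys(item, f"{path}[{index}]")


# Hypothetical decoded JSON body from an API response.
response_body = {
    "id": 1,
    "username": "alice",
    "password_hash": "<hash>",
    "sessions": [{"device": "laptop", "refresh_token": "abc"}],
}
findings = list(find_sensitive_keys(response_body))
```

Matching on key names alone misses secrets stored under innocuous names, so a check like this complements, rather than replaces, reviewing the response models themselves.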

For FastAPI applications using background tasks or dependency injection, middleBrick simulates requests through the full FastAPI request lifecycle to catch serialization issues that might only appear in production.

Developers can also use FastAPI's built-in OpenAPI documentation to manually inspect response models:

# Fetch the generated OpenAPI spec
curl http://localhost:8000/openapi.json

# Look for models with sensitive fields
# middleBrick automates this analysis across all endpoints
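
The manual inspection can be partly scripted. The sketch below assumes the spec has already been downloaded and parsed into a dict; flag_schema_properties and its name patterns are hypothetical and only show the idea of walking components.schemas, where FastAPI publishes its Pydantic models.

```python
import re

# Illustrative property-name patterns; extend these for your own domain.
SENSITIVE_NAME = re.compile(
    r"(password|secret|token|api_?key|ssn|credit)", re.IGNORECASE
)


def flag_schema_properties(openapi: dict) -> dict[str, list[str]]:
    """Map each component schema to its sensitive-looking property names."""
    schemas = openapi.get("components", {}).get("schemas", {})
    flagged = {}
    for schema_name, schema in schemas.items():
        hits = [
            prop
            for prop in schema.get("properties", {})
            if SENSITIVE_NAME.search(prop)
        ]
        if hits:
            flagged[schema_name] = hits
    return flagged


# A trimmed, hand-written stand-in for the document served at /openapi.json.
spec = {
    "components": {
        "schemas": {
            "User": {
                "properties": {
                    "id": {"type": "integer"},
                    "username": {"type": "string"},
                    "password_hash": {"type": "string"},
                }
            },
            "UserRead": {
                "properties": {
                    "id": {"type": "integer"},
                    "username": {"type": "string"},
                }
            },
        }
    }
}
report = flag_schema_properties(spec)
```

Any schema that shows up in the report and is also referenced by a response in the spec is a candidate for a dedicated output model.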

FastAPI-Specific Remediation

FastAPI provides several native mechanisms to prevent Excessive Data Exposure. The most effective approach is to use separate Pydantic models for input and output:

from pydantic import BaseModel
from typing import Optional

# Input model - includes all fields for validation
class UserCreate(BaseModel):
    username: str
    email: str
    password: str
    is_admin: bool

# Output model - excludes the password and privilege fields
class UserRead(BaseModel):
    id: int
    username: str
    email: str

@app.post("/users/", response_model=UserRead)
async def create_user(user: UserCreate):
    created_user = await create_user_in_db(user)
    return created_user  # Only safe fields returned

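For more granular control, FastAPI supports the response_model_exclude and response_model_include parameters. They filter the response body only; the generated OpenAPI schema still documents the full model: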

from typing import Union

class SensitiveData(BaseModel):
    id: int
    secret_key: str
    internal_notes: str

@app.get("/data/{item_id}", response_model=SensitiveData,
         response_model_exclude={
             "secret_key": True,
             "internal_notes": True
         })
async def get_data(item_id: int):
    return await get_sensitive_data_from_db(item_id)
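
An exclude set is a deny-list, so a field added later is returned unless someone remembers to exclude it. The sketch below, assuming Pydantic v2, contrasts that with an allow-list output model; Account, AccountRead, and recovery_code are hypothetical names, and model_dump(exclude=...) is used here to approximate what response_model_exclude does.

```python
from pydantic import BaseModel


class Account(BaseModel):
    id: int
    secret_key: str
    internal_notes: str
    recovery_code: str  # Added later; the exclude set was never updated.


class AccountRead(BaseModel):
    # Allow-list: only fields declared here can ever be returned.
    id: int


account = Account(
    id=1, secret_key="sk", internal_notes="n/a", recovery_code="r-123"
)

# Deny-list: anything not named in the exclude set is serialized.
deny_listed = account.model_dump(exclude={"secret_key", "internal_notes"})

# Allow-list: undeclared fields are dropped during validation.
allow_listed = AccountRead.model_validate(account.model_dump()).model_dump()
```

Here deny_listed still carries recovery_code, while allow_listed contains only id.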

FastAPI's dependency injection system can also help by building sanitized response objects:

from fastapi import Depends

async def get_sanitized_user(user_id: int = Depends(get_current_user_id)):
    user = await get_user_from_db(user_id)
    # Create sanitized version without sensitive fields
    return {
        "id": user.id,
        "username": user.username,
        "email": user.email
    }

@app.get("/users/me")
async def get_current_user(user: dict = Depends(get_sanitized_user)):
    return user

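For complex nested models, build the response from already-sanitized nested models such as UserRead. The schema_extra setting shown here (json_schema_extra in Pydantic v2) only attaches an example to the generated schema; it does not change what is serialized: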

class OrderResponse(BaseModel):
    id: int
    user: UserRead  # Nested sanitized model
    total: float
    items: list[OrderItem]
    
    class Config:
        schema_extra = {
            "example": {
                "id": 123,
                "user": {"username": "testuser", "email": "test@example.com"},
                "total": 99.99,
                "items": []
            }
        }

Related CWEs: propertyAuthorization

CWE ID    Name             Severity
CWE-915   Mass Assignment  HIGH

Frequently Asked Questions

How does middleBrick detect Excessive Data Exposure in FastAPI applications?

middleBrick scans FastAPI applications by analyzing the generated OpenAPI specification and making authenticated requests to test endpoints. It identifies models with sensitive fields such as password hashes, API keys, SSNs, and internal metadata that shouldn't be exposed. The scanner examines response_model declarations and tests actual responses to catch data exposure that might not be obvious from the code alone. middleBrick also checks for excessive data in nested models and array responses that could leak information across multiple records.

What's the difference between FastAPI's response_model_exclude and using separate Pydantic models?

Using separate Pydantic models (like UserCreate vs UserRead) is the most explicit and maintainable approach. It makes the API contract clear and prevents accidental exposure when new fields are added. response_model_exclude is more flexible for quick changes but is error-prone, because it relies on remembering to exclude each sensitive field. For production FastAPI applications, separate models are recommended for critical endpoints, while response_model_exclude works well for temporary adjustments or less sensitive data. middleBrick's scanner checks both patterns to ensure sensitive data isn't leaking through either approach.