Excessive Data Exposure in FastAPI with CockroachDB
Excessive Data Exposure occurs when an API returns more data than an operation needs, often including sensitive fields that should remain restricted. In a FastAPI application backed by CockroachDB, the risk comes from a mismatch between what the database stores and what the endpoint serializes and returns. CockroachDB is a distributed SQL database, and its tables often hold columns containing personally identifiable information (PII), internal identifiers, or audit metadata. If the API layer does not explicitly select only the required fields before serialization, the full database row can be exposed to the client.
For example, consider a GET /users/{user_id} endpoint that queries a CockroachDB table containing columns such as email, phone_number, and password_hash. A naive implementation might map query results directly to a Pydantic model that includes every field, unintentionally leaking sensitive data. Even with an ORM that maps rows to models, failing to define a strict subset of fields for public responses results in over-disclosure. The problem is compounded when the query uses joins that pull in columns from related tables that were never intended for the client.
Another common pattern is the use of dynamic queries that return variable columns based on request parameters. If input validation is weak, an attacker might manipulate query parameters to request broader result sets or invoke stored procedures that expose additional tables. Because CockroachDB supports complex queries and distributed transactions, developers might also inadvertently expose internal columns such as created_at, updated_at, or internal status flags that should be hidden. These risks map to API3:2023 Broken Object Property Level Authorization in the OWASP API Security Top 10, which merged the earlier API3:2019 Excessive Data Exposure and API6:2019 Mass Assignment categories.
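One way to keep request parameters from changing the shape of the result is to check them against a fixed allowlist before any SQL is built. The sketch below is illustrative only: the `fields` parameter, the `PUBLIC_USER_FIELDS` tuple, and both function names are assumptions, not part of any library.

```python
from typing import List, Optional

# Hypothetical allowlist: the only columns a client may ever request.
PUBLIC_USER_FIELDS = ("id", "email")


def resolve_fields(requested: Optional[str]) -> List[str]:
    """Turn a comma-separated `fields` parameter into a safe column list."""
    names = [name.strip() for name in (requested or "").split(",") if name.strip()]
    if not names:
        # No parameter (or an empty one) falls back to the public default.
        return list(PUBLIC_USER_FIELDS)
    unknown = [name for name in names if name not in PUBLIC_USER_FIELDS]
    if unknown:
        # Reject the request rather than silently dropping the extra names.
        raise ValueError("unsupported fields: " + ", ".join(unknown))
    # Remove duplicates while keeping the requested order.
    return list(dict.fromkeys(names))


def build_user_query(requested: Optional[str]) -> str:
    """Build a SELECT from allowlisted columns; the id is bound as a parameter."""
    columns = ", ".join(resolve_fields(requested))
    return f"SELECT {columns} FROM users WHERE id = %s"
```

Because column names only ever come from the allowlist, interpolating them into the statement is safe, while the user id stays a bound parameter. In an endpoint, the `ValueError` would typically be translated into a 400 or 422 response.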
Instrumentation and logging practices can also contribute to Excessive Data Exposure. If FastAPI middleware logs full request and response payloads without sanitization, sensitive CockroachDB fields can end up in log stores or monitoring systems. Attackers who gain access to logs or error messages can then harvest API keys, session tokens, or other sensitive information. A permissive ORM configuration, insufficient field-level filtering, and verbose logging together form a chain that can lead to significant data leakage.
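A logging filter is one possible place to sanitize payloads before they are written anywhere. The following is a minimal sketch using only the standard `logging` module; the key names in `SENSITIVE_KEYS` and the `RedactingFilter` class are assumptions chosen for illustration, and it only covers payloads passed as logging arguments, not values already formatted into the message string.

```python
import logging

# Hypothetical set of keys that must never reach a log store.
SENSITIVE_KEYS = {
    "password_hash", "phone_number", "two_factor_secret",
    "session_token", "api_key",
}


def redact(value):
    """Return a copy of value with sensitive keys masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


class RedactingFilter(logging.Filter):
    """Mask sensitive keys in log arguments before the record is formatted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # logging stores a single dict argument directly in record.args.
        if isinstance(record.args, dict):
            record.args = redact(record.args)
        elif isinstance(record.args, tuple):
            record.args = tuple(redact(arg) for arg in record.args)
        return True  # never drop the record, only sanitize it
```

Attaching the filter to a handler with `handler.addFilter(RedactingFilter())` applies it to every record that handler emits, so a call such as `logger.info("response %s", payload)` is masked before formatting.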
To mitigate these risks, adopt a strict field-level serialization strategy. Define dedicated response models in FastAPI that include only the fields intended for the client, and map database rows explicitly into these models. Avoid returning raw SQLAlchemy or Tortoise ORM objects directly from endpoints. Enforce input validation so that query parameters cannot alter the shape of the returned data. When querying CockroachDB, select columns explicitly rather than selecting all of them, and review queries and their plans to confirm that no unintended fields are surfaced.
CockroachDB-Specific Remediation in FastAPI
Remediation focuses on precise control over the data flow between CockroachDB and the FastAPI response layer. Begin by defining strict Pydantic models for API responses that exclude sensitive fields such as password_hash, internal_notes, or two_factor_secret. Use SQLAlchemy or Tortoise ORM queries to select only the required columns, and map the results into these filtered models before returning them from the endpoint.
Below is a concrete example using SQLAlchemy with CockroachDB. The database model includes sensitive columns, but the API returns a limited set via a dedicated response model:
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sqlalchemy import Boolean, Column, Integer, String, select
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.orm import declarative_base

# Assumed connection URL; adjust user, host, port, and database.
# The cockroachdb+asyncpg dialect is provided by the sqlalchemy-cockroachdb package.
engine = create_async_engine("cockroachdb+asyncpg://app_user@localhost:26257/defaultdb")

Base = declarative_base()


class UserDB(Base):
    """Full table mapping, including columns that must stay internal."""
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String)
    password_hash = Column(String)
    phone_number = Column(String)
    is_admin = Column(Boolean)


class UserResponse(BaseModel):
    """Public response shape: only the fields intended for the client."""
    id: int
    email: str


app = FastAPI()


@app.get("/users/{user_id}", response_model=UserResponse)
async def get_user(user_id: int):
    async with engine.connect() as conn:
        # Select only the public columns instead of the whole row.
        result = await conn.execute(
            select(UserDB.id, UserDB.email).where(UserDB.id == user_id)
        )
        row = result.fetchone()
    if row is None:
        # A bare dict would not validate against response_model, so raise a 404.
        raise HTTPException(status_code=404, detail="User not found")
    return UserResponse(id=row.id, email=row.email)
```

In this example, the database model `UserDB` contains sensitive fields such as `password_hash` and `phone_number`, but the SQLAlchemy `select` statement explicitly limits the columns to `id` and `email`. The response model `UserResponse` further ensures that only safe fields are serialized. This approach prevents accidental exposure of sensitive columns even if the ORM mapping or table schema changes in CockroachDB.

For applications using raw SQL or complex joins, apply the same principle: construct queries that return only the necessary columns and map them to restricted response structures. Avoid using `SELECT *` or automatic serialization that includes all table fields. Regularly audit your endpoint definitions and database schemas to confirm that no new sensitive columns have been inadvertently exposed. These practices align with the findings that middleBrick reports under its Data Exposure checks, providing clear remediation guidance to reduce risk.
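The raw SQL case can be sketched along the same lines. To keep the example runnable without a cluster, it uses the standard-library `sqlite3` module as a stand-in for a CockroachDB connection; that substitution, the `PublicUser` dataclass, and the helper names are all assumptions for illustration. Placeholder syntax also differs by driver: `sqlite3` uses `?`, while PostgreSQL-compatible drivers such as psycopg use `%s`.

```python
import sqlite3
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PublicUser:
    """Restricted response shape: only fields meant for the client."""
    id: int
    email: str


# Explicit column list; a public response never uses SELECT *.
PUBLIC_USER_SQL = "SELECT id, email FROM users WHERE id = ?"


def fetch_public_user(conn: sqlite3.Connection, user_id: int) -> Optional[PublicUser]:
    """Fetch one user and map the row into the restricted structure."""
    row = conn.execute(PUBLIC_USER_SQL, (user_id,)).fetchone()
    if row is None:
        return None
    return PublicUser(id=row[0], email=row[1])


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory table mirroring the users schema from the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, "
        "password_hash TEXT, phone_number TEXT, is_admin BOOLEAN)"
    )
    conn.execute(
        "INSERT INTO users VALUES (1, 'ada@example.com', 'hash', '555-0100', 0)"
    )
    return conn
```

Calling `asdict(fetch_public_user(demo_connection(), 1))` yields a dictionary containing only `id` and `email`, even though the table also stores `password_hash`, `phone_number`, and `is_admin`. A check like this can double as a regression test when auditing endpoints after schema changes.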
Related CWEs: Property Authorization
| CWE ID | Name | Severity |
|---|---|---|
| CWE-915 | Mass Assignment | HIGH |
Frequently Asked Questions
How does middleBrick detect Excessive Data Exposure in FastAPI APIs backed by CockroachDB?
Can the middleBrick CLI help enforce field-level filtering for CockroachDB-backed FastAPI services?
Run middlebrick scan <url> to receive structured reports that flag excessive data exposure. The findings include severity ratings and remediation guidance, helping you refine serialization logic and column selection for services that rely on CockroachDB.