
PII Leakage in CockroachDB

How PII Leakage Manifests in CockroachDB

In a CockroachDB‑backed API, personally identifiable information (PII) such as email addresses, phone numbers, or government IDs often leaks when the data access layer returns more columns than the API consumer needs. Because CockroachDB speaks the PostgreSQL wire protocol, many developers reuse generic SELECT * statements or ORM methods that load entire rows. Anyone who can reach the endpoint, whether an unauthenticated caller or a low‑privilege user, can then request the resource and receive the full row, including columns that were never meant to be visible.

Typical vulnerable patterns include:

  • A Go handler that uses the pgx driver, runs db.Query(ctx, "SELECT * FROM users"), scans every column into a map or struct, and marshals it straight to JSON.
  • A Node.js/Express route that uses the pg package with client.query('SELECT * FROM customers') and sends result.rows to the client.
  • An ORM such as GORM or Hibernate that loads every mapped field by default, e.g., DB.Find(&users) where the struct maps to every column in the table.
  • Using CockroachDB’s changefeeds or CDC pipelines that stream the full row to downstream services without field‑level filtering.

These patterns map to API3:2019 "Excessive Data Exposure" in the OWASP API Security Top 10; the 2023 edition folds that category into API3:2023 "Broken Object Property Level Authorization". Incidents such as a SaaS API accidentally returning customer emails often trace back to a missing column allowlist in the data access layer.

CockroachDB‑Specific Detection

middleBrick performs unauthenticated, black‑box scanning of the API surface. When it encounters an endpoint that returns JSON, it inspects the payload for data elements that match common PII patterns (email, phone, SSN, etc.) and compares them against the endpoint’s documented contract (if an OpenAPI spec is supplied). If the response contains fields that are not declared in the spec—or if the spec itself allows additionalProperties: true without restriction—middleBrick flags a "Data Exposure" finding with severity High.

For example, scanning a CockroachDB‑powered endpoint with the CLI:

middlebrick scan 'https://api.example.com/v1/users'

might produce a JSON excerpt like:

{
  "findings": [
    {
      "id": "API4-EXPOSURE-001",
      "name": "Excessive Data Exposure",
      "severity": "high",
      "description": "Response includes columns 'email', 'phone_number', and 'ssn' that are not documented in the OpenAPI spec.",
      "remediation": "Limit the SELECT list to only required columns or create a view that excludes PII."
    }
  ]
}

Because middleBrick needs no agents, credentials, or source code, it can detect this issue in staging or production given only the public URL. Its "Data Exposure" module, one of 12 checks the scanner runs in parallel, actively probes for over‑fetching and cross‑references any supplied OpenAPI/Swagger spec (versions 2.0, 3.0, 3.1) with the actual runtime response.

CockroachDB‑Specific Remediation

Fixing PII leakage in a CockroachDB‑backed service involves ensuring that the database layer returns only the data the API is authorized to expose. The following CockroachDB‑native techniques are effective:

  • Explicit column selection – Replace SELECT * with a list of needed columns. In Go:
rows, err := db.QueryContext(ctx, "SELECT id, username, email FROM users WHERE id = $1", userID)
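  • Hiding sensitive columns from SELECT * with NOT VISIBLE (CockroachDB v22.1+). Neither CockroachDB nor PostgreSQL supports a SELECT * EXCLUDE (...) clause (that syntax comes from engines such as Snowflake and DuckDB), but a column marked NOT VISIBLE is left out of SELECT * unless a query names it explicitly:
ALTER TABLE customers ALTER COLUMN ssn SET NOT VISIBLE;
ALTER TABLE customers ALTER COLUMN phone_number SET NOT VISIBLE;
This only prevents accidental over‑fetching; any query that names the column can still read it, so combine it with a view or a least‑privilege role.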
  • Creating a security view that hides PII and granting access only to that view:
CREATE VIEW vw_public_customers AS
SELECT id, username, email, created_at
FROM customers;
GRANT SELECT ON TABLE vw_public_customers TO webapp_role;
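  • Column‑level privileges via roles – PostgreSQL can grant SELECT on specific columns to a limited role. CockroachDB has not historically supported column‑level GRANTs, so check the GRANT documentation for your version before relying on this, and use a security view where the statement is rejected:
GRANT SELECT (id, username, email) ON TABLE customers TO readonly_role;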
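Explicit column selection is easiest to keep consistent when the column list lives in one place. The Go sketch below assumes a hypothetical publicColumns allowlist per table and a hypothetical selectByID helper; it generates the same query as the Go example above and refuses tables that have no allowlist, so a table name is never spliced into SQL unchecked. The id itself stays a $1 placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// publicColumns is a hypothetical allowlist per table; a column not listed
// here can never appear in a generated SELECT list.
var publicColumns = map[string][]string{
	"users":     {"id", "username", "email"},
	"customers": {"id", "username", "email", "created_at"},
}

// selectByID builds "SELECT <allowlisted columns> FROM <table> WHERE id = $1".
// The table name is accepted only if it is a key of the allowlist, and the id
// value remains a bind parameter.
func selectByID(table string) (string, error) {
	cols, ok := publicColumns[table]
	if !ok {
		return "", fmt.Errorf("table %q has no public column allowlist", table)
	}
	return fmt.Sprintf("SELECT %s FROM %s WHERE id = $1",
		strings.Join(cols, ", "), table), nil
}

func main() {
	q, err := selectByID("users")
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // SELECT id, username, email FROM users WHERE id = $1
}
```

The resulting string would then be passed to db.QueryContext(ctx, q, userID) as in the original Go line.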

At the API layer, apply defensive serialization: after fetching rows, strip or mask any unexpected fields before sending the response. For instance, in Node.js:

const safeRows = rows.map(r => ({
  id: r.id,
  username: r.username,
  email: r.email
}));
res.json(safeRows);

Finally, enforce least‑privilege database users: the application should connect with a role that only has permission to the view or the column‑restricted table, preventing accidental over‑fetching even if the query string is altered.

Related CWEs (category: dataExposure)

CWE ID | Name | Severity
CWE-200 | Exposure of Sensitive Information to an Unauthorized Actor | HIGH
CWE-209 | Generation of Error Message Containing Sensitive Information | MEDIUM
CWE-213 | Exposure of Sensitive Information Due to Incompatible Policies | HIGH
CWE-215 | Insertion of Sensitive Information Into Debugging Code | MEDIUM
CWE-312 | Cleartext Storage of Sensitive Information | HIGH
CWE-359 | Exposure of Private Personal Information (PII) to an Unauthorized Actor | HIGH
CWE-522 | Insufficiently Protected Credentials | CRITICAL
CWE-532 | Insertion of Sensitive Information into Log File | MEDIUM
CWE-538 | Insertion of Sensitive Information into Externally-Accessible File or Directory | HIGH
CWE-540 | Inclusion of Sensitive Information in Source Code | HIGH

Frequently Asked Questions

Does middleBrick need any credentials or agents to scan my CockroachDB API?
No. middleBrick is a zero‑setup, black‑box scanner. You only provide the public URL; it performs unauthenticated requests and analyzes the responses without requiring database credentials, agents, or network access to your CockroachDB cluster.
How can I verify that a view I created in CockroachDB is actually preventing PII exposure?
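Connect as the application's database user, run the query SELECT * FROM vw_public_customers LIMIT 1; and check the columns returned: only id, username, email, and created_at should appear, with no ssn or phone_number. Then confirm the same user cannot bypass the view: if the role has been restricted as described above, the query SELECT ssn FROM customers LIMIT 1; should fail with a privilege error. You can also rescan the endpoint with middleBrick; once the handler reads from the view, the scanner should no longer report excessive data exposure for those fields.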