Excessive Data Exposure in CockroachDB

How Excessive Data Exposure Manifests in CockroachDB

Excessive Data Exposure in CockroachDB environments typically occurs when applications return more data than the client needs through API endpoints. The root cause usually lies in the application's queries and serialization code rather than in the database itself, but several CockroachDB features make it easy to over-fetch.

One common manifestation involves SELECT * queries that return entire table rows when only specific columns are needed. For example, a user profile endpoint might execute:

SELECT * FROM users WHERE id = $1

This returns every column, including sensitive fields such as password hashes, internal IDs, or audit timestamps that should never reach the client. If the handler serializes the whole row, any column added to the table later is exposed automatically.
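A minimal sketch of the application-side counterpart, assuming the handler receives each row as a dictionary. The field list and sample row are hypothetical and only illustrate allow-list serialization:

```python
# Hypothetical allow-list of columns the profile endpoint may return.
PUBLIC_USER_FIELDS = ("id", "email", "name")


def to_public_user(row: dict) -> dict:
    """Copy only allow-listed columns from a database row into the response."""
    return {field: row[field] for field in PUBLIC_USER_FIELDS if field in row}


# Example row shaped like the result of SELECT * FROM users.
row = {
    "id": "7f9c2b1e",
    "email": "user@example.com",
    "name": "Ada",
    "password_hash": "$2b$12$abc...",
    "internal_notes": "flagged for review",
}
public_user = to_public_user(row)
print(public_user)
```

Because the response is built from an explicit list, a column added to the table later stays private until someone deliberately adds it to the list.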

Another common pattern involves JSONB columns, which applications often serialize in full without filtering:

SELECT id, profile FROM users WHERE id = $1

Developers often forget that JSONB columns can contain nested sensitive data like API keys, internal configuration, or PII that was stored for operational purposes but shouldn't be exposed.
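One way to guard against this in the application layer is to strip known-sensitive keys from the decoded JSONB value before it is serialized. The sketch below assumes hypothetical key names (api_key, ssn, internal_config); it is an illustration, not a complete filter:

```python
import json

# Hypothetical key names treated as sensitive; adjust to the real schema.
SENSITIVE_KEYS = {"api_key", "ssn", "internal_config"}


def strip_sensitive(value):
    """Recursively drop sensitive keys from a decoded JSONB value."""
    if isinstance(value, dict):
        return {
            key: strip_sensitive(item)
            for key, item in value.items()
            if key not in SENSITIVE_KEYS
        }
    if isinstance(value, list):
        return [strip_sensitive(item) for item in value]
    return value


profile = json.loads(
    '{"display_name": "Ada",'
    ' "integrations": [{"name": "crm", "api_key": "sk-123"}],'
    ' "internal_config": {"tier": "gold"}}'
)
clean_profile = strip_sensitive(profile)
print(json.dumps(clean_profile))
```

A deny-list like this only catches keys someone thought to list. Where the public shape of the object is known, selecting the allowed keys explicitly is the safer design.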

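CockroachDB's INTERLEAVE IN PARENT feature is a further exposure vector on older clusters. Interleaved tables were deprecated in v20.2 and removed in v21.2, so this applies only to legacy schemas. When a child table is interleaved in its parent for performance, the child's primary key must begin with the parent's primary key:

CREATE TABLE users (account_id UUID, id UUID, email STRING, PRIMARY KEY (account_id, id)) INTERLEAVE IN PARENT accounts (account_id)

Interleaving only changes the storage layout, but schemas built this way are usually queried with joins against the parent table. A join that selects every column returns account-level information when only user-level data was intended.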

Time-travel queries (AS OF SYSTEM TIME) present a risk specific to CockroachDB. Developers might use them for debugging:

SELECT * FROM orders AS OF SYSTEM TIME '-30s' WHERE order_id = $1

Such a query exposes historical data, which may include PII or sensitive business information from previous states, for example values that have since been deleted or redacted.
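A simple guard against debugging queries like this reaching production is a textual check in the data-access layer. The sketch below is an assumption about how such a check might look. The function name and the "production" environment label are hypothetical, and a regular expression can flag false positives, such as the phrase appearing inside a string literal:

```python
import re

# Matches the AS OF SYSTEM TIME clause regardless of case or spacing.
TIME_TRAVEL_PATTERN = re.compile(r"\bAS\s+OF\s+SYSTEM\s+TIME\b", re.IGNORECASE)


def assert_no_time_travel(sql: str, environment: str) -> None:
    """Raise ValueError if a production query contains AS OF SYSTEM TIME."""
    if environment == "production" and TIME_TRAVEL_PATTERN.search(sql):
        raise ValueError("AS OF SYSTEM TIME is not allowed in production queries")


# A current-data query passes silently.
assert_no_time_travel("SELECT id FROM orders WHERE order_id = $1", "production")

# A historical query is rejected in production.
try:
    assert_no_time_travel(
        "SELECT * FROM orders AS OF SYSTEM TIME '-30s' WHERE order_id = $1",
        "production",
    )
except ValueError as error:
    print(f"blocked: {error}")
```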

Range queries on indexed columns can also leak excessive data. A seemingly innocuous:

SELECT * FROM products WHERE price BETWEEN 10 AND 20

might return internal cost data, supplier information, or margin calculations stored alongside product data.

CockroachDB-Specific Detection

Detecting Excessive Data Exposure in CockroachDB requires both static analysis of query patterns and runtime monitoring of data flows. middleBrick's API security scanner identifies these issues through black-box scanning.

middleBrick tests CockroachDB-backed endpoints by sending requests and analyzing the responses for excessive data exposure. The scanner looks for:

  • Unexpected columns in JSON responses that match CockroachDB internal schemas
  • Timestamp fields returned at full database precision (CockroachDB stores timestamps with up to microsecond precision)
  • UUID-valued fields such as internal row identifiers (CockroachDB schemas commonly use UUID primary keys)
  • Array fields containing more data than the API contract specifies

The scanner's 12 parallel security checks include Property Authorization testing, which is designed to catch CockroachDB queries that return data the caller is not authorized to see. For example, it might detect that a user profile endpoint returns:

{
  "id": "uuid-here",
  "email": "user@example.com",
  "password_hash": "$2b$12$abc...",
  "created_at": "2024-01-15 10:30:45.123456Z",
  "updated_at": "2024-01-20 14:22:11.987654Z",
  "internal_notes": "Sensitive internal data..."
}
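The same kind of check can be approximated in a team's own test suite. The sketch below is not middleBrick's implementation. It is a hypothetical name-based heuristic that walks a JSON response and reports keys whose names suggest they should not be public:

```python
import json
import re

# Hypothetical name patterns that suggest a field should not be public.
SUSPICIOUS_NAME = re.compile(r"(password|secret|token|internal|hash)", re.IGNORECASE)


def find_suspicious_fields(payload, path=""):
    """Return dotted paths of keys whose names look sensitive."""
    findings = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            key_path = f"{path}.{key}" if path else key
            if SUSPICIOUS_NAME.search(key):
                findings.append(key_path)
            findings.extend(find_suspicious_fields(value, key_path))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            findings.extend(find_suspicious_fields(item, f"{path}[{index}]"))
    return findings


response_body = json.loads(
    '{"id": "uuid-here", "email": "user@example.com",'
    ' "password_hash": "$2b$12$abc...", "internal_notes": "Sensitive internal data..."}'
)
suspicious = find_suspicious_fields(response_body)
print(suspicious)
```

A name-based heuristic misses sensitive data stored under innocuous keys, so it complements a comparison against the documented response schema rather than replacing it.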

middleBrick's LLM/AI Security checks are particularly relevant for CockroachDB applications that use AI features. The scanner tests for system prompt leakage that might contain database credentials or CockroachDB-specific configuration data.

For OpenAPI spec analysis, middleBrick cross-references your API definitions with actual runtime responses. If your spec defines a minimal user object but the CockroachDB query returns 20+ fields, the scanner flags the discrepancy.
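The idea of comparing a spec against a live response can be illustrated with a few lines of code. This is a simplified sketch under stated assumptions, not the scanner's actual logic: the schema fragment and response are hypothetical, and only top-level properties are compared:

```python
# Hypothetical OpenAPI schema fragment for the user object.
user_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "email": {"type": "string"},
        "name": {"type": "string"},
    },
}


def undocumented_fields(schema: dict, response: dict) -> list:
    """Return response keys that the schema's properties do not declare."""
    declared = set(schema.get("properties", {}))
    return sorted(key for key in response if key not in declared)


# Hypothetical runtime response from the user profile endpoint.
response = {
    "id": "uuid-here",
    "email": "user@example.com",
    "name": "Ada",
    "password_hash": "$2b$12$abc...",
    "internal_notes": "Sensitive internal data...",
}
extra = undocumented_fields(user_schema, response)
print(extra)
```

Running a check like this in integration tests surfaces schema drift early, for instance when a new column starts flowing through a SELECT * query.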

Continuous monitoring in the Pro plan automatically rescans your CockroachDB-backed APIs on a schedule and alerts you when schema changes or query modifications introduce new excessive data exposure vulnerabilities.

CockroachDB-Specific Remediation

Remediating Excessive Data Exposure in CockroachDB requires both query-level fixes and architectural changes. The following strategies are specific to CockroachDB:

1. Explicit Column Selection

-- Bad: SELECT *
-- Good: Explicit columns only
SELECT id, email, name, created_at FROM users WHERE id = $1

-- For JSONB columns, use explicit field selection
SELECT id, profile->'public_info' AS profile FROM users WHERE id = $1

2. CockroachDB Computed Columns for Data Masking

CREATE TABLE users (
    id UUID PRIMARY KEY,
    email STRING,
    sensitive_data STRING,
    created_at TIMESTAMPTZ DEFAULT now(),
    -- Computed columns need an explicit type and an immutable expression.
    -- This one stores a masked form of the email, e.g. 'u***@example.com'.
    masked_email STRING AS (
        left(email, 1) || '***' || substr(email, strpos(email, '@'))
    ) STORED
);

-- Query the masked column instead of the raw one
SELECT id, masked_email, created_at FROM users WHERE id = $1

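3. Row-Level Security (RLS) in CockroachDB

-- Row-level security is only available in recent CockroachDB releases;
-- confirm support on your version before relying on it.
ALTER TABLE users
  ENABLE ROW LEVEL SECURITY;

-- Policy restricting each SQL user to their own row.
-- Assumes a hypothetical username column that holds the SQL user name.
CREATE POLICY user_access ON users
    FOR SELECT
    USING (username = current_user);

-- Alternatively, use application-based filtering
CREATE POLICY app_filter ON users
    FOR SELECT
    USING (email NOT LIKE '%internal%');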

4. Isolating Data Without INTERLEAVE

-- Avoid exposing parent table data
CREATE TABLE user_profiles (
    user_id UUID PRIMARY KEY,
    email STRING,
    -- Only include necessary fields
    profile_data JSONB,
    -- No interleaving with accounts table
    CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id)
);

-- Query the child table directly when parent data isn't needed
SELECT up.email, up.profile_data
FROM user_profiles up
WHERE up.user_id = $1;

5. Time-Travel Query Restrictions

-- Wrapper function that always reads current data: it contains no
-- AS OF SYSTEM TIME clause and returns only explicitly listed columns.
-- Queries that bypass the wrapper still need separate review.
CREATE OR REPLACE FUNCTION safe_select_user(p_user_id UUID)
RETURNS TABLE (
    id UUID,
    email STRING,
    name STRING
) AS $$
BEGIN
    RETURN QUERY
    -- Qualify columns so they are not confused with the output parameters
    SELECT u.id, u.email, u.name
    FROM users u
    WHERE u.id = p_user_id;
END;
$$ LANGUAGE plpgsql;

6. API Response Filtering

-- Use CockroachDB's JSON functions to build the response from an explicit column list
CREATE OR REPLACE FUNCTION filter_user_response(user_id UUID)
RETURNS JSONB AS $$
DECLARE
    user_data JSONB;
BEGIN
    SELECT row_to_json(t) INTO user_data
    FROM (
        SELECT id, email, name, created_at
        FROM users 
        WHERE id = user_id
    ) t;
    
    -- Defense in depth: strip sensitive keys in case the column list above is widened later
    RETURN user_data - 'password_hash' - 'internal_notes';
END;
$$ LANGUAGE plpgsql;

Related CWEs: propertyAuthorization

CWE ID     Name               Severity
CWE-915    Mass Assignment    HIGH

Frequently Asked Questions

How does middleBrick detect Excessive Data Exposure in CockroachDB APIs?
middleBrick scans your API endpoints without credentials and analyzes responses for data exposure. It looks for unexpected columns and sensitive fields such as password hashes or internal IDs, and compares actual responses against your OpenAPI spec. The scanner identifies CockroachDB-specific patterns such as UUID identifiers, full-precision timestamps, and JSONB column expansion that indicate excessive data exposure.

Can middleBrick scan CockroachDB APIs in my CI/CD pipeline?
Yes, the middleBrick GitHub Action lets you add API security scans to your CI/CD pipeline. You can configure it to scan your CockroachDB APIs during pull requests or before deployment, with options to fail builds if security scores drop below your threshold. The Pro plan includes continuous monitoring that automatically rescans your APIs on a schedule.