Excessive Data Exposure with Bearer Tokens
How Excessive Data Exposure Manifests in Bearer Tokens
Excessive Data Exposure in Bearer Token implementations occurs when APIs return more data than necessary, often including sensitive information that should be hidden from the client. This vulnerability is particularly problematic in Bearer Token systems because the token itself often grants broad access, making over-exposed data more dangerous.
A common manifestation appears in API endpoints that return entire user objects when only basic profile information is needed. Consider a user profile endpoint that returns:
```json
{
  "id": "12345",
  "email": "user@example.com",
  "password_hash": "$2b$12$...",
  "ssn": "123-45-6789",
  "credit_card": {
    "number": "4111111111111111",
    "cvv": "123",
    "expiry": "12/23"
  },
  "internal_notes": "VIP customer, flagged for fraud review"
}
```
This response exposes password hashes, social security numbers, complete credit card details, and internal notes that should never reach the client. The Bearer Token grants access to this user's data, but the API implementation fails to filter sensitive fields.
Another pattern involves nested relationships where APIs eagerly load related data. A typical example is a blog post API that returns:
```json
{
  "id": 1,
  "title": "API Security Best Practices",
  "content": "...",
  "author": {
    "id": 42,
    "name": "John Doe",
    "email": "john@example.com",
    "phone": "+1-555-1234",
    "billing_address": "...",
    "internal_id": "A1B2C3"
  },
  "comments": [
    {
      "id": 101,
      "content": "Great post!",
      "author": {
        "id": 99,
        "name": "Jane Smith",
        "email": "jane@example.com",
        "ssn_last4": "6789"
      }
    }
  ]
}
```
Here, the API returns author emails, phone numbers, billing addresses, and even comment authors' partial SSNs. The Bearer Token allows access to this content, but the API design violates data minimization principles.
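One defensive pattern against this kind of nested over-exposure is recursive allow-list filtering of the response before it leaves the server. A minimal Python sketch (the allow-list contents and context names are illustrative, not a complete policy):

```python
# Map each object context to the fields it may expose.
# These allow-lists are illustrative examples only.
ALLOWED = {
    "post": {"id", "title", "content", "author", "comments"},
    "author": {"id", "name"},
}

def filter_fields(obj, allowed, key="post"):
    """Recursively drop any dict key not allow-listed for its context."""
    if isinstance(obj, list):
        return [filter_fields(item, allowed, key) for item in obj]
    if isinstance(obj, dict):
        kept = {}
        for k, v in obj.items():
            if k in allowed.get(key, set()):
                # A nested object switches to its own allow-list entry
                # if one exists; otherwise it inherits the current one.
                child_key = k if k in allowed else key
                kept[k] = filter_fields(v, allowed, child_key)
        return kept
    return obj
```

With this in place, the blog post response above would keep the author's `id` and `name` but strip `email`, `phone`, `billing_address`, and `internal_id` before serialization.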
Database-level issues also contribute to excessive exposure. Using SELECT * queries without proper filtering is a common anti-pattern:
```sql
-- Vulnerable: returns all columns
SELECT * FROM users WHERE id = ?;

-- Secure: explicitly select only needed columns
SELECT id, name, email, created_at FROM users WHERE id = ?;
```
Object-relational mapping (ORM) libraries can exacerbate this when developers use .all() or .select_related() without considering the returned fields. For example:
```python
# Vulnerable: returns the entire User object, including sensitive fields
user = User.objects.get(id=user_id)
return JsonResponse(user.__dict__)

# Secure: fetch and serialize only the necessary fields
user = User.objects.values('id', 'name', 'email').get(id=user_id)
return JsonResponse(user)  # .values().get() already returns a plain dict
```
API versioning can also introduce excessive exposure when newer versions add fields that older clients never requested. A v2 endpoint might include analytics data, marketing preferences, or other metadata that v1 clients have no use for.
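One way to keep each version's contract explicit is a per-version field allow-list applied at serialization time. A minimal sketch (the version names and field sets are illustrative):

```python
# Per-version field allow-lists; contents are illustrative.
VERSION_FIELDS = {
    "v1": {"id", "username", "email"},
    "v2": {"id", "username", "email", "marketing_prefs"},
}

def serialize_user(user: dict, version: str) -> dict:
    """Return only the fields the requested API version is contracted to expose."""
    allowed = VERSION_FIELDS.get(version, VERSION_FIELDS["v1"])
    return {k: v for k, v in user.items() if k in allowed}
```

Unknown versions fall back to the most restrictive set, so a new field added for v2 never leaks to a v1 client by accident.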
Bearer Tokens-Specific Detection
Detecting excessive data exposure in Bearer Token APIs requires systematic analysis of API responses against expected data contracts. The first step is understanding what data each endpoint should return based on its purpose and the client's needs.
Manual testing involves examining API responses and asking critical questions: Does this endpoint need to return password hashes? Should it include internal identifiers? Are there nested objects containing sensitive data? Using tools like curl or Postman with valid Bearer Tokens, you can capture responses and analyze their structure.
```bash
# Test endpoint and save response
curl -H "Authorization: Bearer $TOKEN" \
  https://api.example.com/users/123 \
  -o response.json

# Analyze JSON structure
jq '. | keys' response.json
```
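The saved response can then be scanned for suspicious key names. A rough Python sketch (the keyword pattern is illustrative, not exhaustive — extend it to match your data classification policy):

```python
import re

# Illustrative patterns for sensitive-looking key names.
SENSITIVE_KEY_PATTERN = re.compile(
    r"(password|passwd|secret|token|ssn|credit_card|cvv|api_key)",
    re.IGNORECASE,
)

def find_sensitive_keys(obj, path=""):
    """Walk a parsed JSON response and collect paths whose keys look sensitive."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            full = f"{path}.{key}" if path else key
            if SENSITIVE_KEY_PATTERN.search(key):
                hits.append(full)
            hits.extend(find_sensitive_keys(value, full))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            hits.extend(find_sensitive_keys(item, f"{path}[{i}]"))
    return hits
```

Running it over `json.load(open("response.json"))` produces a list of dotted paths (for example `credit_card.cvv`) that warrant manual review.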
Automated scanning tools like middleBrick can systematically detect excessive data exposure by comparing actual API responses against security policies and data classification rules. While middleBrick's black-box scanning approach focuses on the unauthenticated attack surface, when supplied with Bearer Tokens it can:
- Identify fields containing PII, financial data, or credentials
- Detect overly broad responses that include nested sensitive data
- Flag endpoints returning internal system information
- Compare response schemas against expected contracts
- Identify data exposure across different user roles and permissions
middleBrick's LLM/AI security capabilities add another dimension by detecting if AI model responses contain excessive data exposure, particularly in LLM endpoints that might inadvertently reveal training data or system prompts.
Static analysis tools can also help detect potential data exposure in code before deployment. Tools like SonarQube, ESLint plugins, or custom scripts can flag dangerous patterns:
```python
# Example static analysis rule
import ast

SENSITIVE_FIELDS = {'password', 'ssn', 'credit_card'}

class DataExposureChecker(ast.NodeVisitor):
    """Flags .values()/.only() calls that explicitly select sensitive columns."""

    def visit_Call(self, node):
        if isinstance(node.func, ast.Attribute) and node.func.attr in ('values', 'only'):
            # Field names are passed as positional string literals,
            # e.g. User.objects.values('id', 'password')
            selected = {
                arg.value for arg in node.args
                if isinstance(arg, ast.Constant) and isinstance(arg.value, str)
            }
            exposed = SENSITIVE_FIELDS & selected
            if exposed:
                print(f"Warning: sensitive fields {sorted(exposed)} "
                      f"in .{node.func.attr}() call at line {node.lineno}")
        self.generic_visit(node)
```
API specification analysis using OpenAPI/Swagger specs can also reveal potential data exposure. By examining the schema definitions and response models, you can identify fields that should be marked as sensitive or excluded from certain responses.
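As a sketch of that idea, the following walks the `components.schemas` section of a parsed OpenAPI document and flags property names matching sensitive-looking patterns. It is deliberately simplified: a full implementation would also resolve `$ref`s and walk per-operation response schemas.

```python
import re

# Illustrative pattern; extend to match your data classification policy.
SENSITIVE = re.compile(r"(password|ssn|secret|credit_card|cvv|token)", re.IGNORECASE)

def audit_openapi_responses(spec: dict) -> list:
    """Flag schema property names in an OpenAPI spec that look sensitive."""
    findings = []
    schemas = spec.get("components", {}).get("schemas", {})
    for schema_name, schema in schemas.items():
        for prop in schema.get("properties", {}):
            if SENSITIVE.search(prop):
                findings.append(f"{schema_name}.{prop}")
    return findings
```

Each finding such as `User.password_hash` points at a response model field that should either be removed or explicitly justified and access-controlled.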
Bearer Tokens-Specific Remediation
Remediating excessive data exposure in Bearer Token APIs requires a multi-layered approach focusing on data minimization, proper serialization, and secure data handling practices.
The most effective remediation is implementing explicit field selection at the API layer. Instead of returning entire objects, define response schemas that include only necessary fields:
```python
# Django REST Framework example
from django.shortcuts import get_object_or_404
from rest_framework import serializers
from rest_framework.response import Response
from rest_framework.views import APIView

class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        # Allow-list: only safe, necessary fields are ever serialized.
        # Sensitive columns (password, ssn, credit card data) are
        # excluded simply by not appearing here.
        fields = ['id', 'username', 'email', 'created_at']

class UserProfileView(APIView):
    def get(self, request, user_id):
        user = get_object_or_404(User, id=user_id)
        # Check whether the requester may view this profile
        if not request.user.has_perm('view_profile', user):
            return Response({'detail': 'Forbidden'}, status=403)
        serializer = UserSerializer(user)
        return Response(serializer.data)
```
For Node.js/Express applications using JWT Bearer Tokens, implement similar field filtering:
```javascript
const express = require('express');
const jwt = require('jsonwebtoken');
const User = require('../models/User');

const router = express.Router();

// Define safe fields for different requester contexts
const SAFE_FIELDS = {
  self: ['id', 'username', 'email', 'created_at'],
  admin: ['id', 'username', 'email', 'created_at', 'role', 'last_login'],
  public: ['id', 'username']
};

router.get('/users/:id', authenticateToken, async (req, res) => {
  try {
    // Exclude sensitive columns at the query level as a first line of defense
    const user = await User.findById(req.params.id).select('-password -ssn -credit_card');
    if (!user) {
      return res.status(404).json({ error: 'User not found' });
    }

    // Determine which fields to return based on the requester's role
    let fields = SAFE_FIELDS.self;
    if (req.user.role === 'admin') {
      fields = SAFE_FIELDS.admin;
    } else if (req.params.id !== String(req.user.id)) {
      // Requester is viewing another user's profile
      fields = SAFE_FIELDS.public;
    }

    const response = {};
    fields.forEach((field) => {
      if (user[field] !== undefined) {
        response[field] = user[field];
      }
    });

    res.json(response);
  } catch (error) {
    res.status(500).json({ error: 'Internal server error' });
  }
});

function authenticateToken(req, res, next) {
  const authHeader = req.headers['authorization'];
  const token = authHeader && authHeader.split(' ')[1];
  if (!token) return res.sendStatus(401);

  jwt.verify(token, process.env.ACCESS_TOKEN_SECRET, (err, user) => {
    if (err) return res.sendStatus(403);
    req.user = user;
    next();
  });
}
```
Database-level remediation involves using projection queries to select only necessary columns:
```sql
-- Instead of SELECT *
SELECT id, username, email, created_at FROM users WHERE id = ?;

-- For related data, use explicit joins and list each column
SELECT
  u.id, u.username, u.email,
  p.title, p.content,
  c.content AS comment_content
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON p.id = c.post_id
WHERE u.id = ?;
```
API response filtering middleware can provide a centralized approach to data exposure prevention:
```javascript
// Express middleware for response filtering
function filterResponse(fieldsWhitelist) {
  return (req, res, next) => {
    // Intercept res.json rather than res.send: by the time res.send runs,
    // res.json has already stringified the body, so the object is gone.
    const originalJson = res.json.bind(res);
    res.json = (data) => originalJson(filterObject(data, fieldsWhitelist));
    next();
  };
}

function filterObject(obj, whitelist) {
  if (Array.isArray(obj)) {
    return obj.map((item) => filterObject(item, whitelist));
  }
  if (typeof obj === 'object' && obj !== null) {
    const filtered = {};
    Object.keys(obj).forEach((key) => {
      if (whitelist.includes(key) || isSafeField(key)) {
        // Recurse so nested objects are filtered too, not passed through whole
        filtered[key] = filterObject(obj[key], whitelist);
      }
    });
    return filtered;
  }
  return obj;
}

function isSafeField(fieldName) {
  const safeFields = ['id', 'created_at', 'updated_at'];
  return safeFields.includes(fieldName);
}
```
Finally, implement comprehensive logging and monitoring to detect when excessive data exposure occurs in production. Log requests that return large amounts of data or contain sensitive fields, and set up alerts for unusual data access patterns.
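One shape such monitoring can take is a response-audit hook that inspects serialized payloads before they are sent, warning on suspect field names and unusually large bodies. A Python sketch (the size threshold and pattern list are illustrative):

```python
import json
import logging
import re

logger = logging.getLogger("data_exposure_audit")

# Illustrative values; tune both to your own policy and traffic profile.
SENSITIVE = re.compile(r"(password|ssn|credit_card|cvv|secret)", re.IGNORECASE)
MAX_RESPONSE_BYTES = 64 * 1024

def audit_response(endpoint: str, payload: dict) -> list:
    """Log and return audit warnings for a response about to be sent."""
    body = json.dumps(payload)
    warnings = []
    if len(body) > MAX_RESPONSE_BYTES:
        warnings.append(f"{endpoint}: response exceeds {MAX_RESPONSE_BYTES} bytes")
    for key in _all_keys(payload):
        if SENSITIVE.search(key):
            warnings.append(f"{endpoint}: sensitive field '{key}' in response")
    for warning in warnings:
        logger.warning(warning)
    return warnings

def _all_keys(obj):
    """Yield every key name in a nested dict/list structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield k
            yield from _all_keys(v)
    elif isinstance(obj, list):
        for item in obj:
            yield from _all_keys(item)
```

Feeding these warnings into your alerting pipeline surfaces regressions, such as a new serializer accidentally including an `ssn` field, as soon as they hit production traffic.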
Related CWEs
| CWE ID | Name | Severity |
|---|---|---|
| CWE-915 | Mass Assignment | HIGH |