HIGH | PII Leakage | AWS

PII Leakage on AWS

How PII Leakage Manifests in AWS

PII leakage in AWS environments typically occurs through misconfigured API responses, logging mechanisms, and data processing workflows. In serverless architectures using AWS Lambda, developers often inadvertently expose sensitive data through response objects that include complete database records without proper filtering.

// Vulnerable Lambda function exposing full user records
const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const userId = event.pathParameters.userId;
  
  const params = {
    TableName: 'Users',
    Key: { id: userId }
  };
  
  const result = await ddb.get(params).promise();
  
  // Returns entire DynamoDB item including SSN, credit card, etc.
  return {
    statusCode: 200,
    body: JSON.stringify(result.Item)
  };
};

This pattern is particularly dangerous with API Gateway Lambda proxy integrations, where the body string returned by the function is sent to the client as the HTTP response body. Because the handler also never checks that the caller is allowed to read the requested record, attackers can iterate through user IDs and harvest PII at scale.

Another common AWS-specific manifestation occurs in S3 bucket configurations. Developers frequently store PII in S3 buckets with overly permissive ACLs or bucket policies, then expose those objects through presigned URLs with excessively long expiration times.

// Insecure presigned URL generation
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  const params = {
    Bucket: 'my-pii-bucket',
    Key: 'user-documents/' + event.userId + '.pdf',
    Expires: 604800 // 7 days - too long for PII
  };
  
  const url = s3.getSignedUrl('getObject', params);
  return { url };
};
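A tighter variant is sketched below. It assumes the caller's identity is available to the handler (the `callerId` argument is a hypothetical value, for example taken from an authorizer) and that a five-minute lifetime suits the application. `buildPresignParams` is an illustrative helper, not part of the AWS SDK; its return value would be passed to `s3.getSignedUrl('getObject', params)`.

```javascript
// Hypothetical helper: builds getSignedUrl parameters with a short, capped lifetime.
const MAX_TTL_SECONDS = 300; // assumed policy: 5 minutes for PII documents

function buildPresignParams(callerId, requestedUserId, ttlSeconds = MAX_TTL_SECONDS) {
  // Only allow callers to fetch their own document.
  if (!callerId || callerId !== requestedUserId) {
    throw new Error('Forbidden');
  }
  // Reject IDs that could alter the object key (path separators, dots, etc.).
  if (!/^[A-Za-z0-9_-]+$/.test(requestedUserId)) {
    throw new Error('Invalid user ID');
  }
  // Fall back to the cap when the requested TTL is not a usable number.
  const ttl = Number.isFinite(ttlSeconds) ? Math.floor(ttlSeconds) : MAX_TTL_SECONDS;
  return {
    Bucket: 'my-pii-bucket',
    Key: `user-documents/${requestedUserId}.pdf`,
    Expires: Math.min(Math.max(ttl, 1), MAX_TTL_SECONDS),
  };
}
```

Capping the lifetime in one place keeps a caller-supplied value from silently extending how long a leaked URL stays usable.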

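CloudWatch Logs present another AWS-specific PII leakage vector. Developers often log complete request objects or database results without sanitization, creating persistent records of sensitive data. Log groups never expire by default, so these records stay readable by any IAM principal with CloudWatch Logs read access until a retention policy is set or the logs are deleted.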
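One way to reduce this exposure is to redact known sensitive fields before anything reaches `console.log`, which Lambda forwards to CloudWatch Logs. The sketch below assumes a hypothetical field list (`SENSITIVE_KEYS`) and helper name (`redactForLog`); both would need adjusting to the actual data model.

```javascript
// Assumed list of sensitive field names; extend to match your schema.
const SENSITIVE_KEYS = new Set(['ssn', 'creditCardNumber', 'bankAccount', 'driversLicense', 'email']);

// Returns a deep copy of `value` with sensitive fields replaced by a marker.
function redactForLog(value) {
  if (Array.isArray(value)) {
    return value.map(redactForLog);
  }
  if (value !== null && typeof value === 'object') {
    const copy = {};
    for (const [key, inner] of Object.entries(value)) {
      copy[key] = SENSITIVE_KEYS.has(key) ? '[REDACTED]' : redactForLog(inner);
    }
    return copy;
  }
  return value;
}

// Usage inside a handler: log the redacted copy, never the raw object.
const sample = { id: '12345', ssn: '123-45-6789', profile: { email: 'a@example.com', city: 'Oslo' } };
console.log(JSON.stringify(redactForLog(sample)));
```

A name-based denylist like this only catches fields it knows about, so it complements rather than replaces the habit of logging identifiers instead of whole records.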

AWS Step Functions workflows can also leak PII when passing state data between Lambda functions. If a state contains sensitive information and is logged or stored in the execution history, it becomes readable by anyone with permission to view that history (for example, states:GetExecutionHistory).

AWS-Specific Detection

Detecting PII leakage in AWS requires both automated scanning and manual inspection of configurations. The middleBrick CLI provides AWS-specific detection capabilities that scan API endpoints for exposed PII patterns without requiring credentials or code access.

# Scan an AWS API Gateway endpoint
middlebrick scan https://api.example.com/user/12345

# Scan with AWS-specific checks enabled
middlebrick scan --checks=all --output=json https://aws-api.example.com/profile

The scanner identifies AWS-specific patterns including:

  • Exposed DynamoDB table names in error messages
  • S3 bucket references in responses
  • AWS service ARNs in API responses
  • CloudFormation stack identifiers
  • AWS region information leakage
  • EC2 instance metadata exposure attempts
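Some of these patterns can be approximated with regular expressions over a response body. The sketch below is an illustration only, not middleBrick's implementation; the function name `findAwsIdentifiers` and the specific expressions are assumptions, cover only part of the list above, and will produce both false positives and misses.

```javascript
// Illustrative patterns for AWS identifiers that should not appear in API responses.
const AWS_PATTERNS = {
  arn: /arn:aws[a-z-]*:[a-z0-9-]+:[a-z0-9-]*:\d{0,12}:[^\s"']+/g,
  s3Uri: /s3:\/\/[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]/g,
  s3Host: /[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]\.s3[.-][a-z0-9.-]*amazonaws\.com/g,
  metadataEndpoint: /169\.254\.169\.254/g,
};

// Scans a response body (string) and returns every match with its pattern type.
function findAwsIdentifiers(body) {
  const findings = [];
  for (const [type, pattern] of Object.entries(AWS_PATTERNS)) {
    for (const match of body.matchAll(pattern)) {
      findings.push({ type, value: match[0] });
    }
  }
  return findings;
}

// Example: an error message that leaks a table ARN.
const errorBody = 'Requested resource not found: arn:aws:dynamodb:us-east-1:123456789012:table/Users';
console.log(findAwsIdentifiers(errorBody));
```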

For deeper AWS security analysis, middleBrick's OpenAPI spec analysis can detect AWS-specific vulnerabilities by cross-referencing your API specification with runtime findings.

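# Example OpenAPI spec with AWS-specific annotations
# (x-aws-integration and x-pii are custom vendor extensions used for illustration;
# API Gateway's own integration extension is x-amazon-apigateway-integration)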
openapi: 3.0.0
info:
  title: AWS User Service
  version: 1.0.0
paths:
  /users/{userId}:
    get:
      x-aws-integration:
        type: lambda
        lambdaFunction: getUserDetails
      responses:
        '200':
          description: User profile
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'
components:
  schemas:
    User:
      type: object
      properties:
        id:
          type: string
        name:
          type: string
        ssn:
          type: string
          x-pii: true
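
To illustrate the cross-referencing idea, the sketch below collects properties marked with the `x-pii` extension from a parsed schema and reports which of them appear in an observed response. It is a simplified illustration, not middleBrick's implementation; `collectPiiFields` and `findPiiInResponse` are hypothetical names, and `$ref` resolution and nested schemas are left out.

```javascript
// Parsed form of the User schema from the spec above (YAML parsing omitted).
const userSchema = {
  type: 'object',
  properties: {
    id: { type: 'string' },
    name: { type: 'string' },
    ssn: { type: 'string', 'x-pii': true },
  },
};

// Returns the names of top-level properties flagged with x-pii: true.
function collectPiiFields(schema) {
  return Object.entries(schema.properties || {})
    .filter(([, definition]) => definition['x-pii'] === true)
    .map(([name]) => name);
}

// Returns the flagged fields that are present in a response body (an object).
function findPiiInResponse(schema, responseBody) {
  return collectPiiFields(schema).filter((name) => name in responseBody);
}

const observed = { id: '12345', name: 'Ada', ssn: '123-45-6789' };
console.log(findPiiInResponse(userSchema, observed)); // [ 'ssn' ]
```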

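CloudTrail provides another detection mechanism. It records AWS API calls rather than your API's response bodies, so enable data events for the S3 buckets and DynamoDB tables that hold PII and monitor for unusual access, such as bulk object reads or reads by unexpected principals.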

AWS-Specific Remediation

Remediating PII leakage in AWS environments requires a defense-in-depth approach using native AWS services and security best practices. Start with data minimization in your Lambda functions and API Gateway responses.

// Secure Lambda function with PII filtering
const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();

const PII_FIELDS = ['ssn', 'creditCardNumber', 'bankAccount', 'driversLicense'];

exports.handler = async (event) => {
  const userId = event.pathParameters.userId;
  
  const params = {
    TableName: 'Users',
    Key: { id: userId }
  };
  
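  const result = await ddb.get(params).promise();
  
  // Return 404 instead of throwing when the user does not exist
  if (!result.Item) {
    return {
      statusCode: 404,
      body: JSON.stringify({ message: 'User not found' })
    };
  }
  
  // Filter out PII fields before returning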
  const safeResponse = Object.keys(result.Item)
    .filter(key => !PII_FIELDS.includes(key))
    .reduce((obj, key) => {
      obj[key] = result.Item[key];
      return obj;
    }, {});
  
  return {
    statusCode: 200,
    body: JSON.stringify(safeResponse)
  };
};
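
Filtering fields does not by itself stop the enumeration attack described earlier; the handler also needs to confirm that the caller may read the requested record. The sketch below assumes a REST API with a Cognito user pool authorizer, where API Gateway places the token claims at `event.requestContext.authorizer.claims`, and that the table's `id` equals the token's `sub` claim. `authorizeUserAccess` is a hypothetical helper name.

```javascript
// Returns null when access is allowed, or an API Gateway proxy response to send back.
function authorizeUserAccess(event) {
  const claims = event.requestContext?.authorizer?.claims;
  const requestedId = event.pathParameters?.userId;

  // No verified identity on the request: reject before touching the database.
  if (!claims || !claims.sub) {
    return { statusCode: 401, body: JSON.stringify({ message: 'Unauthorized' }) };
  }
  // Assumption: users may only read their own record.
  if (!requestedId || claims.sub !== requestedId) {
    return { statusCode: 403, body: JSON.stringify({ message: 'Forbidden' }) };
  }
  return null;
}
```

In the handler, call `authorizeUserAccess(event)` first and return its result when it is not `null`, before querying DynamoDB.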

Implement AWS WAF rules to detect and block requests attempting to enumerate user IDs or access unauthorized resources.

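{
  "Type": "AWS::WAFv2::WebACL",
  "Properties": {
    "Name": "PIIProtectionACL",
    "Scope": "REGIONAL",
    "DefaultAction": { "Allow": {} },
    "VisibilityConfig": {
      "SampledRequestsEnabled": true,
      "CloudWatchMetricsEnabled": true,
      "MetricName": "PIIProtectionACL"
    },
    "Rules": [
      {
        "Name": "UserIDEnumeration",
        "Priority": 1,
        "Action": { "Block": {} },
        "VisibilityConfig": {
          "SampledRequestsEnabled": true,
          "CloudWatchMetricsEnabled": true,
          "MetricName": "UserIDEnumeration"
        },
        "Statement": {
          "RateBasedStatement": {
            "AggregateKeyType": "IP",
            "Limit": 100,
            "ScopeDownStatement": {
              "ByteMatchStatement": {
                "FieldToMatch": { "UriPath": {} },
                "PositionalConstraint": "STARTS_WITH",
                "SearchString": "/user/",
                "TextTransformations": [
                  { "Priority": 0, "Type": "NONE" }
                ]
              }
            }
          }
        }
      }
    ]
  }
}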

Use AWS Secrets Manager for storing and retrieving sensitive configuration data rather than hardcoding or exposing it in responses.

// Securely retrieve configuration from Secrets Manager
const AWS = require('aws-sdk');
const secretsManager = new AWS.SecretsManager();

exports.handler = async (event) => {
  const secret = await secretsManager.getSecretValue({
    SecretId: 'my-app-config'
  }).promise();
  
  const config = JSON.parse(secret.SecretString);
  // Use config data without exposing secrets
};

Enable AWS Config rules to continuously monitor for PII exposure risks across your infrastructure.

{
  "Description": "Check for S3 buckets with public access",
  "Scope": {
    "ComplianceResourceTypes": ["AWS::S3::Bucket"]
  },
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED"
  }
}

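Implement gateway VPC endpoints for S3 so that traffic to PII buckets stays off the public internet, and attach endpoint and bucket policies that restrict which buckets can be reached, to limit data exfiltration. Enable VPC Flow Logs to monitor for suspicious traffic patterns; flow logs capture connection metadata, not payload contents.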

Related CWEs: dataExposure

CWE ID | Name | Severity
CWE-200 | Exposure of Sensitive Information | HIGH
CWE-209 | Error Information Disclosure | MEDIUM
CWE-213 | Exposure of Sensitive Information Due to Incompatible Policies | HIGH
CWE-215 | Insertion of Sensitive Information Into Debugging Code | MEDIUM
CWE-312 | Cleartext Storage of Sensitive Information | HIGH
CWE-359 | Exposure of Private Personal Information (PII) | HIGH
CWE-522 | Insufficiently Protected Credentials | CRITICAL
CWE-532 | Insertion of Sensitive Information into Log File | MEDIUM
CWE-538 | Insertion of Sensitive Information into Externally-Accessible File | HIGH
CWE-540 | Inclusion of Sensitive Information in Source Code | HIGH

Frequently Asked Questions

How does middleBrick detect PII leakage in AWS Lambda functions?
middleBrick scans the API endpoint exposed by your Lambda function through API Gateway, testing for common PII exposure patterns like complete database record returns, error messages containing sensitive data, and improper data filtering. It doesn't require Lambda code access or AWS credentials; it tests the actual HTTP endpoint as an external attacker would.

Can middleBrick scan AWS Step Functions workflows for PII leakage?
Yes, middleBrick can scan Step Functions state machines by testing the API endpoints they expose. It identifies PII leakage through state data exposure, execution history access, and improper data handling between workflow steps. The scanner checks for excessive data exposure in workflow responses and logs.