
PII Leakage on Azure

How PII Leakage Manifests in Azure

Personally identifiable information (PII) can leave an Azure‑hosted API through several common misconfigurations and coding patterns. When developers rely on Azure’s platform services without applying the principle of least privilege, the resulting endpoints may return more data than intended, enabling attackers to harvest names, email addresses, phone numbers, social‑security numbers, or health‑related data.

  • Over‑permissive storage containers: An Azure Blob Storage container configured for anonymous container‑level read access lets anyone who learns the container URL enumerate and download its blobs. A leaked container‑scoped SAS token with read and list permissions (sp=rl) has the same effect. If the blobs contain JSON files, CSV exports, or diagnostic logs that include PII, the data is immediately exposed.
  • Entity‑returning Azure Functions or App Services: A function that reads an Azure Table Storage entity or a Cosmos DB document and returns the raw entity to the caller often includes every property, even those marked as internal. For example, a function that retrieves a user profile may inadvertently expose the PasswordHash or SSN field because the entity model is serialized directly.
  • Diagnostic logging and tracing: Azure App Service diagnostics, Application Insights, or Azure Monitor can capture request bodies, headers, or exception details. If the logging level is set to Verbose and the code logs the entire request object, PII submitted in a POST payload (e.g., a credit‑card number) becomes searchable in logs.
  • Improper handling of Azure Key Vault secrets in deployments: When an ARM template or Bicep file passes a Key Vault secret through a parameter that is not declared as securestring (@secure() in Bicep), or echoes it in the outputs section, the secret value can appear in the deployment output or in the template’s debug log, leaking credentials that later grant access to data stores containing PII.
  • API Management payload pass‑through: If an API’s outbound policy section forwards the backend response unchanged (no <set-body> or other transformation policy), any over‑exposed backend data (including PII) is sent straight to the consumer.

These patterns map directly to API3:2019 Excessive Data Exposure in the OWASP API Security Top 10 (2019). Publicly reported incidents in which Azure Storage containers holding user data were left open to anonymous read access illustrate how a simple configuration drift can lead to large‑scale PII leakage.

Azure‑Specific Detection

Detecting PII leakage in Azure‑hosted APIs requires looking for both runtime responses and configuration drift. middleBrick performs unauthenticated black‑box scans that can surface these issues without needing credentials or agents.

What middleBrick looks for

  • Response bodies that match common PII patterns (email addresses, phone numbers, US‑style SSNs, credit‑card numbers via Luhn check, or GDPR‑style personal identifiers).
  • Headers that indicate overly permissive caching (Cache-Control: public) on endpoints that return PII.
  • Publicly accessible Azure Storage blob or container URLs discovered through URL guessing or via the x-ms-blob-type header in responses.
  • Error messages or stack traces that reveal internal identifiers (e.g., GUIDs that map to user IDs) when the API returns 500 errors.
  • Presence of diagnostic endpoints such as /diagnostics, /trace, or /logs that leak request payloads.
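
The pattern checks in the first bullet can be approximated with a few regular expressions plus a Luhn checksum. The sketch below is only an illustration of that idea, not middleBrick’s actual implementation. The names findPii and passesLuhn are hypothetical, and the patterns are deliberately simple, so expect false positives and negatives.

```javascript
// Illustrative patterns only; real scanners use broader, tuned rule sets.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const SSN_RE = /\b\d{3}-\d{2}-\d{4}\b/g;          // US-style SSN with hyphens
const CARD_RE = /\b(?:\d[ -]?){12,18}\d\b/g;      // 13-19 digits, optional separators

// Luhn checksum: walking from the right, double every second digit,
// subtract 9 from results above 9, and require the total to be a multiple of 10.
function passesLuhn(digits) {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Returns a list of { type, value } findings for a response body string.
function findPii(text) {
  const findings = [];
  for (const m of text.match(EMAIL_RE) ?? []) findings.push({ type: "email", value: m });
  for (const m of text.match(SSN_RE) ?? []) findings.push({ type: "ssn", value: m });
  for (const m of text.match(CARD_RE) ?? []) {
    const digits = m.replace(/[ -]/g, "");
    // Only report digit runs that also satisfy the checksum.
    if (passesLuhn(digits)) findings.push({ type: "card", value: digits });
  }
  return findings;
}
```

The Luhn step matters because long digit runs such as order numbers are common in API responses; requiring a valid checksum filters most of them out before they are reported as card numbers.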

Scanning example with the middleBrick CLI

# Install the CLI (npm)
npm i -g middlebrick

# Scan an Azure Function endpoint
middlebrick scan https://myfuncapp.azurewebsites.net/api/GetUserProfile

# Output JSON for CI integration
middlebrick scan https://myapi.azurewebsites.net/items --format json > scan-result.json

The scan returns a risk score (A–F) and a per‑category breakdown. If the Data Exposure category shows findings such as "PII detected in response body" or "Public storage container detected", the report includes the exact URL, the matched pattern, and a severity rating (usually high).

Because middleBrick does not require credentials, it can also be run against staging or production URLs in a GitHub Action:

name: API Security Scan
on: [push]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm i -g middlebrick
      - run: middlebrick scan https://staging-myapp.azurewebsites.net --fail-below B

The action will fail the build if the score drops below the threshold you set (e.g., B), giving you an early warning before code reaches production.

Azure‑Specific Remediation

Fixing PII leakage in Azure relies on applying the platform’s native security controls and adjusting application code to return only the data that is strictly necessary.

Storage hardening

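  • Set allowBlobPublicAccess to false on the storage account and keep every container’s access level Private. Setting publicNetworkAccess to Disabled additionally blocks traffic from public networks. Use Azure Policy (for example, the built‑in “Storage account public access should be disallowed” definition) to enforce this.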
  • When a SAS token is required, generate it with the most limited permissions (sp=r on a specific blob, not the container) and a short expiry (se parameter).
  • Enable Microsoft Defender for Storage (formerly Azure Defender for Storage) to receive alerts when a container becomes publicly accessible.
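
The SAS guidance above can be checked mechanically before a token is handed out. The following is a minimal sketch that inspects the sp (permissions), sr (signed resource) and se (expiry) query parameters of a SAS URL. The function name auditSasUrl and the 24‑hour default are assumptions chosen for illustration, not an Azure recommendation.

```javascript
// Flags SAS URLs that grant more than read access to a single blob,
// or that stay valid longer than maxHours. Thresholds are illustrative.
function auditSasUrl(sasUrl, now = new Date(), maxHours = 24) {
  const params = new URL(sasUrl).searchParams;
  const issues = [];

  // sp: signed permissions; "r" alone means read-only.
  const permissions = params.get("sp") ?? "";
  if (permissions !== "r") {
    issues.push(`permissions "${permissions}" exceed read-only`);
  }

  // sr: signed resource; "b" is a single blob, "c" a whole container.
  const resource = params.get("sr");
  if (resource !== "b") {
    issues.push(`signed resource "${resource}" is not a single blob`);
  }

  // se: signed expiry as an ISO 8601 timestamp.
  const expiry = params.get("se");
  const expiresAt = expiry ? new Date(expiry) : null;
  if (!expiresAt || Number.isNaN(expiresAt.getTime())) {
    issues.push("missing or unparseable expiry (se)");
  } else if (expiresAt.getTime() - now.getTime() > maxHours * 3600 * 1000) {
    issues.push(`expiry ${expiry} is more than ${maxHours}h away`);
  }
  return issues;
}
```

An empty array means the token is read‑only, scoped to one blob, and short‑lived. Anything else is worth reviewing before the URL leaves your system.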

Function/App Service data filtering

  • Never return raw entity models. Instead, map to a DTO that excludes sensitive fields.
  • In .NET 6 Azure Functions:
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

public static class GetUserProfile
{
    public record UserDto(string Id, string Name, string Email); // SSN omitted

    [FunctionName("GetUserProfile")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "profile/{id}")] HttpRequest req,
        string id,
        ILogger log)
    {
        // Assume we fetch an entity from Table Storage
        var entity = await TableService.GetEntityAsync<UserEntity>("Users", id);
        if (entity == null) return new NotFoundResult();

        var dto = new UserDto(entity.PartitionKey, entity.Name, entity.Email);
        return new OkObjectResult(dto);
    }
}

// Internal entity – never exposed
public class UserEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    public string SSN { get; set; } // sensitive, not in DTO
}

The function now returns only Id, Name, and Email.

  • In Node.js with the Azure SDK for Cosmos DB, request only the needed fields via a SELECT projection:
const { CosmosClient } = require("@azure/cosmos");
const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const container = client.database("UsersDb").container("Profiles");

async function getUser(id) {
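    // Parameters belong inside the query spec; the second argument of query() is for feed options.
    const querySpec = {
        query: "SELECT c.id, c.name, c.email FROM c WHERE c.id = @id",
        parameters: [{ name: "@id", value: id }],
    };
    const { resources } = await container.items
        .query(querySpec)
        .fetchAll();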
    return resources[0]; // contains only id, name, email
}
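
When the query itself cannot be narrowed (for example, shared data‑access code that issues SELECT *), the same DTO idea can be applied in code with an explicit allow‑list. This is a minimal sketch: toDto and USER_DTO_FIELDS are hypothetical names, and the field list simply mirrors the id, name, and email fields used above.

```javascript
// Fields that may leave the service; everything else is dropped.
const USER_DTO_FIELDS = ["id", "name", "email"];

// Copies only allow-listed properties from a stored document into a new object.
function toDto(entity, allowedFields = USER_DTO_FIELDS) {
  const dto = {};
  for (const field of allowedFields) {
    if (Object.prototype.hasOwnProperty.call(entity, field)) {
      dto[field] = entity[field];
    }
  }
  return dto;
}
```

An allow‑list is preferable to deleting known‑sensitive fields because a property added to the stored document later stays hidden until someone deliberately exposes it. It also strips Cosmos DB system properties such as _rid and _etag.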

Logging and diagnostics

  • Configure Azure App Service diagnostic settings to send logs to a Log Analytics workspace with RetentionInDays set and Categories limited to AppServiceConsoleLogs and AppServiceHTTPLogs. Disable FailedRequestsTracing if it captures request bodies.
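  • In Application Insights, use an ITelemetryProcessor (or ITelemetryInitializer) to drop or scrub telemetry that contains patterns matching PII (e.g., a regex for SSNs). Lowering SamplingPercentage reduces telemetry volume but does not by itself remove PII from the items that are retained.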
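  • Review ARM/Bicep templates: replace any expression such as reference(resourceId('Microsoft.KeyVault/vaults/secrets', vaultName, secretName), '2019-09-01') that surfaces a secret with a parameter declared as securestring (@secure() in Bicep), and never return secret values from the outputs section, so they are not written to deployment outputs or logs.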
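
On the application side, the simplest safeguard is to never log a request object as‑is. The sketch below redacts by key name rather than by the regex matching mentioned above, which is a different but complementary technique. The function name redactForLog and the key list are assumptions, so adapt them to your own payloads.

```javascript
// Lower-cased property names whose values must never reach the logs.
const SENSITIVE_KEYS = new Set(["ssn", "password", "passwordhash", "cardnumber", "email", "phone"]);

// Returns a deep copy of value with sensitive properties replaced by a marker.
// The input object is left untouched.
function redactForLog(value) {
  if (Array.isArray(value)) return value.map(redactForLog);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redactForLog(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

In a Node.js Azure Function this would typically wrap whatever is passed to the logger, for example context.log(JSON.stringify(redactForLog(req.body))), so that the structure of the payload remains visible for debugging while the sensitive values do not.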

API Management policies

  • Use a <set-body> policy to project a safe response:
<outbound>
    <base />
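    <set-body>@{
        // Read the backend body once; preserveContent keeps it available to later policies.
        var body = context.Response.Body.As<JObject>(preserveContent: true);
        return new JObject(
            new JProperty("id", body["id"]),
            new JProperty("name", body["name"]),
            new JProperty("email", body["email"])
        ).ToString();
    }</set-body>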
</outbound>

By combining these controls (private storage, data‑transfer‑object patterns, restrained logging, and API Management filtering), you close off the most common pathways through which PII leaks from Azure‑hosted APIs.

Related CWEs: Data Exposure

CWE ID  | Name                                                             | Severity
CWE-200 | Exposure of Sensitive Information                                | HIGH
CWE-209 | Error Information Disclosure                                     | MEDIUM
CWE-213 | Exposure of Sensitive Information Due to Incompatible Policies   | HIGH
CWE-215 | Insertion of Sensitive Information Into Debugging Code           | MEDIUM
CWE-312 | Cleartext Storage of Sensitive Information                       | HIGH
CWE-359 | Exposure of Private Personal Information (PII)                   | HIGH
CWE-522 | Insufficiently Protected Credentials                             | CRITICAL
CWE-532 | Insertion of Sensitive Information into Log File                 | MEDIUM
CWE-538 | Insertion of Sensitive Information into Externally-Accessible File | HIGH
CWE-540 | Inclusion of Sensitive Information in Source Code                | HIGH

Frequently Asked Questions

Can middleBrick detect PII that is only exposed through Azure Storage blob URLs?
Yes. middleBrick’s unauthenticated scan includes passive discovery of publicly accessible blob containers and checks any returned content (JSON, CSV, logs) for PII patterns such as email addresses, SSNs, or credit‑card numbers. If a blob is open, the finding is reported under the Data Exposure category with a severity rating of high.

What is the quickest way to verify that an Azure Function is not returning excess fields after I apply a DTO fix?
Run a middleBrick scan against the function’s public URL. The scan examines the JSON response; if no PII patterns are found and the Data Exposure category shows a clean result, that is a good indication the function is no longer leaking unnecessary data. Because pattern matching cannot catch every sensitive field, also inspect the response manually with curl or Postman to confirm the shape matches your DTO.
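
The manual shape check can also be scripted. The sketch below is an assumption‑laden illustration: unexpectedFields and checkResponseShape are hypothetical helpers, the URL passed in is whatever your function exposes, and the global fetch used as the default is only available in Node.js 18 or later.

```javascript
// Returns the top-level keys of body that are not in the allowed list.
function unexpectedFields(body, allowed) {
  const allowedSet = new Set(allowed);
  return Object.keys(body).filter((key) => !allowedSet.has(key));
}

// Fetches a JSON response and reports any fields beyond the expected DTO shape.
// fetchImpl is injectable so the check can be exercised without network access.
async function checkResponseShape(url, allowed, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  const body = await res.json();
  return unexpectedFields(body, allowed);
}
```

An empty result from checkResponseShape(url, ["id", "name", "email"]) means the response carries no top‑level fields beyond the DTO. Nested objects would need a recursive variant of the same check.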