LLM Data Leakage in Django with HMAC Signatures
LLM Data Leakage in Django with HMAC Signatures — how this specific combination creates or exposes the vulnerability
When Django applications use HMAC signatures to authenticate or authorize requests, they typically compute a signature over a canonical representation of the request data (e.g., selected headers, method, path, and body). If the application inadvertently includes sensitive data, such as authentication tokens, PII, or internal identifiers, in the material that is signed, and then exposes that material or the resulting signature to an LLM endpoint, data leakage can occur.
LLM data leakage in this context refers to any pathway by which sensitive information is revealed through the behavior or outputs of an LLM integration. With HMAC-based workflows, a common pattern is to generate a signature on the client, send it to a Django backend, and then forward the request together with the signature to an external LLM service. If the logic that builds the HMAC input includes secrets or sensitive payloads, or if the application logs or echoes the signature and request details into LLM prompts or tool calls, the LLM may be able to infer or reproduce that data through its outputs, its tool-usage patterns, or side channels such as error messages.
Consider a Django service that signs requests to an LLM endpoint using an HMAC derived from a JSON body and a timestamp. If the JSON body contains a user’s email or a token, and the computed signature is transmitted in headers that are logged or passed into prompt templates, the signature and the underlying data may be exposed. The LLM might infer relationships between repeated signatures and inputs, or an attacker could induce leakage by prompting the model to reveal details about the signing process, especially if the application uses verbose logging or exposes intermediate values in tool calls.
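To make the leak pathway concrete, here is a minimal sketch of the vulnerable pattern described above. All names are hypothetical; the point is that the signed payload contains PII and a token, and both the payload and the signature end up in the prompt text.

```python
import hashlib
import hmac
import json

# Anti-pattern sketch (hypothetical names): the HMAC input and the prompt both
# carry sensitive fields, so anything the model echoes or logs exposes them.
def leaky_sign_and_prompt(secret: bytes, body: dict) -> str:
    # BAD: body contains e.g. {"email": ..., "auth_token": ...} and is signed verbatim.
    payload = json.dumps(body, separators=(",", ":"), sort_keys=True)
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # BAD: the raw body and the signature are interpolated into the prompt,
    # handing both to the LLM (and to any prompt/response logging around it).
    return f"Process this request: {payload} (signature: {signature})"
```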
Excessive agency patterns compound the risk. If the Django service uses an LLM agent that can invoke functions or tools, and the HMAC or related metadata is surfaced as tool parameters or context, the model may be able to leverage those capabilities to extract or propagate sensitive information. For example, an LLM tool call that includes a signature and a timestamp could allow the model to experiment with different inputs to learn about the signing logic or to reproduce sensitive content in its responses.
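As an illustration of this excessive-agency risk, consider a tool definition that surfaces the signature and timestamp as model-controllable parameters. The schema below is a hypothetical sketch, not any specific framework's API; the problem is that the model can vary these values and observe the backend's accept/reject behavior, effectively treating it as a signing oracle.

```python
# Hypothetical tool schema (illustrative only): exposing signing material as
# tool parameters lets the model experiment with the signing logic.
risky_tool = {
    "name": "forward_request",
    "description": "Forward a request to an internal Django service.",
    "parameters": {
        "type": "object",
        "properties": {
            # BAD: the model chooses these values and can probe how the
            # backend validates them, or reproduce them in its responses.
            "signature": {"type": "string"},
            "timestamp": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["signature", "timestamp", "body"],
    },
}
```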
Unauthenticated LLM endpoints further increase exposure. If a Django service forwards HMAC-signed requests to an LLM that does not require authentication, there is no guarantee that the responses are trustworthy, and an attacker might probe the endpoint to infer how signatures are constructed or to elicit sensitive information through crafted inputs. Output scanning becomes critical here: responses from the LLM must be inspected for API keys, PII, or executable code, and any leakage of HMAC-related values in LLM outputs must be treated as a security incident.
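A minimal output-scanning sketch follows. The patterns are assumptions to be tuned per deployment (a production scanner would cover far more secret formats); this one flags hex digests that could be leaked HMAC values, long base64-like blobs that resemble keys, and email addresses.

```python
import re

# Illustrative patterns only; tune and extend for your deployment.
SENSITIVE_PATTERNS = [
    re.compile(r"\b[0-9a-f]{64}\b"),              # hex digests, e.g. a leaked HMAC-SHA256
    re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),      # long base64-like blobs (possible keys)
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email addresses (PII)
]

def scan_llm_output(text: str) -> bool:
    """Return True if an LLM response appears to contain sensitive material."""
    return any(pattern.search(text) for pattern in SENSITIVE_PATTERNS)

# Responses that trip the scanner should be rejected or quarantined, and any
# HMAC-related match treated as a security incident.
```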
HMAC Signature-Specific Remediation in Django — concrete code fixes
Remediation focuses on ensuring that HMAC inputs never include sensitive data, that signatures are handled as opaque values, and that logging and LLM interactions do not expose signing material. Below is a concrete, secure pattern for generating and verifying HMAC signatures in Django without leaking sensitive information to LLMs or other external systems.
```python
import json
import hmac
import hashlib
import time

from django.conf import settings
from django.http import JsonResponse
from django.views import View


# A minimal, safe HMAC signing utility for Django.
# It signs only non-sensitive metadata and a fixed set of canonical headers/paths.
def build_hmac_signature(secret: bytes, method: str, path: str, timestamp: str,
                         safe_headers: dict) -> str:
    # Canonicalize the data that is necessary for routing or replay protection.
    # Exclude request body, user identifiers, tokens, PII, and any LLM-relevant content.
    payload = json.dumps({
        "method": method,
        "path": path,
        "timestamp": timestamp,
        "safe_headers": safe_headers,
    }, separators=(",", ":"), sort_keys=True)
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


class SecureApiView(View):
    # Example view that signs only safe metadata and forwards a sanitized request to an LLM.
    def post(self, request):
        timestamp = str(int(time.time()))
        # Only include headers that are safe to expose and necessary for the workflow.
        safe_headers = {
            "content-type": request.headers.get("Content-Type", ""),
            "x-request-id": request.headers.get("X-Request-ID", ""),
        }
        signature = build_hmac_signature(
            secret=settings.HMAC_SECRET_KEY.encode("utf-8"),
            method=request.method,
            path=request.path,
            timestamp=timestamp,
            safe_headers=safe_headers,
        )
        # Build a sanitized payload that excludes sensitive data before LLM interaction.
        safe_body = {
            "action": request.POST.get("action"),
            "resource_id": request.POST.get("resource_id"),
            # Never forward user email, token, session data, or raw body to the LLM.
        }
        # Forward only safe data and the opaque signature to the LLM endpoint.
        # The signature is treated as an opaque verifier, not as part of the prompt.
        llm_payload = {
            "input": safe_body,
            "timestamp": timestamp,
            "sig": signature,
        }
        # send_to_llm(llm_payload) would be implemented separately, ensuring
        # no sensitive data is included.
        return JsonResponse({"status": "ok", "timestamp": timestamp, "sig": signature})


# Verification on the consumer or server side: recompute and compare in constant time.
def verify_hmac_signature(secret: bytes, method: str, path: str, timestamp: str,
                          safe_headers: dict, received_signature: str) -> bool:
    expected = build_hmac_signature(secret, method, path, timestamp, safe_headers)
    return hmac.compare_digest(expected, received_signature)
```
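On the receiving side, verification might look like the sketch below, which reuses verify_hmac_signature from above. The X-Signature and X-Timestamp header names are illustrative assumptions, and a production check would also reject stale timestamps to limit replay.

```python
import time

from django.conf import settings

MAX_SKEW_SECONDS = 300  # assumption: tolerate five minutes of clock drift

def verify_incoming(request) -> bool:
    # Header names are illustrative; use whatever your clients actually send.
    received_sig = request.headers.get("X-Signature", "")
    timestamp = request.headers.get("X-Timestamp", "")
    if not timestamp.isdigit() or abs(time.time() - int(timestamp)) > MAX_SKEW_SECONDS:
        return False  # basic replay protection
    safe_headers = {
        "content-type": request.headers.get("Content-Type", ""),
        "x-request-id": request.headers.get("X-Request-ID", ""),
    }
    return verify_hmac_signature(
        secret=settings.HMAC_SECRET_KEY.encode("utf-8"),
        method=request.method,
        path=request.path,
        timestamp=timestamp,
        safe_headers=safe_headers,
        received_signature=received_sig,
    )
```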
Key practices to prevent LLM data leakage with HMAC workflows:
- Scope the HMAC to non-sensitive metadata only (method, path, timestamp, safe headers). Never include request bodies, tokens, PII, or internal IDs.
- Treat the signature as an opaque verifier. Do not embed it in prompts, tool parameters, or logs that are visible to the LLM.
- Sanitize any data forwarded to LLMs. Strip authentication tokens, emails, and other sensitive fields before constructing the LLM payload.
- Avoid logging HMAC inputs or signatures. If logging is required for observability, ensure logs are protected and do not reach LLM-integration code (a redaction filter is sketched after this list).
- Implement output scanning for LLM responses to detect accidental exposure of API keys, PII, or code, and reject or quarantine problematic outputs.
- Prefer authenticated LLM endpoints and enforce strict input validation on any data used in signature construction.
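For the logging guidance above, one option is a standard logging.Filter that redacts records containing signature-like fields before they reach any sink that LLM-integration code might read. This is a minimal sketch; the matched keywords and replacement text are assumptions to adapt to your logging conventions.

```python
import logging

class HmacRedactionFilter(logging.Filter):
    # Coarse keyword matching for illustration; a real filter should be more precise.
    REDACTED_KEYS = ("sig", "signature", "hmac")

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage().lower()
        if any(key in message for key in self.REDACTED_KEYS):
            record.msg = "[REDACTED: possible HMAC material]"
            record.args = ()
        return True  # keep the record, but with signing material stripped

# Attach via dictConfig in settings.py, e.g.:
# LOGGING = {
#     "version": 1,
#     "filters": {"hmac_redact": {"()": HmacRedactionFilter}},
#     "handlers": {"console": {"class": "logging.StreamHandler",
#                              "filters": ["hmac_redact"]}},
# }
```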
Related CWEs:
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |