HIGH llm data leakagechidynamodb

Llm Data Leakage in Chi with Dynamodb

Llm Data Leakage in Chi with Dynamodb — how this specific combination creates or exposes the vulnerability

LLM data leakage in the context of a Chi web application using Amazon DynamoDB occurs when sensitive information stored in DynamoDB is unintentionally exposed through LLM-related endpoints or workflows. Chi is a functional, composable web framework for Clojure, and when it integrates with DynamoDB via the AWS SDK, developers must ensure that data access patterns and LLM integrations do not leak records or metadata.

The risk arises when LLM endpoints (such as chat completions or embedding generation) interact with DynamoDB-backed data stores without proper authorization checks. For example, an endpoint that retrieves a DynamoDB item by ID and passes the item content to an LLM for summarization or analysis may expose PII, API keys, or other sensitive content in LLM responses. middleBrick’s LLM/AI Security checks specifically detect this scenario by scanning for unauthenticated LLM endpoints and analyzing outputs for PII, API keys, and executable code.

In a Chi service, a typical integration might query DynamoDB and forward the result to an LLM client. If the route is publicly accessible or lacks proper authentication and input validation, an attacker could manipulate parameters to access other users’ DynamoDB records and trigger data leakage through the LLM. This can happen via IDOR (Insecure Direct Object Reference) or BOLA (Broken Object Level Authorization), where object identifiers are guessed or iterated to retrieve unauthorized data.

DynamoDB’s flexible schema and index patterns can inadvertently expose additional leakage vectors. For instance, a Global Secondary Index (GSI) that includes sensitive attributes might be queried by an LLM integration that does not restrict the returned fields. If the LLM response includes those fields, sensitive data could be persisted or exposed in logs or model outputs. middleBrick’s Data Exposure and Property Authorization checks help identify such risky configurations by cross-referencing OpenAPI specs with runtime behavior.

Another scenario involves prompt injection and jailbreak testing against LLM endpoints that use DynamoDB as a context store. An attacker might craft inputs designed to coerce the LLM into revealing stored data, such as injecting system prompts or attempting data exfiltration. middleBrick’s active prompt injection probes include system prompt extraction and data exfiltration tests that specifically target such integrations to ensure DynamoDB-backed context does not facilitate leakage through the LLM layer.

Encryption and inventory management checks are also relevant. If DynamoDB tables contain sensitive data, middleBrick verifies that encryption at rest is enabled and that inventory management practices prevent unintended data exposure through LLM tooling. These checks align with frameworks such as OWASP API Top 10 and SOC2, emphasizing the need to secure data flows between Chi routes, DynamoDB, and LLM components.

Dynamodb-Specific Remediation in Chi — concrete code fixes

To prevent LLM data leakage in Chi when using DynamoDB, implement strict authorization, input validation, and output handling. Below are concrete code examples using the AWS SDK for DynamoDB in a Chi application.

1. Enforce authorization before fetching DynamoDB items

Ensure that every DynamoDB request is scoped to the authenticated user. Use Chi’s middleware to validate identity and apply row-level security by filtering queries with the user’s ID.

(ns myapp.handlers
  (:require [cheshire.core :as json]
            [aws.sdk.dynamodb :as ddb]
            [myapp.middleware :refer [get-current-user]]))

(defn get-user-data [req]
  (let [user (get-current-user req)
        user-id (:user-id user)
        client (:dynamodb-client req)
        response (ddb/get-item client
                               {:table-name "UserData"
                                :key {"user_id" {:s user-id}}})]
    {:status 200
     :body (:item response)}))

2. Validate and sanitize inputs to prevent IDOR

Always validate and sanitize path or query parameters before using them as DynamoDB keys. Avoid directly using user-supplied IDs without mapping them to authorized resources.

(defn safe-item-handler [req]
  (let [user-id (some-> req :session :user :id)
        item-id (get-in req [:params :item-id])
        ;; Ensure item-id belongs to the user, e.g., via a lookup or namespaced key
        qualified-key (str user-id "#" item-id)]
    (if (valid-item-access? user-id item-id)
      (let [client (:dynamodb-client req)
            resp (ddb/get-item client {:table-name "Items"
                                       :key {"pk" {:s qualified-key}}})]
        {:status 200, :body (:item resp)})
      {:status 403, :body {:error "Forbidden"}})))

3. Limit DynamoDB responses before LLM consumption

When passing DynamoDB data to an LLM, explicitly select only required fields and remove sensitive attributes to prevent leakage in LLM outputs or logs.

(defn prepare-context-for-llm [item]
  ;; Select only non-sensitive fields
  {:question (:question item)
   :context (:public-context item)})

(defn query-and-summarize [req]
  (let [client (:dynamodb-client req)
        item (ddb/get-item client {:table-name "Content"
                                   :key {"id" {:s (get-in req [:params :id])}}})
        context (prepare-context-for-llm (:item item))
        summary (call-llm-summary context)] ;; hypothetical LLM call
    {:status 200, :body {:summary summary}}))

4. Secure LLM endpoints with authentication and input validation

Protect LLM routes in Chi using middleware that requires authentication and validates inputs to mitigate prompt injection and unauthorized data access.

(def app
  (-> (routes)
      (wrap-authentication authenticate-user)
      (wrap-params)
      (route/GET "/summarize/:id"
        [req] (handle-summarize req))))

5. Enable encryption and monitor access patterns

Ensure DynamoDB tables have encryption at rest enabled and use middleware to audit access patterns, reducing the risk of unnoticed leakage.

(;; Encryption is configured at the AWS account or table level, not in application code)
;; Example table definition with encryption
(aws/dynamodb-create-table
 {:table-name "UserData"
  :key-schema [{:attribute-name "user_id" :key-type "HASH"}]
  :attribute-definitions [{:attribute-name "user_id" :attribute-type "S"}]
  :billing-mode :pay-per-request
  :sse-specification {:enabled true :kms-key-id "alias/aws/dynamodb"}})

Related CWEs: llmSecurity

CWE IDNameSeverity
CWE-754Improper Check for Unusual or Exceptional Conditions MEDIUM

Frequently Asked Questions

How does middleBrick detect LLM data leakage involving DynamoDB?
middleBrick scans unauthenticated endpoints and analyzes LLM outputs for PII, API keys, and executable code. It cross-references DynamoDB access patterns with LLM integration points to identify data exposure risks.
Can middleBrick prevent LLM data leakage in Chi applications?
middleBrick detects and reports findings with remediation guidance. It does not prevent or fix issues. Developers must implement authorization, input validation, and output sanitization in Chi routes and DynamoDB interactions.