Hallucination Attacks in Express with Firestore
Hallucination Attacks in Express with Firestore — how this specific combination creates or exposes the vulnerability
Hallucination attacks in an Express service that uses Firestore occur when an application returns fabricated or misleading data while presenting it as authoritative. This typically arises when the application layer synthesizes responses from Firestore records, such as merging partial data, injecting inferred fields, or combining results from multiple queries into a single coherent but incorrect output. Firestore documents often contain nested maps and arrays; if the Express layer reshapes these structures without strict validation, it can introduce inconsistencies that appear as hallucinations to downstream clients.
In Express, routes that query Firestore may perform additional processing—adding computed fields, aggregating counts from multiple documents, or enriching responses with cached data. When this processing lacks strict source attribution, the response may contain information that never existed in Firestore. For example, an endpoint might merge a user profile document with a separate activity log to produce a summary, inadvertently inserting plausible but incorrect activity entries. Because Firestore does not enforce schema consistency across collections, the Express layer must explicitly validate that each returned field has a verifiable source in the queried documents.
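One way to enforce that requirement at the application layer is a small helper that copies only whitelisted fields actually present in the source documents and records where each came from; anything absent stays absent rather than being defaulted. A minimal sketch over plain objects (the helper name and shape are illustrative, not part of any SDK):

```javascript
// Build a response object from explicitly whitelisted fields, recording the
// source document path for each. Missing fields are omitted, never defaulted
// or inferred, so every field in the output has verifiable provenance.
function buildAttributedResponse(sources) {
  const result = {};
  const provenance = {};
  for (const { path, data, fields } of sources) {
    for (const field of fields) {
      if (data && Object.prototype.hasOwnProperty.call(data, field)) {
        result[field] = data[field];
        provenance[field] = path;
      }
      // No else branch: a field with no source simply does not appear.
    }
  }
  return { result, provenance };
}

// Example: merging a profile document with an activity summary document.
// `loginCount` is absent from the source, so it is absent from the result.
const { result, provenance } = buildAttributedResponse([
  { path: 'users/u1', data: { name: 'Ada', email: 'ada@example.com' }, fields: ['name', 'email'] },
  { path: 'activity/u1', data: { lastLogin: 1700000000 }, fields: ['lastLogin', 'loginCount'] },
]);
```

Returning the `provenance` map alongside the data (or logging it) makes it auditable that no response field was fabricated by the Express layer.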
The risk is compounded when Firestore security rules are permissive for read operations and the Express layer assumes stronger guarantees than Firestore provides. An attacker may exploit overly broad rules to request data they cannot normally access, then observe how the Express route combines and presents this data. If the route fills missing fields with default or inferred values, the response may reveal internal assumptions or relationships, effectively leaking logic through fabricated content. This can expose business logic, such as how promotions are calculated or how permissions are derived, without directly reading restricted documents.
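Note that when the Express layer uses the Firebase Admin SDK (as in the examples below), Firestore security rules are bypassed entirely, so the route itself must perform authorization before any read. A hedged sketch of such a guard, assuming some upstream auth middleware has populated `req.user` (the names here are illustrative):

```javascript
// Express-level ownership check. The Admin SDK bypasses Firestore security
// rules, so this middleware is the only thing standing between the caller
// and another user's documents.
function requireSelf(req, res, next) {
  if (!req.user || req.user.uid !== req.params.uid) {
    // Deny uniformly, without revealing whether the target document exists
    return res.status(403).json({ error: 'forbidden' });
  }
  next();
}

// Usage: app.get('/users/:uid', requireSelf, handler);
```

Performing the check before the Firestore query also avoids leaking document existence through response timing or through default-filled fields.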
Another vector involves pagination and cursor-based navigation. Firestore queries return limited result sets; an Express route might implement client-side cursors by storing document snapshots in session or signed tokens. If the route later reconstructs a full list by hallucinating intermediate items to maintain a consistent page size, an attacker can probe boundaries and infer data presence by observing which hallucinated entries are never validated against Firestore. This can lead to indirect data exfiltration, where the presence or absence of specific records is inferred from inconsistencies in hallucinated content.
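A cheap defense is to assert, before returning a page, that every outgoing item maps to a document in the query snapshot, so a synthesized entry can never reach the client. A minimal illustrative check over plain objects (the function name is hypothetical):

```javascript
// Verify that each item in an outgoing page maps to a snapshot from the
// query result. Throws if an item has no backing document, which would
// indicate a synthesized (hallucinated) entry was inserted.
function assertPageProvenance(items, snapshotIds) {
  const known = new Set(snapshotIds);
  for (const item of items) {
    if (!known.has(item.id)) {
      throw new Error(`item ${item.id} has no backing Firestore document`);
    }
  }
  // A short final page is valid; padding it to the page size is what
  // must never happen.
  return true;
}
```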
LLM-focused hallucination patterns are also relevant when Express services integrate language models to generate summaries or suggestions based on Firestore data. If prompts include raw Firestore documents without redacting sensitive placeholders or without clear source attribution, the model may confidently invent values that appear consistent with the supplied context. For instance, an endpoint that sends user activity arrays to an LLM to produce a natural language summary might receive back fabricated entries that sound authoritative. Because Firestore does not provide provenance tracking for derived text, the Express layer must separately ensure that any LLM output referencing specific documents is verifiable against the original query results.
Firestore-Specific Remediation in Express — concrete code fixes
Remediation focuses on ensuring every field in an Express response can be traced to a specific Firestore document or query result, and that the Express layer does not invent or infer values that lack direct provenance. Use strict schema validation and avoid merging data from multiple sources unless the combination is explicitly required and safely bounded.
Example 1: Direct document response with Zod validation
Instead of transforming Firestore documents, return them directly after validating against a schema that matches the known document structure. This prevents the route from adding or omitting fields.
import express from 'express';
import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';
import { z } from 'zod';

initializeApp(); // uses application default credentials; adjust for your environment
const app = express();
const db = getFirestore();

const UserSchema = z.object({
  uid: z.string(),
  email: z.string().email(),
  createdAt: z.number().int(),
}); // z.object strips unknown keys, so unexpected fields never reach the client

app.get('/users/:uid', async (req, res) => {
  // The Admin SDK uses a chained API, not the client SDK's modular functions
  const snap = await db.collection('users').doc(req.params.uid).get();
  if (!snap.exists) { // `exists` is a property in the Admin SDK, not a method
    return res.status(404).json({ error: 'not_found' });
  }
  const parsed = UserSchema.safeParse(snap.data());
  if (!parsed.success) {
    return res.status(500).json({ error: 'invalid_data' });
  }
  res.json(parsed.data);
});
Example 2: Aggregation with explicit source tracking
If you must compute aggregates, include the source document IDs and avoid hallucinating missing entries. Return counts alongside references rather than synthesizing placeholder items.
import express from 'express';
import { getFirestore } from 'firebase-admin/firestore';

const app = express();
const db = getFirestore(); // assumes initializeApp() has already run

app.get('/users/:uid/roles', async (req, res) => {
  // Admin SDK query: chained builder methods, executed with .get()
  const snap = await db
    .collection('roles')
    .where('uid', '==', req.params.uid)
    .get();
  // Every returned role carries its document ID; nothing is synthesized
  const roles = snap.docs.map(d => ({ id: d.id, ...d.data() }));
  const sourceIds = snap.docs.map(d => d.ref.path);
  res.json({ roles, sourceIds });
});
Example 3: Avoiding client-side cursor hallucination
Do not invent intermediate documents to fill pages. Instead, use Firestore’s native cursors and return them to the client so the next request continues from a verifiable position.
import express from 'express';
import { getFirestore, FieldPath } from 'firebase-admin/firestore';

const app = express();
const db = getFirestore(); // assumes initializeApp() has already run

app.get('/items', async (req, res) => {
  const pageSize = 20;
  let q = db.collection('items').orderBy(FieldPath.documentId()).limit(pageSize);
  if (typeof req.query.cursor === 'string') {
    // Resolve the cursor to a real snapshot so pagination is anchored to a
    // verifiable document, never an invented position
    const cursorSnap = await db.collection('items').doc(req.query.cursor).get();
    if (!cursorSnap.exists) {
      return res.status(400).json({ error: 'invalid_cursor' });
    }
    q = q.startAfter(cursorSnap);
  }
  const snap = await q.get();
  const items = snap.docs.map(d => ({ id: d.id, ...d.data() }));
  // A short final page is returned as-is, never padded with placeholders
  const nextCursor = snap.docs.length === pageSize ? snap.docs[snap.docs.length - 1].id : null;
  res.json({ items, nextCursor });
});
Example 4: Safe LLM integration with source grounding
When using an LLM to generate summaries, include only validated document IDs and avoid passing raw synthesized data. If the LLM output references specific fields, verify them against the original documents before returning to the client.
import express from 'express';
import { getFirestore } from 'firebase-admin/firestore';

const app = express();
app.use(express.json()); // required so req.body is populated
const db = getFirestore(); // assumes initializeApp() has already run

app.post('/summarize', async (req, res) => {
  const docRef = db.collection('products').doc(String(req.body.productId));
  const snap = await docRef.get();
  if (!snap.exists) {
    return res.status(404).json({ error: 'not_found' });
  }
  const data = snap.data();
  // Send only verified, explicitly selected fields to the LLM
  const prompt = `Summarize this product using only the provided data: ${JSON.stringify({ name: data.name, price: data.price })}`;
  // getLLMSummary stands in for your model client; its output should still
  // be checked against `data` before being returned to the client
  const summary = await getLLMSummary(prompt);
  res.json({ summary, sourceId: docRef.id });
});
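To make the "verify before returning" step concrete, one cheap grounding check is to confirm that every number the summary mentions actually appears among the source document's field values. A heuristic sketch, not an exhaustive verifier (the function name is illustrative):

```javascript
// Heuristic grounding check: every number mentioned in the summary must
// appear among the source document's numeric field values. This catches the
// common case of an LLM inventing a price or count not present in the data.
function numbersAreGrounded(summary, data) {
  const allowed = new Set(
    Object.values(data)
      .filter(v => typeof v === 'number')
      .map(v => String(v))
  );
  const mentioned = summary.match(/\d+(?:\.\d+)?/g) || [];
  return mentioned.every(n => allowed.has(n) || allowed.has(String(Number(n))));
}

// numbersAreGrounded('Costs 19.99', { name: 'Mug', price: 19.99 }) → true
// numbersAreGrounded('Costs 24.99', { name: 'Mug', price: 19.99 }) → false
```

If the check fails, the route can refuse to return the summary or regenerate it, rather than presenting fabricated values as authoritative.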
Related CWEs
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |