Hacker News API Security Audit: 14 Findings, Five Are Firebase Console Redirects
https://hacker-news.firebaseio.com/v0/topstories.json
About This API
The Hacker News API at hacker-news.firebaseio.com/v0/ is the public read-only feed of HN content: stories, comments, users, top/new/best/show/ask/job lists, and the maxitem pointer. The endpoint we scanned, /v0/topstories.json, returns a JSON array of up to 500 numeric story IDs in current rank order (500 at the time of our scan). Each ID is then resolvable via /v0/item/{id}.json. The shape has been stable since the API launched in 2014.
The unusual architectural choice is that the API is served directly off a Firebase Realtime Database. Most major sites at HN's traffic scale build a custom REST or GraphQL layer in front of their datastore — the layer handles versioning, rate limiting, authentication, schema evolution, response shaping, and caching. HN doesn't have that layer. Y Combinator points hacker-news.firebaseio.com straight at the RTDB and lets Firebase's own protocol semantics be the public contract. That is rare. It is also why this audit looks different from every other public-API case study in this series.
The audience for the API is large and technical: HN-derived datasets (the BigQuery bigquery-public-data.hacker_news table is built from this feed), countless personal scrapers, third-party readers (news.hada.io, hckrnews.com, etc.), Algolia's hn.algolia.com search index, ML training corpora, and a steady stream of 'I built a thing on top of HN' side projects. The consumers are mostly developers — which means the consumer profile and the developer profile overlap unusually heavily.
Threat Model
The threat model for the read-only side of the Hacker News API is narrow. The data is already public, the responses contain no PII, no authentication state, no privileged operations. The CRITICAL 'no authentication required' finding is the API working as designed: HN intentionally exposes the feed without auth so anyone can build on top of it without an API key flow.
What an attacker can actually do
Three things, none of them dramatic.
(1) Resource consumption. The 500-item topstories array plus the per-item-fetch pattern means a naive consumer that wants 'all top story bodies' issues 501 requests. A malicious consumer that wants to grief Firebase's egress can fan out item fetches in parallel, and because the responses carry no rate-limit headers, nothing ever signals it to back off. Firebase has its own per-project quota enforcement, but consumers can't see it.
(2) Cache poisoning of derived datasets. Because the JSON array doesn't carry timestamps or freshness markers, downstream consumers (BigQuery exporters, Algolia indexers) rely on polling cadence and assume monotonicity. An attacker who can briefly intercept traffic or spoof DNS for hacker-news.firebaseio.com, and who can also present a certificate that clients accept for that host, can serve a stale array and the downstream pipelines won't notice. This is hypothetical; TLS certificate validation and the HSTS-preload policy on the Firebase domain make it expensive.
(3) Origin-reflection abuse. The API reflects the request's Origin back into Access-Control-Allow-Origin. With no cookies and no auth state, this is harmless on this surface, but consumers building authenticated APIs who unknowingly copy the pattern are a different matter.
What an attacker cannot do
The five 'privileged endpoint accessible' findings (/admin, /api/admin, /api/config, /manage, /system/health) all describe paths that, in reality, are 301 redirects to console.firebase.google.com. The Firebase RTDB protocol returns the redirect to any browser-shaped GET that targets a path without the .json suffix. The actual data path — /v0/admin.json — returns HTTP 401 {"error":"Permission denied"}. There is no admin endpoint here for an attacker to reach. The scanner is reading 'redirect followed → some response served → 200 observed' as if it were 'admin endpoint accessible,' and the resulting five findings are noise.
Methodology
middleBrick ran a black-box scan against https://hacker-news.firebaseio.com/v0/topstories.json. The scan was read-only — no destructive HTTP methods, no auth headers, no probe payloads beyond the standard BFLA path-suffix sweep and the CORS-origin-reflection probe.
The fingerprinter classified the surface as plain REST. The Firebase Realtime Database protocol is REST-shaped on the wire (GET /path.json returns the JSON value at /path), so plain-REST is the right classification — but it misses that the auth boundary is the .json suffix rather than per-route middleware. That detail matters for interpreting the BFLA findings, and we cover it in detail below.
Twelve security checks ran. Fourteen findings were produced. The CORS probe sent a request with Origin: https://evil.com and observed Access-Control-Allow-Origin: https://evil.com in the response — that's the 'CORS reflects arbitrary origin' HIGH. A separate request with no Origin header observed Access-Control-Allow-Origin: * — that's the 'CORS wildcard' HIGH. Both findings describe the same Firebase response shaping, viewed from two probe angles.
The BFLA probe walked a list of common privileged-path suffixes (/admin, /api/admin, /api/config, /manage, /system/health, /debug, etc.). Each of those paths on hacker-news.firebaseio.com returns HTTP 301 Moved Permanently with a Location header pointing at console.firebase.google.com/project/firebase-hacker-news/database/.... The scanner's redirect-following logic recorded the eventual 200 on the console URL as 'endpoint accessible.' That accounts for six of the fourteen findings (five 'privileged endpoint' HIGHs plus one '/debug' HIGH categorized as inputValidation rather than bflaAuthorization).
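One way a probe could avoid this misreading is to judge the first response rather than the end of the redirect chain. The sketch below is illustrative only: `classify_probe` and its labels are hypothetical names, not middleBrick's implementation, and it assumes the caller has disabled redirect following and passes in the raw status code and Location header.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: the host this article reports Firebase redirecting to.
FIREBASE_CONSOLE_HOST = "console.firebase.google.com"


def classify_probe(status: int, location: Optional[str] = None) -> str:
    """Classify the FIRST response to a privileged-path probe.

    The caller is assumed to have disabled redirect following, so
    `status` and `location` describe the target itself, not whatever
    page a redirect chain eventually lands on.
    """
    if status in (301, 302, 303, 307, 308):
        host = urlparse(location or "").hostname
        if host == FIREBASE_CONSOLE_HOST:
            # Redirect to Google's console UI: not an endpoint on the target.
            return "firebase-console-redirect"
        return "redirect"
    if status in (401, 403):
        return "denied"
    if 200 <= status < 300:
        return "accessible"
    return "other"
```

Under this sketch, only a 2xx served by the target itself counts as "accessible"; the 301s described above would be labeled as console redirects instead of HIGH findings.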
The unpaginated-collection probe correctly identified the 500-element array. The 500 is a hard limit: the cap is HN's own design choice for the topstories list rather than a Firebase RTDB restriction, and the API has no ?limit= or cursor parameter. The 'legacy/internal path pattern' MEDIUM fires on the literal /v0/ prefix. That's the API's permanent versioning prefix, not a legacy path; the scanner pattern-matches /v0 against its 'pre-v1 implies internal' heuristic, which is wrong here because there is no /v1, there has never been one, and the contract has been stable for over a decade.
Results Overview
Hacker News API scored 73 out of 100 — grade C. Fourteen findings: one CRITICAL, nine HIGH, two MEDIUM, two LOW.
The honest scoring view: of fourteen findings, six are scanner artifacts produced by the Firebase 301-redirect pattern (five 'privileged endpoint accessible' HIGHs plus the '/debug accessible' HIGH). One — the 'legacy /v0/ path pattern' MEDIUM — is a misclassification of HN's permanent versioning prefix. That leaves seven findings that meaningfully describe the API surface:
- CRITICAL: no authentication required (intentional, structural to a public read-only feed)
- HIGH: CORS wildcard (Access-Control-Allow-Origin: * on requests with no Origin)
- HIGH: CORS reflects arbitrary Origin header
- HIGH: no rate-limit headers (X-RateLimit-*, Retry-After absent)
- MEDIUM: large unpaginated collection (500 items, hard-capped, no cursor)
- LOW: missing security headers (X-Content-Type-Options, X-Frame-Options)
- LOW: DELETE/PUT/PATCH advertised via OPTIONS (Firebase RTDB protocol surface; the .json paths reject them with 401 'Permission denied' for unauthenticated callers)
For comparison with other public APIs in this series:
- SWAPI: 4 findings (A, 91)
- Rick and Morty API: 9 findings (B, 78)
- FakeStoreAPI: 10 findings (C, 75)
- HTTPBin: 11 findings (B, 82)
- JSONPlaceholder: 11 findings (C, 73)
- Random User Generator: 12 findings (B, 79)
- PokéAPI: 12 findings (B, 76)
- DummyJSON: 13 findings (B, 75)
- Hacker News API: 14 findings (C, 73)
- ReqRes: 17 findings (C, 73)
The raw 14-finding count is the second-highest in our series, but the de-noised count of 7 actionable observations is mid-pack. The high count is mostly an artifact of how Firebase's protocol surface confuses pattern-based scanners.
Detailed Findings
API accessible without authentication
The endpoint returned 200 without any authentication credentials.
Implement authentication (API key, OAuth 2.0, or JWT) for all API endpoints.
Privileged endpoint accessible: /admin
/admin returned 200 without authentication. This may expose admin functionality.
Restrict access to admin/management endpoints. Implement RBAC with proper role checks.
Privileged endpoint accessible: /api/admin
/api/admin returned 200 without authentication. This may expose admin functionality.
Restrict access to admin/management endpoints. Implement RBAC with proper role checks.
Privileged endpoint accessible: /api/config
/api/config returned 200 without authentication. This may expose admin functionality.
Restrict access to admin/management endpoints. Implement RBAC with proper role checks.
Privileged endpoint accessible: /manage
/manage returned 200 without authentication. This may expose admin functionality.
Restrict access to admin/management endpoints. Implement RBAC with proper role checks.
Privileged endpoint accessible: /system/health
/system/health returned 200 without authentication. This may expose admin functionality.
Restrict access to admin/management endpoints. Implement RBAC with proper role checks.
CORS allows all origins (wildcard *)
Access-Control-Allow-Origin is set to *, allowing any website to make requests.
Restrict CORS to specific trusted origins. Avoid wildcard in production.
CORS reflects arbitrary origin
The server reflects the Origin header value in Access-Control-Allow-Origin, allowing any website to make cross-origin requests.
Validate the Origin header against a strict allowlist of trusted domains.
Debug endpoint accessible: /debug
A debug/diagnostic endpoint is publicly accessible — may leak internal state.
Disable debug endpoints in production. Restrict access to internal networks only.
Missing rate limiting headers
Response contains no X-RateLimit-* or Retry-After headers. Without rate limiting, the API is vulnerable to resource exhaustion attacks (DoS, brute force, abuse).
Implement rate limiting (token bucket, sliding window) and return X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After headers.
Large unpaginated collection in response
Response is an array with 500 items without pagination indicators.
Implement cursor-based or offset pagination for collection endpoints.
Legacy/internal path pattern detected
The URL path /v0/topstories.json matches legacy, internal, or test patterns.
Remove or restrict access to legacy, beta, internal, and test API endpoints in production.
Missing security headers (2/4)
Missing: X-Content-Type-Options — MIME sniffing; X-Frame-Options — clickjacking.
Add the following headers to all API responses: x-content-type-options, x-frame-options.
Dangerous HTTP methods allowed: DELETE, PUT, PATCH
The server advertises support for methods that can modify or delete resources.
Only expose HTTP methods that are actually needed. Disable TRACE, and restrict DELETE/PUT/PATCH.
Attacker Perspective
An attacker has very little to do against the read-only HN API. The data is already public. The interesting attacker question is not 'how do I exfiltrate something' but 'what can I learn about Firebase-as-public-API patterns that I can apply to other targets.'
The real lesson: Firebase databases that aren't HN
Hacker News configured its database security rules to permit unauthenticated reads on the /v0/ subtree only. Everything else returns 401. The five 'privileged endpoint' scanner findings are actually telling us — accidentally, by virtue of the 301-redirect noise — that hacker-news.firebaseio.com is a public-address Firebase project, and the fact that the scan didn't pick up data leaks at /admin.json or /config.json is the result of YC's security rules being correctly configured, not the result of those paths not existing. The attacker reading this audit learns that 'is this Firebase project locked down?' is a one-curl-away question for any *.firebaseio.com domain: curl -i https://target.firebaseio.com/.json. If the response is the entire database, the rules are misconfigured. There is a long history of Firebase-misconfiguration breaches dating to roughly 2018; they happen because operators forget that the default rules are permissive in dev mode and forget to flip them before going live.
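The one-curl check can be turned into a small helper for auditing a project you operate. This is a minimal sketch: `classify_root_probe` and its labels are hypothetical, and it assumes you have already fetched `https://<your-project>.firebaseio.com/.json` and hold the status code and body text.

```python
import json


def classify_root_probe(status: int, body: str) -> str:
    """Interpret the response to GET https://<project>.firebaseio.com/.json.

    Returns "locked" when the rules deny the read, "open" when the root
    of the database is world-readable and holds data, "open-empty" when
    it is world-readable but currently empty, and "unknown" otherwise.
    """
    if status in (401, 403):
        return "locked"
    if status == 200:
        try:
            value = json.loads(body)
        except json.JSONDecodeError:
            return "unknown"
        # An empty but readable database answers with the JSON literal null;
        # the rules are still permissive even though nothing is exposed yet.
        return "open-empty" if value is None else "open"
    return "unknown"
```

Anything other than "locked" at the root is a prompt to re-read the rules file, since a public subtree such as HN's /v0/ should be opened explicitly below the root rather than at it.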
Resource consumption against the consumer pattern
The bigger practical attack vector against systems built on top of the HN API is not against HN itself but against poorly-built consumers. A consumer that fetches /v0/topstories.json and then issues 500 parallel /v0/item/{id}.json requests — which is the obvious naive implementation — has a 500-fanout per poll. If the consumer polls every minute, that's 30,000 outbound requests per hour from a single source. An attacker who controls the source IP allocations of a downstream pipeline can amplify this into a noisy-neighbor problem, or, if the consumer has the items pipeline running on a free-tier serverless platform with strict outbound-request quotas, can DoS the consumer's quota by triggering polls.
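The fan-out arithmetic is easy to check. The helper below is a hypothetical sketch (`requests_per_hour` is not part of any HN tooling) that counts outbound requests for a consumer that fetches the list and then every item on each poll.

```python
def requests_per_hour(list_size: int = 500,
                      polls_per_hour: int = 60,
                      include_list_fetch: bool = True) -> int:
    """Outbound requests per hour for a poll-then-fetch-every-item consumer."""
    per_poll = list_size + (1 if include_list_fetch else 0)
    return per_poll * polls_per_hour


# Item fetches only, polling once a minute: 500 * 60
print(requests_per_hour(include_list_fetch=False))        # 30000
# Counting the topstories fetch as well: 501 * 60
print(requests_per_hour())                                # 30060
# Top 30 only, polling every five minutes: 31 * 12
print(requests_per_hour(list_size=30, polls_per_hour=12)) # 372
```

The last line shows how far the figure falls when a consumer limits itself to the stories it displays and polls less often.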
The CORS-reflection pattern
The reflection itself is harmless on HN — no cookies, no auth state, nothing in the response that an attacker on https://evil.com can read that they couldn't read with a normal cross-origin fetch followed by their own server-side proxy. But developers reading HN's response headers and inferring 'this is how Firebase configures CORS, so this is fine' propagate the reflection pattern into authenticated Firebase apps where it absolutely is not fine.
Analysis
The scan's most interesting result is the cluster of five 'privileged endpoint accessible' HIGH findings. They look terrifying on a dashboard. They are not, in fact, real. Walking through one:
$ curl -I https://hacker-news.firebaseio.com/v0/admin
HTTP/1.1 301 Moved Permanently
Server: nginx
Location: https://console.firebase.google.com/project/firebase-hacker-news/database/hacker-news/data/v0/admin
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=31556926; includeSubDomains; preload
The 301 redirects to the Firebase Console UI. The console is a Google product, not an HN admin endpoint, and accessing it requires a Google login with permission on the firebase-hacker-news project. An attacker following that redirect lands on a Google login page. There is no ‘exposed admin functionality’ here.
The actual data-access pattern on Firebase RTDB requires the .json suffix:
$ curl -i https://hacker-news.firebaseio.com/v0/admin.json
HTTP/1.1 401 Unauthorized
Content-Type: application/json; charset=utf-8
{
  "error" : "Permission denied"
}
The auth boundary on Firebase RTDB is the protocol-suffix-plus-rules combination: paths without .json redirect to the console; paths with .json are evaluated against the project’s security rules and either return data or return 401. HN’s rules expose /v0/ publicly and lock everything else, which is correct and tight.
The two CORS findings deserve a second look together. With no Origin header on the request:
$ curl -I https://hacker-news.firebaseio.com/v0/topstories.json
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
With Origin: https://evil.com:
$ curl -I -H "Origin: https://evil.com" https://hacker-news.firebaseio.com/v0/topstories.json
HTTP/1.1 200 OK
Access-Control-Allow-Origin: https://evil.com
Firebase's CORS shaping is to reflect the request origin if one is present and fall back to wildcard if not. On a no-cookie, no-auth, public-data endpoint this is permissible: the browser’s same-origin policy is not protecting anything that wasn’t already public. The pattern is genuinely dangerous when copied to authenticated Firebase endpoints, which is why both scanner findings fire correctly even though neither is exploitable here.
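The two probes and the "dangerous only when authenticated" distinction can be expressed as one small classifier. This is a sketch under stated assumptions: `classify_cors` is a hypothetical helper, and the `allows_credentials` flag stands for the server also sending `Access-Control-Allow-Credentials: true`, a header this scan did not report on the HN endpoint.

```python
from typing import Dict, Optional


def classify_cors(acao_without_origin: Optional[str],
                  acao_with_probe_origin: Optional[str],
                  probe_origin: str = "https://evil.com",
                  allows_credentials: bool = False) -> Dict[str, bool]:
    """Summarize two CORS probes: one sent without Origin, one with it."""
    wildcard = acao_without_origin == "*"
    reflects = acao_with_probe_origin == probe_origin
    # Reflection lets another site read responses on a user's behalf only
    # when the browser is also allowed to attach that user's credentials.
    return {
        "wildcard": wildcard,
        "reflects": reflects,
        "credentialed_reflection": reflects and allows_credentials,
    }


# The responses observed on /v0/topstories.json in this audit:
print(classify_cors("*", "https://evil.com"))
```

On the HN responses both `wildcard` and `reflects` come out true while `credentialed_reflection` stays false, which matches the reading that the findings are accurate but not exploitable on this surface.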
The unpaginated-collection MEDIUM is real and structural. The topstories endpoint returns up to 500 IDs (500 in this scan) with no ?limit=, no ?after=, no cursor. A consumer that wants the top 100 stories has to fetch the full array and slice. A consumer that wants stories 501-1000 cannot, because they don't exist in this endpoint. That's documented HN behavior (the 500 cap is in the API README on GitHub at HackerNews/API); the scanner's MEDIUM is technically correct but not actionable on this specific feed.
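Since the server offers no cursor, pagination has to happen on the consumer side. A minimal sketch, assuming the full ID array has already been fetched (and ideally cached); `page_of` is a hypothetical helper name.

```python
from typing import List, Sequence


def page_of(ids: Sequence[int], page: int, page_size: int = 30) -> List[int]:
    """Return the 1-indexed `page` of `ids`; an empty list past the end."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return list(ids[start:start + page_size])


# Stand-in for a fetched topstories array of 500 IDs.
ids = list(range(1000, 1500))
print(len(page_of(ids, 1, 100)))  # 100
print(len(page_of(ids, 17)))      # 20 (positions 481-500)
print(page_of(ids, 18))           # []
```

Only the IDs on the requested page then need per-item fetches, which keeps the fan-out proportional to what is actually displayed.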
Industry Context
The single most defining characteristic of the Hacker News API is that it is served directly off Firebase Realtime Database with no custom layer in front. This is the architectural choice that drives most of the unusual scanner output, and it is worth thinking about because the trade-offs are real and most APIs at HN's scale make the opposite choice.
What you get by exposing Firebase directly
You get listen-for-changes built into the protocol: Firebase clients can subscribe to /v0/maxitem and get pushed updates when a new item is created, without polling. You get global edge replication for free. You get a maintenance burden of approximately zero, since YC does not run a service in front of this. You get the resilience of Google's CDN as your DDoS posture. The aggregate operational cost is plausibly in the low single digits of dollars per month, even at HN's traffic.
What you give up
You give up control of the response shape. Firebase responses don't carry rate-limit headers, don't carry pagination cursors, don't carry freshness timestamps, don't carry per-resource ETags in the way a custom layer would shape them. You give up the ability to reject DELETE/PUT/PATCH at the protocol layer — the OPTIONS response advertises them because the underlying RTDB wire protocol uses those methods for authenticated writes, and the .json paths return 401 when called unauthenticated. You give up the ability to evolve the schema cheaply: there is no migration path from /v0/ to /v1/ without losing every consumer that hard-coded the existing endpoints. You give up custom URL versioning, custom error formats, and custom auth flows. The trade-off is ‘developer-time-saved-by-not-building-a-layer’ vs. ‘flexibility-lost-by-not-having-one,’ and HN's choice has aged well precisely because the API contract is genuinely simple enough not to need any of the things a custom layer would provide.
What this means for consumers
If you build on the HN API, plan your consumer code around the absence of pagination, the absence of rate-limit signals, and the 500-item cap. Cache the topstories array; it doesn’t change quickly. Use conditional GETs where the server honors them (Firebase returns an ETag when the request sends X-Firebase-ETag: true, and that value can be sent back in If-None-Match). For the per-item fan-out, throttle yourself client-side rather than waiting for a 429 from Firebase, because the 429s, when they arrive, do not carry Retry-After.
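Because a 429 here arrives without Retry-After, the consumer has to pick its own wait. One common choice is exponential backoff with full jitter; the sketch below is an assumption about a reasonable policy, not something the HN API prescribes, and `backoff_delays` with its defaults is hypothetical.

```python
import random
from typing import Iterator, Optional


def backoff_delays(base: float = 0.5,
                   cap: float = 30.0,
                   attempts: int = 6,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    for n in range(attempts):
        # The upper bound doubles each attempt until it reaches the cap.
        yield rng.uniform(0.0, min(cap, base * 2 ** n))
```

A caller would sleep for the next yielded delay after each 429 and give up once the generator is exhausted; the jitter keeps many throttled workers from retrying in lockstep.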
OWASP API Top 10 mapping
Numbering follows the 2023 edition of the list. API2 (broken authentication) is where the structural 'no auth' CRITICAL lands. API1 (BOLA) is not represented because the per-item endpoint accepts numeric IDs against a public catalog where every ID is intentionally world-readable. API4 (resource consumption) covers the missing rate-limit headers and the large unpaginated array. API5 (broken function level authorization) would cover the 'privileged endpoint' findings if they were real. API8 (security misconfiguration) covers the CORS wildcard and origin-reflection findings. API9 (improper inventory management) is what the legacy-path heuristic misfires against. The remaining categories are not represented.
Remediation Guide
Consumer pattern: cache the topstories array, throttle the per-item fan-out
The naive consumer fetches /v0/topstories.json and then issues 500 parallel /v0/item/{id}.json requests. The disciplined consumer caches topstories for 30-60 seconds and batches item fetches with a concurrency cap.
import asyncio
import time

import aiohttp

BASE = 'https://hacker-news.firebaseio.com/v0'

class HNClient:
    def __init__(self, ttl=60, concurrency=8):
        self._ttl = ttl        # seconds to reuse the cached topstories array
        self._top = None
        self._top_at = 0.0
        self._sem = asyncio.Semaphore(concurrency)  # caps in-flight item fetches

    async def topstories(self, session):
        # Serve from cache while it is younger than the TTL.
        if self._top is not None and time.monotonic() - self._top_at < self._ttl:
            return self._top
        async with session.get(f'{BASE}/topstories.json') as r:
            r.raise_for_status()
            self._top = await r.json()
            self._top_at = time.monotonic()
            return self._top

    async def item(self, session, item_id):
        async with self._sem:
            async with session.get(f'{BASE}/item/{item_id}.json') as r:
                r.raise_for_status()
                return await r.json()

    async def top_n(self, n=30):
        async with aiohttp.ClientSession() as s:
            ids = (await self.topstories(s))[:n]
            return await asyncio.gather(*(self.item(s, i) for i in ids))
Consumer pattern: subscribe to /v0/maxitem instead of polling topstories
If you want push-based updates rather than polling, use the Firebase event-source endpoint on /v0/maxitem.json. Firebase emits server-sent events when the value changes.
// Node.js using the EventSource API (or any SSE client).
// Assumes the 2.x `eventsource` package, which has a default export
// and accepts a `headers` option.
import EventSource from 'eventsource';

const es = new EventSource(
  'https://hacker-news.firebaseio.com/v0/maxitem.json',
  { headers: { Accept: 'text/event-stream' } }
);

es.addEventListener('put', (e) => {
  // Firebase wraps each update as { path, data }
  const { data } = JSON.parse(e.data);
  // data is the new max item id; fetch /v0/item/{data}.json to get the body
  console.log('New item:', data);
});

es.addEventListener('keep-alive', () => { /* heartbeat */ });
Consumer pattern: respect the unstated rate limits with conditional GETs
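Firebase returns an ETag on a JSON response when the request carries the X-Firebase-ETag: true header. Send that value back in If-None-Match so the response body can be skipped when the resource hasn't changed; if the server answers 200 anyway, the code still returns fresh data.
// Node 20+ with built-in fetch
let etag = null;
let cached = null;

async function getTopStories() {
  // Ask Firebase to include an ETag in the response
  const headers = { 'X-Firebase-ETag': 'true' };
  if (etag) headers['If-None-Match'] = etag;
  const r = await fetch('https://hacker-news.firebaseio.com/v0/topstories.json', { headers });
  if (r.status === 304) return cached;  // unchanged: reuse the cached array
  if (!r.ok) throw new Error(`HN API returned ${r.status}`);
  etag = r.headers.get('ETag');
  cached = await r.json();
  return cached;
}
Operator pattern: lock down your own *.firebaseio.com security rules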
This is the real defense for any Firebase RTDB deployment that isn't HN's. Default-deny in your rules and explicitly open only the read paths that should be public. Verify with `curl -i https://YOUR-PROJECT.firebaseio.com/.json`: if the response is your entire database, the rules are wrong.
// database.rules.json
{
  "rules": {
    ".read": false,
    ".write": false,
    "public": {
      ".read": true
    },
    "users": {
      "$uid": {
        ".read": "$uid === auth.uid",
        ".write": "$uid === auth.uid"
      }
    }
  }
}
Maintainer-side hygiene (optional): document the protocol-suffix auth boundary
The README at github.com/HackerNews/API could call out that the /v0 paths require the .json suffix and that paths without it 301 to the Firebase Console. This would save scanner-driven false positives on every audit run by every consumer.
## Auth boundary on this API
This API is served directly off Firebase Realtime Database. Two things follow:
1. **Data paths require the `.json` suffix.** `/v0/topstories.json` works; `/v0/topstories` returns a 301 redirect to the Firebase Console UI (which then asks you to log in to Google). The redirect is not an admin endpoint — it is Firebase's default UI affordance for browser-shaped requests.
2. **The auth boundary is the project's security rules,** not per-route middleware. Public reads are allowed only under `/v0/`. Writes and reads under any other prefix return `401 Permission denied`.
Defense in Depth
The defense recommendations here are unusually short, because most of the scanner findings are either correct-by-design or scanner noise. Walking the realistic ones:
1. The 'no auth' CRITICAL. No fix. This is the explicit purpose of the public API. Document the choice clearly so consumers understand the contract.
2. The CORS-reflection pair. Firebase's default behavior. The fix on a public read-only feed is to leave it alone — there's nothing for an attacker to read cross-origin that they can't read by other means. The fix on any other Firebase project that handles authenticated user data is to configure the security rules tightly and, where possible, front the database with a Cloud Function that emits a hardcoded Access-Control-Allow-Origin matching a strict allowlist. Operators who copy HN's CORS pattern into authenticated apps are the ones who get burned.
3. The rate-limit-header HIGH. Firebase RTDB doesn't emit them by default. A custom layer in front (Cloud Function, Cloudflare Worker, similar) can wrap the responses and add them. For the public HN API specifically, the maintenance cost of doing this exceeds the consumer benefit. For consumers, the practical defense is to throttle on your side: cache topstories.json for 30-60 seconds, batch item fetches, and back off on any 429 you see.
4. The unpaginated-500-array MEDIUM. The 500-cap is documented and load-bearing. Adding cursoring would be a breaking change. The defense here is consumer-side: implement pagination on top of the array (slice it client-side) and cache aggressively.
5. The five 'privileged endpoint' HIGHs. No defense needed. The endpoints don't exist as the scanner described them. The takeaway for the maintainer is that *.firebaseio.com domains will produce these findings on every BFLA-shaped scan, and the responsible-disclosure response is to point at the security rules configuration rather than treat the findings as actionable. The takeaway for the scanner author (us, in part) is that the BFLA probe should special-case the Firebase 301-redirect-to-console pattern.
6. The two LOW findings. Adding X-Content-Type-Options: nosniff and X-Frame-Options: DENY at the Firebase edge is a Google-side change, not an HN-side change. They're hygiene findings on a JSON-only API where neither header materially changes the security posture. Genuinely optional.
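For item 2, the allowlist check that a fronting function would perform on an authenticated project can be sketched as follows. The origins, the `ALLOWED_ORIGINS` constant, and `cors_headers` are all hypothetical; wiring this into a Cloud Function or other proxy is left out.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical allowlist; replace with the origins your app actually serves.
ALLOWED_ORIGINS: FrozenSet[str] = frozenset({
    "https://app.example.com",
    "https://admin.example.com",
})


def cors_headers(request_origin: Optional[str],
                 allowed: FrozenSet[str] = ALLOWED_ORIGINS) -> Dict[str, str]:
    """Return CORS headers for an allowlisted origin, or none at all."""
    if request_origin in allowed:
        # Echo only exact matches, and tell caches the response varies by Origin.
        return {"Access-Control-Allow-Origin": request_origin, "Vary": "Origin"}
    # Unknown or missing origin: emit no CORS headers, so browsers block the read.
    return {}
```

The difference from the reflection pattern is the exact-match lookup: an origin such as https://evil.com gets no Access-Control-Allow-Origin header back.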
Conclusion
Hacker News API scored 73/100 with fourteen findings, but the more honest scoring view is seven actionable observations on a public read-only feed that has been operating successfully on top of Firebase Realtime Database for over a decade. The architectural choice — to skip the custom REST layer entirely and let Firebase's wire protocol be the public contract — is the choice that drives the scanner's misclassifications, and is also the choice that has kept the API maintenance burden so low that YC has been able to run it as a side concern for ten-plus years.
The actionable lessons in this case study are not for the API maintainer. They are for two other audiences. For consumers building on top of the HN API: plan around the 500-item cap, the absence of pagination cursors, the absence of rate-limit headers, and the unusual CORS shaping; cache aggressively, throttle on your side, and don't copy Firebase's CORS pattern into authenticated projects. For operators of any other *.firebaseio.com deployment: read your security rules carefully. The security boundary on Firebase RTDB is the rules configuration, not the URL surface. The history of Firebase-database-leaks comes down to operators forgetting to flip the rules out of dev mode. HN's deployment is a worked example of what tight rules look like; most public Firebase incidents are a worked example of what they look like missing.
The single largest finding from this audit, which is barely visible in the raw scanner output, is positive: hacker-news.firebaseio.com is a Firebase project that does not leak. Five HIGH 'privileged endpoint' findings turn out to be five 401 'Permission denied' responses on the actual data plane. That is what correctly-scoped Firebase security rules look like.