type: decision
status: active
timestamp: 2026-06-20
tags: [decisions, architecture, logs, observability, better-stack, cloudflare]

Logs — Better Stack Logs (aggregation) + Cloudflare Workers Tail (live)

Two-layer logs: Cloudflare Workers Tail for live debugging (~5-min retention, zero cost, via wrangler tail); Better Stack Logs for cross-Worker aggregation, alerts, and searchable retention (3 GB/mo free, same vendor as our status page and uptime monitors). Quota math: ~30 MB/mo realistic load vs 3 GB/mo cap = ~100x headroom.


Decision

The family runs two log layers, picked by retention horizon:

  1. Live tail (≤ 5 min): Cloudflare Workers Tail. Free, included with Cloudflare Workers. Streams every Worker console.log / console.error over WebSocket via wrangler tail <worker>. Used during active debugging: “what does this Worker actually log when I curl it?”
  2. Aggregation + alerts (≤ 30 days): Better Stack Logs. Free tier of 3 GB/mo with 30-day retention, searchable and alertable. Same vendor as our existing status page and uptime monitors, so three Better Stack products live on one account.

Errors continue to flow through Sentry. Logs and errors are different observability planes — exceptions go to Sentry; structured operational logs go to Better Stack Logs.

Why

Implications

Architecture

Worker (Hono umbrella, s.oriz.in, og-card, etc.)
   +-- console.log()   ---- live ----> wrangler tail <name>   (CF Workers Tail, dev terminal)
   +-- waitUntil(      ---- aggregate > Better Stack Logs HTTP source
         logToBetterStack({ level, msg, fields })
       )

A thin log() helper ships in @chirag127/oriz-kit/server/logging (forward reference) that:

  1. Always calls console.log(...) (so Workers Tail sees it).
  2. When ENABLE_BETTER_STACK_LOGS=true, also calls ctx.waitUntil(fetch(BETTER_STACK_INGEST_URL, ...)) with the same payload (so Better Stack Logs sees it).
  3. Includes structured fields: { level, msg, request_id, ray, route, status, latency_ms, geo, ua, env }.
  4. Filters noisy paths through an opt-in allow-list: only listed routes are pushed to Better Stack (e.g. healthcheck routes are never pushed).
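
The four behaviours above can be sketched roughly as follows. This is a minimal illustration, not the code that ships in @chirag127/oriz-kit: the BETTER_STACK_TOKEN binding, the Bearer authorization header, the FORWARDED_ROUTES set, the injectable send parameter, and the boolean return value are all assumptions made so the sketch is self-contained and testable.

```typescript
// Sketch only. Everything except ENABLE_BETTER_STACK_LOGS and
// BETTER_STACK_INGEST_URL is a hypothetical name chosen for illustration.

type Level = "debug" | "info" | "warn" | "error";

interface LogEntry {
  level: Level;
  msg: string;
  route?: string;
  [field: string]: unknown; // request_id, ray, status, latency_ms, ...
}

interface LogEnv {
  ENABLE_BETTER_STACK_LOGS?: string;
  BETTER_STACK_INGEST_URL?: string;
  BETTER_STACK_TOKEN?: string; // assumed binding for the source token
}

// The only part of the Workers execution context the helper needs.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface SendInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}
type Sender = (url: string, init: SendInit) => Promise<unknown>;

// Assumed allow-list: routes absent from this set are never forwarded.
const FORWARDED_ROUTES = new Set<string>(["/webhooks/razorpay"]);

function log(
  entry: LogEntry,
  env: LogEnv,
  ctx: WaitUntilContext,
  send: Sender = (url, init) => fetch(url, init),
): boolean {
  // Step 1: always write to the console so `wrangler tail` shows the line.
  console.log(JSON.stringify(entry));

  // Step 2: forward only when the toggle is exactly "true".
  const url = env.BETTER_STACK_INGEST_URL;
  if (env.ENABLE_BETTER_STACK_LOGS !== "true" || !url) return false;

  // Step 4: forward only allow-listed routes.
  if (!entry.route || !FORWARDED_ROUTES.has(entry.route)) return false;

  const delivery = send(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${env.BETTER_STACK_TOKEN ?? ""}`,
    },
    body: JSON.stringify(entry),
  }).catch(() => undefined); // a failed push must not break the request

  // waitUntil lets the push finish after the response has been returned.
  ctx.waitUntil(delivery);
  return true;
}
```

Returning a boolean and accepting a send function are conveniences for unit tests; a real helper would more likely return nothing and call fetch directly.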

The ENABLE_BETTER_STACK_LOGS=true toggle

Same per-site env-var pattern as Sentry. Default is false for low-traffic sites; flip to true only on Workers / sites currently being debugged or recently deployed. Combined with the 3 GB/mo cap, this prevents a runaway log loop on one Worker from burning the family-wide budget. Documented under rules/interaction/never-hit-quotas.md.
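
To make the budget reasoning concrete, the arithmetic can be worked through as below. The 3 GB/mo cap and ~30 MB/mo load come from this decision; the runaway-loop figures (500-byte lines at 1,000 lines per second) are purely hypothetical, and decimal units (1 GB = 1,000 MB) are assumed.

```typescript
const MB = 1_000_000;
const GB = 1_000 * MB;

const MONTHLY_CAP_BYTES = 3 * GB;     // free-tier cap stated in this decision
const REALISTIC_LOAD_BYTES = 30 * MB; // realistic monthly load stated in this decision

// How many times the expected load fits into the cap.
function headroom(capBytes: number, loadBytes: number): number {
  return capBytes / loadBytes;
}

// Hours until a single runaway loop exhausts the monthly cap on its own.
function hoursToExhaust(
  capBytes: number,
  bytesPerLine: number,
  linesPerSecond: number,
): number {
  return capBytes / (bytesPerLine * linesPerSecond) / 3600;
}

console.log(headroom(MONTHLY_CAP_BYTES, REALISTIC_LOAD_BYTES)); // 100

// Hypothetical runaway loop: 500-byte lines at 1,000 lines per second.
console.log(hoursToExhaust(MONTHLY_CAP_BYTES, 500, 1000).toFixed(2)); // "1.67"
```

Under those hypothetical rates a single misbehaving Worker would consume the whole monthly allowance in under two hours, which is why the toggle defaults to false.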

Three observability planes — distinct, not stacked

| Plane | Tool | Free tier | What goes here |
| Errors / exceptions / traces | Sentry | 5K events/mo | Uncaught exceptions, hand-instrumented Sentry.captureException, performance traces |
| Operational / structured logs | Better Stack Logs | 3 GB/mo, 30-day retention | log({ level: 'info', msg: 'razorpay webhook received', payment_id }), i.e. the kind of thing you’d tail -f on a server |
| Live console (active debugging) | Cloudflare Workers Tail | Unlimited, ~5 min retention | console.log from a Worker; visible only while wrangler tail is connected |

The earlier Axiom service entry stays in the catalog; quota alarms keep posting there per the CF Worker quota mitigation playbook. Axiom and Better Stack Logs are not redundant — Axiom is metrics-shaped event ingest with dashboards; Better Stack Logs is log-line-shaped text search + alerts. Different shapes, different destinations.

Status-page redundancy carries to logs only weakly

The two-status-page redundancy exists because the status page IS the comms channel for an outage — it must survive its own vendor going down. Logs don’t have that property: if Better Stack Logs is down, we still have Workers Tail and Sentry; we lose a few minutes of historical aggregation, not a critical comms channel. So we don’t run a redundant log sink.

What we don’t add

Cross-refs

