How /dashboard loads in 1.5s and processes 95% fewer bytes — performance architecture
A 3.5-second LCP would have been an immediate disqualifier in head-to-head testing against Triple Whale, Datafast, or Northbeam. A dashboard that bills $99/month per active workspace just to render the KPI strip would be a worse one. On 2026-05-06 we shipped both fixes in parallel: the four-cause taxonomy that took /dashboard from 3.5s LCP to ~1.5s, and the four-layer optimization stack that cuts our analytics warehouse bytes-processed by ~95% per workspace. This page documents both — together they're the complete playbook for shipping real-time-feel analytics on a columnar OLAP database at customer scale.
Why this matters
A dashboard that takes 3.5 seconds to render is one the operator opens once a day, glances at, and closes. A dashboard that loads in 1.5 seconds is one they keep open in a tab and refer to throughout the workday. The difference is product retention, not just engineering pride — and the path from 3.5s to 1.5s is mostly mechanical once you know where to look.
But wall-clock LCP is only half the story. The other half is cost. A real-time-feel analytics dashboard fanning out 144 queries per day against a columnar OLAP database, at 4.58 MiB scanned per query, burns 1.5 GiB of bytes-processed per workspace per day. At 30 active workspaces you've blown past our analytics warehouse's entry-tier budget; at 100 you're paying a four-figure monthly bill that the customer's $9-$199 subscription cannot economically support. The right answer is not "raise prices" — it's the four-layer optimization stack documented below, which cuts bytes-processed by ~95% without sacrificing freshness.
The trap is optimizing the wrong layer. Bundle size matters until the network is fast enough that it doesn't, and then query waterfalls dominate. Query waterfalls matter until you single-flight them, and then server cold-start dominates. Server cold-start matters until you cache cross-instance, and then per-query bytes-scanned dominates. Most "we sped up the dashboard" blog posts cover one of these layers and miss the others, leaving the LCP needle stuck and the bill rising. The taxonomies below make every layer explicit so a perf project addresses every dominant cost, not just the most visible one.
The 4-cause LCP taxonomy (wall-clock)
Every slow dashboard we have audited fits into one of four cause families. The numbers below are the actual measured costs on /dashboard before and after the 2026-05-06 GL#402 fix wave; your numbers will differ but the structure is portable.
| Cause | Before | After | Share of LCP | Fix |
|---|---|---|---|---|
| Bundle size | 1.2 MB main chunk | 380 KB main chunk | ~35% | Vite manualChunks + lazy wrappers |
| Query waterfall | 31 client-side fetches | 1 single-flight snapshot | ~25% | /api/v1/dashboard/snapshot endpoint |
| Server cold-start | 870ms TTFB on first hit | ~80ms (cache hit) / ~280ms (miss) | ~15% | our job queue stale-while-revalidate cache |
| Progressive render | All-or-nothing render | Above-fold first, below-fold lazy | ~15% | LazyOnView + DashboardSkeleton |
The remaining ~10% is image decoding + font swap + initial React reconciliation — small enough to defer until the four bigger causes are addressed.
The bytes-processed problem (cost dimension)
After GL#411 stabilized the timeout surface, our analytics warehouse usage audit on 2026-05-06 revealed the cost dimension we'd been deferring. One Vitatree-class workspace was burning 1.5 GiB of our analytics warehouse bytes-processed per day, with the live the revenue rollup pipe alone responsible for 660 MiB (44% of the total). At 144 queries per day · 4.58 MiB scanned per query, the linear projection was ~45 GiB per month per workspace — 45% of our analytics warehouse's entry-tier budget 100 GB/mo budget. Ship 30 workspaces and you're over budget; ship 100 and you're on the Pro tier paying $99 per active workspace per month just to render the KPI strip.
The previous GL#402 perf taxonomy fixed wall-clock LCP but left the cost dimension untouched. The snapshot endpoint still hit our analytics warehouse directly on every cache miss, the single-process Map cache evaporated on every our infrastructure deploy, and the frontend polled aggressively on window-focus + reconnect. Bundle splitting and progressive render do nothing for bytes-scanned — that's a server-side concern only fixable by aggressive cross-instance caching, database-layer pre-aggregation, and frontend polling discipline. Hence the 4-layer stack.
The 4-layer optimization stack (GL#412)
The four layers below compose — they don't substitute for each other. Each layer addresses a distinct dimension of the cost surface, and the multiplicative effect is what turns 1.5 GiB/day into ~75 MiB/day per workspace.
| Layer | Mechanism | Bytes saved | File |
|---|---|---|---|
| 1. an in-memory cache with single-flight | 50 concurrent identical filter-shapes collapse to 1 upstream call | ~70% (cross-instance cache hits) | server/lib/cache/swrCache.ts |
| 2. our analytics warehouse Materialized View | the revenue rollup pre-aggregates daily rollup; 4.58 MiB → KB per query |
~99% per-query bytes | an internal report |
| 3. FE polling discipline | No refetchOnWindowFocus, no refetchInterval, manual Refresh button |
~92% query count (144/day → 12/day) | client/src/lib/queryClient.ts |
| 4. our job queue prewarm worker | 5-min cron primes SWR cache for active workspaces | 0 (cost-neutral; latency win) | server/queues/dashboardPrewarm.ts |
| 5. Perf canary | p50/p95 + bytes-processed regression alarm | 0 (regression detection) | scripts/perf-canary.ts |
Combined effect (projected): 1.5 GiB/day → ~75 MiB/day per workspace (~95% reduction in bytes processed); 144 queries/day → ~12 queries/day per workspace (~92% reduction in query count). Per-workspace monthly warehouse throughput drops from 45 GiB to ~2.3 GiB — comfortably inside our throughput budget even at 30+ active workspaces.
Cause 1 (LCP) — Bundle size
Symptom. The browser spends 1.2 seconds parsing and evaluating JavaScript before any pixels render. dist/public/assets/index-*.js is the file to look at; if it crosses 500 KB on a complex SPA you are paying it on every cold load. The fix is not "ship less code" — you can't unship features — it is "ship the right code first, the rest later."
Fix. Vite's build.rollupOptions.output.manualChunks lets you split the bundle into stable cacheable groups: vendor (React, framer-motion, recharts), shadcn/ui primitives, and per-route chunks loaded on demand. Combined with React.lazy wrappers around every route in App.tsx, the initial download drops from "everything the user might ever click" to "just the page they are on right now."
Canary. scripts/check-bundle-size.ts runs in postbuild and asserts the main index-*.js chunk stays under 500 KB (warning) / 800 KB (build-failing). Adjust thresholds when shipping a deliberate feature; the goal is "no surprise regressions," not "stay tiny forever."
Cause 2 (LCP) — Query waterfall
Symptom. Open DevTools Network tab, load /dashboard, count the requests fired before LCP. If the answer is "more than five," you have a waterfall. The pre-fix /dashboard fired 31 requests — one per card per analytics pipeline — each one round-tripping to the server and waiting on the previous one if it depended on shared filters. Even at 50ms RTT, 31 sequential or near-sequential requests is a 1-second hit before any data renders.
Fix. Single-flight on the server. One endpoint, GET /api/v1/dashboard/snapshot, takes the active filters once and fans out to every analytics pipeline in parallel server-side, then returns one well-typed JSON payload covering every card on the page. The browser fires one fetch, gets one response, and renders. The pipe fan-out happens at server-to-our analytics warehouse (low-latency, persistent connection pool) instead of browser-to-server (high-latency, fresh TCP).
The trade-off is one slow analytics pipeline slows the whole snapshot. We mitigate this with per-pipe timeouts (FAST_PIPE_TIMEOUT_MS=5s, SLOW_PIPE_TIMEOUT_MS=30s per GL#411) and source-additive fallback — a missing pipe yields an empty card, not a failed snapshot.
Cause 3 (LCP) — Server cold-start
Symptom. First visit after a deploy or after the cache has aged out: 870ms TTFB. Subsequent visits within the cache window: 80ms TTFB. The ~10x gap is the difference between our analytics warehouse query latency and our job queue cache hit latency.
Fix (v1). A stale-while-revalidate cache wrapper. The first request in a cold window fetches from our analytics warehouse, caches the result for 300 seconds (fresh), and stores a stale-tier copy for up to 3600 seconds. Within the fresh window, every browser hit is a cache lookup. Outside the fresh window, the request returns the stale copy immediately and triggers a background refetch.
Fix (v2 — Layer 1 of the GL#412 stack). The v1 cache was a single-process Map; it evaporated on every deploy and didn't dedup across instances. The v2 implementation in server/lib/cache/swrCache.ts moves the cache to our shared managed cache layer (a single shared connection per GL#67) and adds single-flight de-dup: 50 concurrent identical filter-shapes collapse to 1 upstream call via a cache lock + pub/sub fan-out. See the next section for the full Layer 1 details.
Cause 4 (LCP) — Progressive render
Symptom. The browser has the JavaScript, the data has arrived, but the page still does not paint because React is busy reconciling the entire dashboard tree at once. Below-fold cards (charts the user has to scroll to see) are paying the same rendering cost as above-fold ones, even though the user does not see them yet.
Fix. LazyOnView: a small wrapper that uses IntersectionObserver to render its children only when they enter the viewport (with a configurable rootMargin so they hydrate just before they scroll into view). Above-fold cards render immediately; below-fold cards defer until the user is about to see them.
Layer 1 (cost) — an in-memory cache + single-flight
Why. A per-process in-memory cache is fine for one Node instance with steady traffic. Admaxxer runs on our infrastructure with multi-instance rolling deploys, so the in-memory cache evaporated on every deploy and didn't share state across instances. Worse: under burst load (50 concurrent tabs from the same workspace fetching the same filter-shape), every tab paid a full upstream call because there was no de-dup mechanism at the cache layer. The an in-memory cache with single-flight wrapper fixes both.
How. server/lib/cache/swrCache.ts exports withSwrCache<T>(cacheKey, fetcher, { freshSeconds, staleSeconds, singleFlight: true }). Cache key is <query-name>:<stableHash(params)> — stable-hash so semantically-identical filters collapse. Fresh window: 300s (every browser hit is a cache lookup). Stale window: 3600s (returns stale immediately, triggers background refetch). Single-flight: a cache lock with a 30s TTL claims the refresh; concurrent callers wait on an in-memory pub/sub channel for the result instead of each issuing their own upstream call. The analytics cache wrapper is the consumer — every existing report call upgrades transparently.
Don't. Don't instantiate a second cache client for the SWR layer — the shared cache connection is the only entry point per GL#67. A second client doubles the connection-pool contention and breaks pub/sub fan-out for single-flight.
Layer 2 (cost) — materialized-view rollups in the warehouse
Why. Layer 1 caches the heavy query, but cold cache misses still scan the full raw event stream. The revenue rollup scans the on-site event stream at ~4.58 MiB per query — multiply by every cache miss across every workspace and the bytes-processed bill stays bad. The only fix is to scan less data per query, which means pre-aggregating the heavy daily rollup at the warehouse layer.
How. The revenue rollup is backed by a materialized view keyed by (workspace_id, day, currency) — the warehouse's incremental materialized-view engine updates it as new events land. The live query reads the materialized view (a few KB per workspace per day) for completed days and falls through to the raw event stream only for today's partial day. End-to-end: 4.58 MiB scans become KB scans for the 95%+ of the query that's historical.
Critical. Analytics query definitions are versioned alongside the rest of the codebase and shipped through a controlled release step (GL#228), so a change to the rollup logic is published to the warehouse query layer before it goes live — otherwise the materialized view would be inert in production. A postbuild canary guards this by warning when a release touches the analytics query definitions.
Layer 3 (cost) — FE polling discipline
Why. React Query's defaults are aggressive for a reason — for chat apps and live dashboards where freshness is the product, refetching on window focus and reconnect is correct. For a DTC analytics dashboard where the operator is glancing at trends, those defaults turn into accidental cost: every Cmd-Tab back to the dashboard tab fires a fresh fetch. Multiplied across 144 polls per day per workspace, this was the dominant query-count driver in the audit.
How. client/src/lib/queryClient.ts sets refetchOnWindowFocus: false, refetchOnReconnect: false, and refetchInterval: false as the global default. Heavy endpoints (/dashboard/snapshot, /analytics/summary) get staleTime: 5 * 60 * 1000 so React Query treats the cache as authoritative for 5 minutes. The new <RefreshButton /> component (client/src/components/ui/RefreshButton.tsx) gives the user a manual lever — framer-motion spin animation, optimistic "Refreshed 0s ago" microcopy, throttled to 1 click per 10 seconds. The user is in control of freshness; the system isn't burning bytes on autopilot.
Layer 4 (cost) — Pre-warming worker
Why. Layer 1 amortizes repeat hits but not cold-tab opens. If the operator opens /dashboard at 9 AM and the cache aged out at 4 AM, they pay the full cold-cache TTFB. The fix is to keep the cache warm for active workspaces — a background worker that hits the snapshot endpoint server-side on a 5-minute cron, so the user's first paint is always a cache hit.
How. server/queues/dashboardPrewarm.ts registers a job-queue scheduled job (cron */5 * * * *). On each tick, it reads recent chat-session and pixel-event activity for workspaces active in the last 24 hours, then hits /api/v1/dashboard/snapshot server-side for each. The worker uses the shared cache connection (GL#67) — never instantiates a second client. The call is idempotent: if the SWR cache is already fresh, the prewarm is a no-op. Cost-neutral for cache layer 1 (the prewarm hit IS the cache fill that would have happened on the next user request anyway), latency win for users.
Layer 5 (cost) — Perf canary
Why. A perf optimization stack is not a one-time ship — it's a budget that has to be defended on every PR. Without an automated canary, Layer 2's MV could silently fall behind a schema change, Layer 1's cache hit rate could collapse from a poorly-chosen cache key, or Layer 3's polling defaults could regress when someone adds a feature with their own React Query config. The canary catches these before they hit production.
How. scripts/perf-canary.ts measures p50/p95 of /api/v1/dashboard/snapshot against a fixture workspace, queries our analytics warehouse workspace usage API for bytes-processed in the last 24 hours, and writes a perf-canary.json artifact. Postbuild runs it in non-blocking warning mode (catches obvious regressions without breaking the deploy hot path). Nightly our infrastructure cron runs it in hard-fail mode: p95 > 3s OR bytes > 200 MiB per fixture workspace → exit 1, alert via the existing Slack webhook.
Best-practice references
The patterns above are not novel — they are the consensus playbook of the teams who ship the fastest data dashboards on the public internet. We borrowed shape from each and adapted to Admaxxer's stack:
- Stripe Dashboard — single-flight server endpoint per route + SSE for real-time updates. Inspired the
/api/v1/dashboard/snapshotshape and the per-route freshness budget. - Linear's sync engine — aggressive bundle splitting + IntersectionObserver-driven hydration + cross-instance cache via our job queue. Inspired Layer 1 and the LazyOnView pattern.
- Materialized Views — the documented pattern for pre-aggregating heavy rollups at the columnar-DB layer. Layer 2 is a textbook implementation of this pattern.
- Go's singleflight package — the canonical single-flight de-dup primitive. Layer 1's a cache lock + pub/sub fan-out is the multi-instance equivalent.
- SWR / stale-while-revalidate — the freshness-with-fallback pattern. Layer 1's job-queue-backed cache wrapper is the server-side equivalent. Our managed job queue (durable cron + workers) and shared cache layer back Layers 1 and 4 and the global rate limiter, with a single shared connection per GL#67.
- Vercel build output — per-route chunking semantics. Inspired the route-disjoint chunk strategy in
vite.config.ts. - web.dev Core Web Vitals — LCP / FCP / TBT methodology. Used to score before/after.
- Datafast (datafa.st) — minimal-fetch dashboard SSR with progressive enhancement. Validated that "one snapshot endpoint" is a viable pattern for DTC analytics, not just trading-app dashboards.
How to apply this playbook when a dashboard is slow OR expensive
Run the diagnostic in order. Most "perf fixes" fail because they optimize one cause when another dominates — the LCP needle barely moves, the bytes-processed bill barely shrinks, and the team concludes "the dashboard is just slow / expensive."
- Measure both dimensions. Lighthouse for LCP / TTFB (wall-clock); our analytics warehouse (or your OLAP DB) workspace usage API for bytes-processed (cost). Record both before-states.
- Wall-clock layer 1 (bundle). Inspect
dist/public/assets/index-*.js. Runnpx vite-bundle-visualizer. If >500 KB, split it before doing anything else. - Wall-clock layer 2 (waterfall). Network tab, count requests before LCP. If >5, single-flight on the server with one snapshot endpoint.
- Cost layer 1 (cross-instance cache). Wrap every OLAP call in
withSwrCache(or your equivalent). Single-flight is non-negotiable — without it, burst load multiplies upstream cost linearly. - Cost layer 2 (database pre-aggregation). Identify the whale query (the one whose bytes-scanned dominates the audit). Materialize its daily rollup. Read MV first, raw fallback only for today's partial day.
- Cost layer 3 (FE polling discipline). Audit your React Query (or SWR / TanStack Query / etc.) defaults. Disable refetch-on-focus / reconnect / interval for heavy endpoints. Add an explicit Refresh button.
- Cost layer 4 (prewarm). Stand up a 5-minute cron worker that primes the cache for active workspaces. Cost-neutral, latency win.
- Re-measure both dimensions. Lighthouse + workspace usage API. The numbers should improve at every layer; if one layer's fix did not move the needle, you misdiagnosed and another layer dominated.
What not to do
- Don't retry on timeout. Per GL#411, timeouts under fan-out load are not transient — the SWR cache + single-flight is the amortization mechanism, not retry loops. A retry doubles wall-clock without fixing the cause.
- Don't instantiate a second cache client for the SWR layer. The shared cache connection is the only entry point per GL#67. A second client doubles connection-pool contention and breaks pub/sub fan-out for single-flight.
- Don't skip the analytics-query release step after editing a rollup definition. A standard code deploy rebuilds the app but does not publish updated query SQL to the warehouse query layer (GL#228). Without that step, the materialized view is inert in production.
- Don't add a global iframe-allow CSP. Per GL#411, scope the override to the embed routes; keep the global default
frame-ancestors 'none'. - Don't ship Layer 1 without Layer 2. Caching the heavy query is fine for repeat hits, but cold cache misses still scan the full raw table and your tail-latency stays bad.
- Don't cache real-time data. Live visitor counts and awaiting-event toasts must hit the upstream every 10 seconds — the freshness is the product. Cache aggregate trends, not real-time counts.
- Don't trust single-process Map caches as cross-instance state. our infrastructure rolling deploys evaporate per-pod state. Only our job queue is the cross-instance source of truth.
- Don't ship a perf fix without measuring before AND after. "It feels faster / cheaper" is not a perf fix — it is a vibe. Lighthouse + workspace-usage API readings before, both readings after, both numbers in the commit message.
Canaries (defending the budget)
Two canaries defend this stack on every commit:
- Bundle-size canary (
scripts/check-bundle-size.ts) — postbuild guard for Cause 1 of the LCP taxonomy. Asserts the mainindex-*.jschunk stays under 500 KB (warning) / 800 KB (build-failing). - Perf canary (
scripts/perf-canary.ts, GL#412 Layer 5) — measures p50/p95 of/api/v1/dashboard/snapshotagainst a fixture workspace + our analytics warehouse bytes-processed in the last 24 hours. Postbuild: warning mode. Nightly our infrastructure cron: hard-fail mode (p95 > 3sORbytes > 200 MiBper fixture workspace →exit 1, Slack alert).
Lighthouse expectation. Manual Lighthouse runs against /dashboard on a fast 4G connection should show LCP < 2.5 seconds. We considered shipping an automated Lighthouse postbuild canary but rejected it: Chromium is heavy to install in CI, and a full Lighthouse run takes 60-90 seconds — too slow for the build hot path. The bundle-size + perf-canary pair is the cheap continuous protection; Lighthouse is a manual gate before perf-sensitive ships.
Related documentation
- /dashboard — the dashboard surface this perf work targets (signed-in only).
- /documentation/dashboard/analytics — per-card pipe mapping for /dashboard. The snapshot endpoint described above fans out to every pipe listed there.
- /documentation/dashboard-customization — how the same dashboard supports drag-and-drop layout editing without paying a perf cost (hidden tiles still ingest, just don't render).
- /documentation/architecture/analytics-auth — the admin-token vs. JWT auth model that backs the cache layer.
- /documentation/data/revenue-data-flow — the four ingestion paths into our analytics warehouse that feed the snapshot endpoint.
- Engineering lessons (internal):
docs/KEY-LESSONS-PLATFORM.mdin the repo — GL#402 covers the LCP taxonomy; GL#412 covers the 4-layer cost stack documented here. (Repo-only path; not a public route.)