Quran.com release plan

SSR / i18n Production Go-Live Plan

Moving server-side rendered, country + language localised pages from the isolated SSR host to production quran.com with Cloudflare QDC edge caching.

PR #31 — SSR-POC-1 → production

Visual snapshot audited 2026-06-20. Metrics refreshed 2026-06-20 22:23 +07. Use this report alongside the Notion plan; Notion remains the source of truth.

Cloudflare 30d requests: ~2.87B (~95.7M/day, ~1.1K/sec avg)
Cloudflare 30d views: ~119M (2026-05-21 to 2026-06-20)
GA 30d active users: ~5.1M (Google Analytics home card)
Active right now: ~9.8K (GA active users in last 30 min)
Why scale matters here: Cloudflare saw about 2.87B requests and 119M page views over the last 30 days. SSR cache misses are heavier than the current production path because every miss hits origin with a full server render. Launch monitoring must focus on cache hit ratio, origin RPS, CPU/memory, preference API latency, and cache key cardinality.
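The per-day and per-second averages quoted for Cloudflare follow directly from the 30-day total. The sketch below reproduces that arithmetic and adds a deliberately simple, assumed model of origin load (edge requests multiplied by the miss share). The function names are illustrative and the model ignores BYPASS traffic and non-HTML assets.

```python
SECONDS_PER_DAY = 86_400

def average_rates(total_requests: float, days: int) -> tuple[float, float]:
    """Return (requests per day, requests per second) averaged over the window."""
    per_day = total_requests / days
    return per_day, per_day / SECONDS_PER_DAY

def estimated_origin_rps(edge_rps: float, hit_ratio: float) -> float:
    """Assumed model: every request that is not an edge HIT reaches origin."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return edge_rps * (1.0 - hit_ratio)

# 30-day Cloudflare request total from this report.
per_day, per_sec = average_rates(2.87e9, 30)
print(f"{per_day / 1e6:.1f}M/day, {per_sec:.0f}/sec")
```

Feeding a measured hit ratio into estimated_origin_rps gives a first-order expectation for origin RPS during launch; treat it as a sanity check, not a capacity model.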
SSR/i18n architecture overview
Request flows from browser through Cloudflare edge (with QDC snippet caching) to Kamal proxy and Next.js SSR origin. Preference and admin APIs feed country/language defaults.
Browser: GET /vi/1
Cloudflare Edge: QDC Snippet (snippter_using_prefs). Cache key: URL + locale + prefs + auth state. HIT → serve cached HTML. MISS → forward to origin.
Kamal Proxy: LB + routing
Next.js SSR (prod-1 / prod-2 / prod-3): reads QDC_PREFS cookie, calls Preference API, renders correct locale, returns HTML + headers.
Tools / Admin API (quran-api-internal.quran.foundation): country_language_preference, default country/language config, SSR first-load defaults.
App deploy + Snippet cutover are separate operational steps.
1
Open blockers before release
The 2 items below must be closed before scheduling the go-live cutover
Hard rule: Do not schedule or start the big-bang release until every item below is closed. These are not best-effort — they are release gates.
Latest verification snapshot — 2026-06-20
This table lists only the currently open blockers and decisions. Passing sampled checks are intentionally omitted.
Item checked: Raw cache-key header
Observed evidence: /vi/1 and logged-in private routes still return X-QDC-Edge-Cache-Key. The sampled private keys did not expose the raw auth cookie, but production should not expose internal private cache-key material.
Verdict: blocker open

Item checked: Private user key source
Observed evidence: Logged-in private routes used __qdc_u and did not leak the raw auth cookie. However, the source cookie used for the key is readable in document.cookie, so it is not yet proven safe as the only private-cache authorization boundary.
Verdict: blocker open
Security Remove raw cache-key response header
X-QDC-Edge-Cache-Key is still returned on sampled public and logged-in private SSR HTML. Even though the sampled private key did not expose the raw auth cookie, production should not expose internal private cache-key material.
Close: remove out.headers.set('X-QDC-Edge-Cache-Key', cacheKey); from the Cloudflare snippet before production cutover. Keep X-QDC-Edge-Cache only.
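One way to confirm this blocker stays closed is to assert on sampled response headers. The sketch below is a minimal check, assuming headers have already been collected into a plain dict by whatever HTTP client the team uses; the function name is hypothetical and it only knows the header named in this report.

```python
# Header that must not reach clients in production.
# X-QDC-Edge-Cache (the HIT/MISS/BYPASS status) is allowed to remain.
FORBIDDEN_HEADERS = {"x-qdc-edge-cache-key"}

def leaked_headers(response_headers: dict[str, str]) -> list[str]:
    """Return the forbidden header names present in a response (case-insensitive)."""
    return sorted(
        name for name in response_headers
        if name.lower() in FORBIDDEN_HEADERS
    )

sample = {"X-QDC-Edge-Cache": "HIT", "Content-Type": "text/html"}
print(leaked_headers(sample))
```

An empty list means the sampled response is clean; any returned name should fail the smoke pass.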
Security Prove or replace the private user-key source
Logged-in routes are user-keyed with __qdc_u, but the source cookie used for that key is readable in document.cookie. That is not yet proven safe as the only authorization boundary for private cached HTML.
Close: use a non-forgeable, auth-bound, server-controlled signal for __qdc_u, or bypass edge caching for private/protected SSR pages until that signal exists.
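As an illustration of what a non-forgeable, server-controlled signal could look like, the sketch below signs a user identifier with an HMAC so the edge can verify it before using it as a private cache segment. This is an assumption about one possible design, not the project's chosen fix: the token format, the function names, and the idea of sharing a secret between origin and edge are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional

def sign_user_key(secret: bytes, user_id: str) -> str:
    """Server side: issue '<user_id>.<hex mac>' after authentication succeeds."""
    mac = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{user_id}.{mac}"

def verify_user_key(secret: bytes, token: str) -> Optional[str]:
    """Verifier side: return the user id if the MAC is valid, else None."""
    user_id, sep, mac = token.rpartition(".")
    if not sep or not user_id:
        return None
    expected = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time; untrusted input may be non-ASCII.
    if hmac.compare_digest(expected.encode("utf-8"), mac.encode("utf-8")):
        return user_id
    return None
```

A client that edits the readable cookie cannot produce a valid MAC without the secret, so a tampered value can be rejected and the request bypassed rather than served from another user's cache entry. Replay of a stolen token is out of scope for this sketch.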
Release team roles
OPS
DevOps team
Owns Cloudflare Snippet/Page Rule changes, deployer/Kamal execution, Discord deploy confirmation, post-deploy cache clear, Grafana/infra monitoring, and infrastructure rollback actions.
ENG
Engineering team
Owns go/no-go decision support, app/Sentry/New Relic monitoring, SSR/cache correctness triage, smoke-test interpretation, and app rollback decision with DevOps.
QA
Manual QA resource
Runs the user-facing smoke pass across public, private, auth, locale, and country/language scenarios.
CFG
Admin-config owner
Controls country_language_preferences during freeze. Approves config changes with targeted API/proxy/config cache purge.
2
Release timeline and status
What is verified now, what is still blocking, what happens during cutover, and what follows after launch
How to read this section: this is an end-to-end release readiness timeline. It is not a list of failed checks. Each step is marked as verified now, open blocker, release-day action, or post-release follow-up.
2.1
verified now App build and SSR preview readiness
The release branch passed yarn test && yarn build && yarn test:integration:critical && yarn test:integration:locale-matrix. Treat yarn lint as closed only when CI or a separate lint run is green. Behavior verified on ssr.quran.com is accepted as pre-release evidence because cutover changes host/scope, not the app code path.
2.2
verified now Preference cache key policy
This item is verified, not a failed check. Live smoke against ssr.quran.com passed for guest fallback buckets, valid QDC_PREFS_KEY buckets, homepage auth-state split, token bypass, content API bypass, and the preference API contract. Keep it in the release-day smoke because Cloudflare dashboard rule parity must still be confirmed when the host switches to quran.com.
Cache key components: __qdc_v=4 · __qdc_l locale · __qdc_p prefs · __qdc_d browser language · __qdc_c country · __qdc_a auth state · __qdc_t type · __qdc_u private only
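To make the key components concrete, the sketch below composes a cache key from the parameters listed above. The parameter names come from this report; the ordering, the query-string encoding, and the example values ("guest", "public", "private") are assumptions for illustration and may differ from the real snippet.

```python
from typing import Optional
from urllib.parse import urlencode

def build_cache_key(url: str, *, locale: str, prefs: str, browser_lang: str,
                    country: str, auth_state: str, route_type: str,
                    user_key: Optional[str] = None, version: int = 4) -> str:
    """Compose an illustrative QDC-style cache key for one request."""
    params = [
        ("__qdc_v", str(version)),
        ("__qdc_l", locale),
        ("__qdc_p", prefs),
        ("__qdc_d", browser_lang),
        ("__qdc_c", country),
        ("__qdc_a", auth_state),
        ("__qdc_t", route_type),
    ]
    if route_type == "private":
        # __qdc_u is private-only: refuse to build a private key without it.
        if not user_key:
            raise ValueError("private routes require a user key")
        params.append(("__qdc_u", user_key))
    return f"{url}?{urlencode(params)}"

key = build_cache_key("https://quran.com/vi/1", locale="vi", prefs="default",
                      browser_lang="vi", country="VN", auth_state="guest",
                      route_type="public")
print(key)
```

Every component multiplies the number of possible cache entries, which is why cache-key cardinality is a launch monitoring signal.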
2.3
open blocker Private cache safety and route decisions
Close the Section 1 blockers before scheduling release: remove X-QDC-Edge-Cache-Key and prove or replace the private __qdc_u source. Treat /collections/[collectionId] as private/owner-specific for this release; it must be user-keyed or bypassed. A logged-in request must never store user-specific HTML or private JSON that can later be served to another user or a guest.
2.4
verified now _next/data cache behavior is route-dependent, not a blocker
Verified on ssr.quran.com: static page data such as /_next/data/.../en/profile.json is cacheable and contains public page props/localization strings, not real account data. Mixed guest/logged-in SSR routes such as /my-quran return /_next/data/.../en/my-quran.json with CF-Cache-Status: BYPASS and Cache-Control: private, no-cache, no-store. Keep this as a smoke check only: sampled _next/data responses must not contain real user identifiers, emails, notes, bookmarks, collections, or account state.
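The smoke check described above can be expressed as a small policy function. This is a sketch under stated assumptions: the caller decides which routes count as private or mixed, supplies the response headers and body, and provides a list of known test-account markers (for example an email address) to search for. The function name and return format are hypothetical.

```python
def next_data_violations(headers: dict[str, str], body: str, *,
                         private_route: bool, user_markers: list[str]) -> list[str]:
    """Return human-readable policy violations for one _next/data response."""
    problems = []
    lowered = {name.lower(): value for name, value in headers.items()}
    if private_route:
        # Private or mixed guest/logged-in routes must bypass the edge cache.
        if lowered.get("cf-cache-status", "").upper() != "BYPASS":
            problems.append("private route is not CF-Cache-Status: BYPASS")
        cache_control = lowered.get("cache-control", "").lower()
        if "private" not in cache_control or "no-store" not in cache_control:
            problems.append("private route lacks private/no-store Cache-Control")
    # No sampled response, public or private, should contain real user data.
    for marker in user_markers:
        if marker and marker in body:
            problems.append(f"body contains user marker: {marker}")
    return problems
```

An empty result for both /_next/data/.../en/profile.json (as a public sample) and /_next/data/.../en/my-quran.json (as a private sample) matches the behavior verified on ssr.quran.com.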
2.5
before cutover Admin config and API/proxy cache freshness
Update production Tools/Admin country_language_preferences, verify origin truth at https://quran-api-internal.quran.foundation/api/qdc/resources/country_language_preference, purge only affected config API/proxy entries, then prewarm exact production URLs without random cache-busters.
1
Update production Tools/Admin config first
This config drives SSR country/browser-language defaults.
2
Verify origin truth directly
https://quran-api-internal.quran.foundation/api/qdc/resources/country_language_preference
3
Purge targeted config cache entries
Use only country_language_preference targets on quran.com/api/proxy/content/api/qdc/... and apis.quran.foundation/content/api/qdc/.... Do not purge unrelated API resources unless there is an incident.
4
Prewarm exact URLs and compare responses
Internal origin, public API edge, and frontend proxy bodies should match; Age should be low or absent after purge.
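Step 4 above amounts to two checks: the three response bodies are identical, and the Age header is low or absent. A minimal sketch of both, assuming the bodies and headers have already been fetched; the helper names and the 60-second freshness limit are assumptions, not values from the plan.

```python
import hashlib

def body_digest(body: bytes) -> str:
    """SHA-256 digest of a response body, for comparison and release records."""
    return hashlib.sha256(body).hexdigest()

def config_sources_match(bodies: dict[str, bytes]) -> bool:
    """True when every source (origin, API edge, frontend proxy) returned the same bytes."""
    return len({body_digest(body) for body in bodies.values()}) == 1

def age_is_fresh(headers: dict[str, str], max_age_seconds: int = 60) -> bool:
    """True when the Age header is absent or at most max_age_seconds."""
    lowered = {name.lower(): value for name, value in headers.items()}
    age = lowered.get("age")
    if age is None:
        return True
    try:
        return int(age) <= max_age_seconds
    except ValueError:
        return False
```

Comparing raw bytes is strict; if the sources legitimately differ in key order or whitespace, parse and compare the JSON instead.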
2.6
release-day action Cloudflare cutover and production smoke
After the app deploy Discord success and post-deploy cache-clear message, execute the Cloudflare dashboard steps in Section 3: preserve non-production cache behavior, disable the risky Page Rules, change the snippet host/rule from ssr.quran.com to quran.com, and run smoke tests against production. The raw cache-key header must already be removed before release scheduling.
2.7
verified now Infra baseline and launch monitoring plan
The current 3 FE hosts have enough observed headroom for launch. Keep the existing fleet for release, monitor the first 60 minutes, and roll back immediately on cache correctness, private data, OOM/restart loop, or sustained infra/app threshold failures.
2.8
post-release follow-up Automation and non-production alignment
After production is stable, turn the smoke suite into one fish-compatible release command and align testing/prelive/staging/branch-preview hosts with production-grade SSR/i18n cache behavior. These are not release-day blockers unless a current rule change would break production or non-production safety.
Infra baseline details

Baseline refreshed 2026-06-20 22:23 +07 using a 30-day Grafana query window for FE/Kamal metrics and Cloudflare HTTP Traffic 30-day analytics. Decision: keep existing 3 FE servers for release. The refreshed window still shows substantial headroom with no restart/OOM pattern.

qf-fe-prod-1 144.76.7.146
CPU avg/p95/max: 14.7 / 22.7 / 86.9%
RAM avg/p95/max: 55.2 / 61.2 / 66.0%
1m load (of 32): 5.18 / 8.11 / 21.16
Disk used: 37.8% current / 53.9% max
qf-fe-prod-2 144.76.7.147
CPU avg/p95/max: 14.7 / 23.1 / 80.8%
RAM avg/p95/max: 54.6 / 60.1 / 64.2%
1m load (of 32): 5.26 / 8.27 / 16.37
Disk used: 36.0% current / 53.7% max
qf-fe-prod-3 5.9.73.231
CPU avg/p95/max: 12.3 / 19.8 / 91.5%
RAM avg/p95/max: 31.0 / 35.6 / 43.3%
1m load (of 32): 4.37 / 7.16 / 35.39
Disk used: 17.4% current / 26.7% max
Kamal Proxy baseline (30d sampled)
Request rate avg: 550.7 rps
Request rate p95: 739.1 rps
p95 latency avg: 444.6 ms
p95 latency p95: 542.2 ms
5xx avg: 0.088%
5xx p95: 0.231%
Cloudflare HIT share (30d): 53.2%
Cloudflare cached bytes (30d): 84.1%
Container restarts: 0, OOM kills: 0
Launch traffic thresholds
Metric | Normal / Cold start | ⚠ Alert (5 min) | 🛑 Rollback (10 min)
Kamal Proxy RPS | 500–800 rps | > 800 rps | > 1,100 rps + poor HIT ratio
Kamal p95 latency | < 850 ms | 850–1,000 ms | > 1,000 ms + rising errors
Kamal 5xx rate | < 0.5% | 0.5–1% | > 1% or SSR failures
FE CPU | Normal headroom | Any host >70% | 2+ hosts >70% for 10 min + worsening Kamal
OOM / restart loop | 0 | Any restart | Immediate rollback on OOM or restart loop
Cache correctness | User/private data isolated | (none; go straight to rollback) | Immediate rollback on ANY user data leak or wrong locale from cache
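The numeric rows of the threshold table can be encoded as simple classifiers for use in a dashboard or an on-call script. The sketch below covers the three Kamal metrics only and is an interpretation of the table, not an agreed alerting config: the function names are hypothetical, the 5-minute and 10-minute sustain windows are not modeled, and the compound conditions ("poor HIT ratio", "rising errors") are passed in as booleans the caller must judge.

```python
def classify_rps(rps: float, hit_ratio_poor: bool) -> str:
    """Kamal Proxy RPS: rollback needs both > 1,100 rps and a poor HIT ratio."""
    if rps > 1100 and hit_ratio_poor:
        return "rollback"
    if rps > 800:
        return "alert"
    return "normal"

def classify_p95_latency(p95_ms: float, errors_rising: bool) -> str:
    """Kamal p95 latency: rollback needs both > 1,000 ms and rising errors."""
    if p95_ms > 1000:
        return "rollback" if errors_rising else "alert"
    if p95_ms >= 850:
        return "alert"
    return "normal"

def classify_5xx(rate_percent: float) -> str:
    """Kamal 5xx rate in percent: > 1% is rollback, 0.5 to 1% is alert."""
    if rate_percent > 1.0:
        return "rollback"
    if rate_percent >= 0.5:
        return "alert"
    return "normal"

# 30-day baseline averages from this report all classify as normal.
print(classify_rps(550.7, False), classify_p95_latency(444.6, False), classify_5xx(0.088))
```

Any "rollback" result should still be confirmed against the sustain window and the rollback sources of truth before acting.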
3
Release execution
Ordered steps for app deploy and Cloudflare cutover
3.1
High-level sequence
1
Merge PR #31 into production
This automatically starts the trusted deployer/Kamal production deploy for quran.com. Do not run a separate manual deploy.
2
Wait for Discord deploy success notification
Await Discord deploy success for qdc/production. Do not proceed until this arrives.
3
Confirm post-deploy cache-clear message
The Kamal post-deploy hook automatically purges the Cloudflare host cache for quran.com. Confirm this message. A manual purge is only needed if this message failed or smoke tests show stale cache.
4
Execute Cloudflare cutover
Follow the step-by-step Cloudflare checklist below. Keep the interval between Discord success and Cloudflare cutover short and monitored.
5
Run release smoke tests against quran.com
Use manual and automated smoke test suites against production. See Section 4.
6
Monitor first 60 minutes
DevOps, Engineering, QA, and config owners are actively watching, and rollback authority is assigned. See Section 4.4 for monitoring targets.
3.2
Cutover execution checklist
These steps are release actions during cutover — not pre-conditions. Complete them in order. Dashboard links: Cache Rules · Page Rules list for #4/#5 · QDC snippet editor · Custom purge.
Confirm country_language_preferences freeze is already active
This is a pre-release/release-window control from T-1h to T+1h. Do not start it mid-cutover.
Create or confirm the non-production replacement Cache Rule before disabling Page Rules
Open Cache Rules. Confirm a replacement rule exists for subdomains only with expression: (http.host ne "quran.com" and ends_with(http.host, ".quran.com")). This preserves non-production behavior while the broad Page Rules are removed from production risk.
Disable Page Rule #4: *quran.com/* → Cache Everything
In the Page Rules list, disable row #4. Pattern: *quran.com/*. Action: Cache Everything + Automatic HTTPS Rewrites: On. Cloudflare rule ID: 71b6988743ce4cd4ef3fd56a8370f7bd. Only do this after the non-production replacement Cache Rule is in place. Do not replace it with *.quran.com/* because that wildcard can still match apex behavior unexpectedly.
Disable Page Rule #5: *quran.com/_next/data/*.json → Cache Everything
In the Page Rules list, disable row #5. Pattern: *quran.com/_next/data/*.json. Action: Cache Everything. Cloudflare rule ID: 32e5dcc9a5295eaca8cc4a1904913f18. This does not mean all _next/data is uncached; QDC/origin headers should own route-dependent behavior.
Open QDC snippet editor and switch host to production
Open snippter_using_prefs. In the snippet code, change HOSTNAME: 'ssr.quran.com' to HOSTNAME: 'quran.com'.
Update Snippet Rule to production host only
In snippter_using_prefs, open the Snippet Rule settings and change Hostname equals ssr.quran.com to Hostname equals quran.com. Confirm expression preview: (http.host eq "quran.com").
Save snippet changes and confirm Snippet limits
Save the snippet/rule changes. Package is expected around 26 KB; Cloudflare Snippets limit is 32 KB.
Confirm Discord post-deploy cache-clear message; manually purge only if that failed
If the post-deploy purge did not run or smoke tests show stale cache, use Custom purge for the affected production host/cache paths.
Confirm no Tools/Admin config changes occurred during freeze
Capture before/after Cloudflare dashboard screenshots in release record
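The two host expressions in the checklist above are meant to be mutually exclusive: the replacement Cache Rule covers subdomains only, and the Snippet Rule covers the apex only. The sketch below mirrors both expressions as plain Python predicates so the intended split can be checked against sample hostnames. It is a model of the logic, not Cloudflare's rule engine, and the function names are hypothetical.

```python
def matches_nonprod_cache_rule(host: str) -> bool:
    """Mirrors: (http.host ne "quran.com" and ends_with(http.host, ".quran.com"))"""
    return host != "quran.com" and host.endswith(".quran.com")

def matches_snippet_rule(host: str) -> bool:
    """Mirrors: (http.host eq "quran.com")"""
    return host == "quran.com"

for host in ("quran.com", "ssr.quran.com", "notquran.com"):
    print(host, matches_nonprod_cache_rule(host), matches_snippet_rule(host))
```

No hostname should satisfy both predicates; production should match only the Snippet Rule, and a lookalike domain such as notquran.com should match neither.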
4
Health verification and smoke tests
Do not declare go-live successful until all gates pass
4.1
Verification gates

Must be TRUE before declaring success

Hetzner LB and all 3 FE targets healthy
Discord deploy success + post-deploy cache-clear received
Grafana infra baseline captured, Sentry/New Relic checked
earlyoom/system-log visibility checked on all FE hosts, or explicitly accepted as unavailable
Public pages show QDC MISS → HIT
Private/protected pages user-keyed or bypassed and isolated (User A/B/guest body + API + UI test)
Auth pages and tokenized URLs bypass QDC (never return HIT)
_next/data behavior matches route-dependent policy

Roll back immediately if ANY of these

Wrong language/country/config served from cache
User/private data cache risk (body leak)
Auth/session Set-Cookie cache risk
OOM, earlyoom, or restart loop
Kamal 5xx or SSR exceptions cross rollback threshold
CF HTML HIT ratio stays poor while origin RPS climbs
Cache-key cardinality explodes with origin RPS climbing
Preference API failures cause degraded defaults under normal cache keys
4.2
Manual smoke tests
Assumption: automated smoke scripts and release blockers are already fixed. This section is a human sanity pass to confirm production behaves correctly, not a command sheet.
1
Open representative public localized pages
Check /vi/1, /en/1, and one Arabic page. Verify the page loads, the locale is correct, the content matches the URL, and repeat visits are served by QDC cache as expected.
2
Open auth and security-sensitive pages
Check login, reset-password, logout, and tokenized URLs. Verify they are not cached as reusable HTML and auth/session cookies are not cached or stripped incorrectly.
3
Check rendered body, not only headers
Verify html lang, direction, visible locale, serialized app data, and selected defaults all agree. Public pages must not show email, username, notes, bookmarks, collections, reading-goal state, or account data.
4
Confirm cache ownership
For localized HTML, QDC should own the HTML cache behavior. Fail the release if a localized SSR HTML response is cached by Cloudflare without QDC ownership, if raw cache-key material is exposed, or if the wrong locale/user/config appears from cache.
5
Confirm _next/data matches route-dependent policy
Static/public page data can be cached. Private or mixed guest/logged-in SSR data, such as /my-quran, must bypass with private/no-store headers. Do not accept stale build data, user data in static JSON, URL-only cached private data, or behavior that freezes the app after deploy.
Minimum route smoke set
/ /en/1 /vi/1 /ar/1 /2/255 /page/1 /search?q=mercy /reading-goal /reading-goal/progress /collections/all /collections/<private-id> /profile /my-quran /take-notes /login /reset-password

Route classes: public · private/user-keyed · product decision · auth/bypass

User A / User B / guest isolation test

Use the two approved test accounts for this smoke: User A is osama@quran.com; User B is osama+1@quran.com. Do not record passwords in this report or release notes.

1
Start after Discord post-deploy cache-clear message for quran.com
2
User A visits a private/protected route
3
User A repeats — confirm they receive the intended user-keyed HIT
4
User B visits same route — confirm they do NOT receive User A body, __NEXT_DATA__, private auth API responses, SWR/client state, browser disk-cache response, or visible UI fields
5
Guest visits same route — confirm no User A body, no cached redirect, no private shell
6
Repeat for a public reader route logged in and logged out. If SSR body differs for logged-in users, the route must be user-keyed.
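The body comparison in steps 4 and 5 can be partly automated by searching other viewers' responses for strings that only User A's account should produce. A minimal sketch, assuming the tester has saved each response body as text and chosen the markers (for example a display name or a note title); the function name and input shape are hypothetical.

```python
def isolation_failures(owner_markers: list[str],
                       other_responses: dict[str, str]) -> list[str]:
    """List every case where another viewer's response contains an owner-only marker."""
    failures = []
    for viewer, body in other_responses.items():
        for marker in owner_markers:
            if marker and marker in body:
                failures.append(f"{viewer} response contains {marker!r}")
    return failures

# Example with placeholder data; real markers come from the User A test account.
result = isolation_failures(
    ["user-a-private-note"],
    {"user_b": "<html>reader</html>", "guest": "<html>login</html>"},
)
print(result)
```

A string search cannot prove isolation on its own, so it complements the manual UI, API, and browser-cache checks rather than replacing them.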
Real-user country/language smoke matrix
Country | Browser language | Login state
Pakistan | Urdu | Logged out, new visitor
Pakistan | English | Logged out, new visitor
Indonesia | Indonesian | Logged out, new visitor
Turkey | Turkish | Logged out, new visitor
Egypt or Saudi Arabia | Arabic | Logged out, new visitor
United States | English | Logged out, new visitor
India or Bangladesh | Local or English | Logged out, new visitor
Any tested country | Any | Logged in with saved preferences
4.3
Production smoke commands
Run after Cloudflare cutover
The SSR preview checks are already accepted. Run these against production after the hostname cutover:
env BASE_URL=https://quran.com LOCALE=en ./scripts/qf-318/onboard-verify.sh
env BASE_URL=https://quran.com LOCALE=en ./scripts/qf-318/edge-cache-smoke.sh
Automation note: use the scripts above for launch validation. Detailed engineering follow-up is tracked in Notion so this report stays readable for the full launch team.
4.4
60-minute launch monitoring checklist
Watch these signals
HTML cache hit ratio climbs after warmup
QDC HIT/MISS/BYPASS by policy
Invalid/random QDC_PREFS_KEY rate
Origin RPS materially below public pageview rate
FE CPU does not sustain above threshold
No restart loop, no OOM, no earlyoom
No Sentry spike from SSR/hydration/config mismatches
No wrong language/country defaults in smoke checks
No auth/private route cache leakage
Tools/Admin preference API latency and error rate healthy
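For the first two signals, a hit ratio can be computed from counts of QDC cache statuses sampled over a window. The sketch below assumes the counts are already aggregated from logs or a dashboard export, and it excludes BYPASS from the denominator on the assumption that bypassed routes are uncacheable by policy; both the function name and that choice are illustrative.

```python
def html_hit_ratio(status_counts: dict[str, int]) -> float:
    """HIT share among cacheable responses (HIT + MISS); BYPASS is excluded."""
    hits = status_counts.get("HIT", 0)
    misses = status_counts.get("MISS", 0)
    total = hits + misses
    return hits / total if total else 0.0

window = {"HIT": 75, "MISS": 25, "BYPASS": 100}
print(f"{html_hit_ratio(window):.0%}")
```

Tracking this value per window during warmup shows whether the ratio is actually climbing, which is the condition the rollback criteria depend on.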
Rollback sources of truth
Use: Kamal Proxy RPS + 5xx, FE host CPU/RAM/load, Sentry/New Relic app exceptions, smoke failures, user-facing symptoms

5
Rollback plan
DevOps and Engineering owners must be present and reachable
Key principle: Rollback must be executable, not theoretical. For cache correctness incidents, start with Cloudflare — do not wait for an app redeploy.
5.1
Rollback sequence
1
Disable the Cloudflare snippet rule for quran.com
Or revert HOSTNAME back to ssr.quran.com.
2
Purge quran.com broadly enough to clear QDC/custom-key variants
If stale QDC HITs remain, purge everything for the zone. Use CACHE_VERSION only for QDC snippet key changes before re-enabling snippet caching.
3
Confirm QDC headers disappear from quran.com HTML responses
4
Confirm origin load stabilizes
5
Roll back the production app only if app errors continue
Use the deployer/Kamal rollback path after Cloudflare has been disabled.
6
Confirm all three FE hosts serve the rollback build metadata
7
Purge quran.com broadly again
8
Re-run smoke tests
For Cloudflare-only cache correctness issues: Change the snippet HOSTNAME and Snippet Rule host back to ssr.quran.com, save, and rerun smoke tests. If stale QDC responses remain, purge quran.com manually.
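Step 3 of the rollback sequence (QDC headers disappear) can be verified mechanically on sampled HTML responses. The sketch below assumes QDC response headers share the X-QDC- prefix seen in this report (X-QDC-Edge-Cache, X-QDC-Edge-Cache-Key); that prefix rule and the function name are assumptions.

```python
def qdc_headers_present(response_headers: dict[str, str]) -> list[str]:
    """Return any response header names that start with 'x-qdc-' (case-insensitive)."""
    return sorted(
        name for name in response_headers
        if name.lower().startswith("x-qdc-")
    )

after_rollback = {"Content-Type": "text/html", "CF-Cache-Status": "DYNAMIC"}
print(qdc_headers_present(after_rollback))
```

An empty list on every sampled quran.com HTML response indicates the snippet is no longer handling production traffic.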
6
Post-release work
After production is stable — not release-day blockers
6.1
Non-production environment alignment
During production cutover, make only the required non-production replacement Cache Rule before disabling the broad Page Rules. Broader non-production alignment remains a separate follow-up after launch monitoring completes.
1
Wait for production launch monitoring to complete with no active rollback
2
Inventory active non-production and branch-preview host patterns under *.quran.com
From deployer, DNS, and Cloudflare.
3
Classify each host pattern
SSR/i18n enabled (uses the QDC snippet cache key) vs. legacy/non-SSR (keeps existing cache behavior temporarily).
4
For every non-production host running SSR/i18n, remove URL-only Cache Everything from localized HTML
Check both Page Rules and Cache Rules. The QDC snippet should own SSR HTML caching there too.
5
Keep legacy/non-SSR preview hosts unchanged until migrated
6
Smoke one active host from each non-production class before and after non-production rule changes
This is a post-release non-production alignment check, not part of the production cutover sequence.
7
Once all non-production hosts are migrated, remove the legacy subdomain-only URL-cache workaround
6.2
Post-release hardening
Improvements, not release-day blockers.
Cache improvements
Check Tiered Cache for normal HTML cache path
Add X-QDC-Cache-Eligible origin header so edge doesn't infer route policy alone
Add cache tags for targeted purge if config/content purges become too broad
Observability
Add permanent QDC dashboards: cache status, route family, key cardinality, private/user MISS rate, preference-key validity
Add synthetic country/locale probes
Add automated route-policy tests for public, user-aware, private, auth/bypass routes
Capacity
Consider additional permanent FE capacity based on measured production SSR origin RPS after launch
Move complex edge logic from Snippet to Worker if Snippet CPU/memory/package limits become tight (current: ~26 KB of 32 KB)
CI/automation
Add permanent automated route-policy tests
Make smoke scripts production gates once body checks and isolation tests are added
