The Trinity Beast: ElastiCache Operations & Recovery

Operational reference for the Valkey cache layer — key inventory, dependency classification, graceful degradation behavior, disaster recovery procedures, and the Valkey State Reconciler (sync job). This document replaces the former ElastiCache Key Definitions reference.

Engine: Valkey 7.2 | Node: cache.r7g.2xlarge | Updated: June 3, 2026 | Version: v1.0

Table of Contents

  1. Design Philosophy
  2. Infrastructure Specifications
  3. Dependency Classification Map
  4. Complete Key Inventory
  5. Graceful Degradation Matrix
  6. BeastReconciler — Valkey State Reconciler
  7. Disaster Recovery Runbook
  8. Health Monitoring & Alerting
  9. Validation Results — First Production Run
  10. Retired Key Families

List of Diagrams

  1. Diagram 1.1: Valkey's Role in TBI Architecture
  2. Diagram 3.1: Dependency Classification
  3. Diagram 6.1: BeastReconciler — Valkey State Reconciler Flow
  4. Diagram 7.1: Cold-Start Recovery Sequence

1. Design Philosophy

Core Principle: Valkey is an operational backbone, not just a cache. It hosts the LRS reporting layer, search indexes, translation state, security intelligence, and cluster coordination. Losing Valkey degrades half the platform's features — but never takes the price API offline.

Three Rules of Cache Operations

  1. The system MUST function without Valkey. The price API — the revenue-generating core — operates entirely from local sync.Map and live exchange WebSocket feeds. Valkey adds speed, never correctness.
  2. Everything in Valkey is rebuildable. Every key family has an authoritative source outside Valkey (Aurora, S3, live traffic, or computed on-demand). Running the sync job restores all Class A and Class B state; Class C keys self-heal and Class D keys re-accumulate from live traffic.
  3. Valkey is used when present, never required. Application code checks for a nil client, puts a timeout on every call, and falls through gracefully when Valkey is unavailable. When Valkey comes back, the system resumes using it immediately; no restart is required.
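Rule 3 can be illustrated with a minimal Go sketch. It assumes a hypothetical `cacheGetter` interface in place of the real go-redis client and a loader function standing in for the Aurora/S3 read; `getWithFallthrough`, `fakeCache`, and `exampleLookup` are illustrative names, not code from the platform. Every cache error, including a timeout, is treated as a miss.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// cacheGetter is a HYPOTHETICAL minimal interface standing in for the part of
// the Valkey client this sketch needs. It is not the go-redis API.
type cacheGetter interface {
	Get(ctx context.Context, key string) (string, error)
}

// loaderFunc reads a value from the authoritative source (Aurora, S3, ...).
type loaderFunc func(ctx context.Context, key string) (string, error)

// cacheTimeout bounds every cache call; 3s mirrors the documented read timeout.
const cacheTimeout = 3 * time.Second

// getWithFallthrough tries Valkey first and falls through to the authoritative
// source when the client is nil or the call fails for any reason (miss,
// timeout, connection refused). The bool reports a cache hit.
func getWithFallthrough(ctx context.Context, c cacheGetter, load loaderFunc, key string) (string, bool, error) {
	if c != nil {
		cctx, cancel := context.WithTimeout(ctx, cacheTimeout)
		v, err := c.Get(cctx, key)
		cancel()
		if err == nil {
			return v, true, nil
		}
	}
	v, err := load(ctx, key)
	return v, false, err
}

// fakeCache is an in-memory stand-in used only by the example below.
type fakeCache struct {
	data map[string]string
	down bool // simulates an unreachable node
}

var errCacheDown = errors.New("cache unavailable")

func (f *fakeCache) Get(_ context.Context, key string) (string, error) {
	if f.down {
		return "", errCacheDown
	}
	if v, ok := f.data[key]; ok {
		return v, nil
	}
	return "", errors.New("cache miss")
}

// exampleLookup exercises three cases: "hit", "down", and anything else
// (no client configured at all).
func exampleLookup(mode string) (string, bool) {
	aurora := func(_ context.Context, key string) (string, error) {
		return "aurora:" + key, nil
	}
	var c cacheGetter // nil interface = no Valkey client
	switch mode {
	case "hit":
		c = &fakeCache{data: map[string]string{"app:config": "valkey"}}
	case "down":
		c = &fakeCache{down: true}
	}
	v, hit, _ := getWithFallthrough(context.Background(), c, aurora, "app:config")
	return v, hit
}
```

Treating every cache error as a miss keeps correctness anchored in the authoritative source; the trade-off is extra Aurora load while Valkey is down.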

Diagram 1.1: Valkey's Role in TBI Architecture

graph TB
    subgraph "Revenue Path (Zero Valkey Dependency)"
        WS["WebSocket Feeds\n6 Exchanges"] --> SM["sync.Map\nLocal Cache"]
        SM --> API["/price API"]
        API --> Customer
    end

    subgraph "Operational Layer (Valkey-Powered)"
        SM -.->|"flush every 30s"| VK[("Valkey\n52 GB")]
        VK --> LRS[LRS Reports]
        VK --> Search[Full-Text Search]
        VK --> TX[Translation State]
        VK --> HP[Honeypot Queue]
        VK --> AG[Adaptive Governor]
        VK --> CS[Cluster Stats]
    end

    subgraph "Authoritative Sources"
        Aurora[("Aurora PostgreSQL")] -.->|"nightly sync"| VK
        S3[("S3 Bucket")] -.->|"search rebuild"| VK
        Traffic[Live Traffic] -.->|accumulates| VK
    end

    style SM fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style VK fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style Aurora fill:#451a03,stroke:#f59e0b,color:#e2e8f0
    style API fill:#064e3b,stroke:#10b981,color:#e2e8f0
        

2. Infrastructure Specifications

Node Type: cache.r7g.2xlarge
vCPU: 8
Memory: 52 GB
Engine: Valkey 7.2
TLS: Enabled
Connections: 1,500
Property | Value
Endpoint | master.trinity-beast-cache.ptsbmm.use2.cache.amazonaws.com:6379
VPC | Data VPC (vpc-0876ee7be3a677f26, 172.31.0.0/16)
Client Library | go-redis/v9 UniversalClient
Pool Size | 300 per container (5 containers × 300 = 1,500 total)
Read Timeout | 3 seconds
Write Timeout | 3 seconds
Max Retries | 3
Persistence | None (non-persistent ElastiCache; all data is rebuildable)
Replication | Single node (no replicas, as a cost optimization)
Encryption | In-transit (TLS 1.2+), at-rest (AWS-managed)

Why not MemoryDB? We previously ran MemoryDB ($348/mo reserved) and migrated to standard ElastiCache because persistence is unnecessary when every key can be rebuilt from Aurora/S3 sources or regenerated by live traffic. The recovery procedures in this document make persistence redundant: the sync job IS the durability layer.

3. Dependency Classification Map

Every Valkey key family is classified by its role in the system and what happens when it disappears.

Diagram 3.1: Dependency Classification

graph LR
    subgraph "Class A: Performance Acceleration"
        A1["price:*"]
        A2["apikey:*"]
        A3["app:config"]
        A4["report:config"]
        A5["report_count:*"]
        A6["errmsg:*"]
        A7["public:site-assets"]
        A8["tx:params"]
    end

    subgraph "Class B: Operational State"
        B1["usage_logs:* indexes"]
        B2["usage_log:* hashes"]
        B3["search:index:*"]
        B4["report_usage_logs:*"]
        B5["docs:registry:*"]
        B6["report:text:*"]
    end

    subgraph "Class C: Coordination"
        C1["cluster:stats:*"]
        C2["adaptive:*"]
        C3["newsletter:lock:*"]
        C4["digest:lock:*"]
        C5["receipt:session:*"]
    end

    subgraph "Class D: Intelligence"
        D1["honeypot:*"]
        D2["autoops:threats:daily"]
        D3["autoops:support:*"]
        D4["autoops:bedrock:spend:daily"]
    end

    style A1 fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style A2 fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style B1 fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style B3 fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style C1 fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
    style D1 fill:#451a03,stroke:#f59e0b,color:#e2e8f0
        
Class | Role | On Loss | Recovery
A | Performance Acceleration | Slower (falls through to Aurora/S3/live source) | Automatic (next read populates via write-through)
B | Operational State | Features unavailable (LRS reports, search, doc registry) | Sync Job (full rebuild in one run)
C | Coordination | Local-only operation (per-container, not cluster-wide) | Self-healing (TTL expiry or next write)
D | Intelligence | Accumulated data lost (honeypot history, support gaps) | Re-accumulates from live traffic (no rebuild needed)

4. Complete Key Inventory

4.1 Class A — Performance Acceleration (Aurora/S3 Backed)

Key Pattern | Type | TTL | Writer | Reader | Authoritative Source
price:{ASSET} | STRING (JSON) | 93 days | PriceEngine flush (30s cycle), Kraken prewarm | PriceHandler, BatchHandler | sync.Map (WebSocket feeds)
apikey:{api_key} | HASH | None (refreshed by sync) | Sync Job, LPO write-through | API key middleware | Aurora api_keys table
app:config | HASH | None | Sync Job, LPO write-through | ParamLoader (5-min poll) | Aurora application_parameters
report:config | HASH | None | Sync Job | LRS LoadReportParameters | Aurora report_parameters
report_count:{uuid} | HASH | 24 hours | LRS counter (write-through) | LRS CheckLimits | Aurora report_count
errmsg:{lang}:{key} | STRING | None | Sync Job | GetErrorMessage | Aurora error_messages
public:site-assets | STRING (JSON) | 1 hour | Site assets handler | Same | S3 bucket listing
tx:params | HASH | 27 hours | Sync Job | Translation service | Aurora translation_parameters
translation:cost_per_chunk* | STRING | 48 hours | Sync Job | Translation quote endpoint | Computed from Aurora actuals

4.2 Class B — Operational State (Sync Job Rebuilt)

Key Pattern | Type | TTL | Writer | Reader | Count
usage_logs:index | SORTED SET | None (pruned to 93 days) | Sync Job | LRS handlers | ~39,000 members
usage_logs:api_key:{id} | SORTED SET | None | Sync Job | LRS handlers | Per API key
usage_logs:asset:{ASSET} | SORTED SET | None | Sync Job | LRS handlers | Per asset
usage_log:{uuid} | HASH | None | Sync Job | LRS PipelineHGetAll | ~39,000 keys
report_usage_logs:index | SORTED SET | None | Sync Job + LRS logger | LRS detail/summary handlers | Variable
report_usage_logs:api_key:{id} | SORTED SET | None | Sync Job + LRS logger | LRS handlers | Per API key
report_usage_log:{uuid} | HASH | None | Sync Job + LRS logger | LRS PipelineHGetAll | Variable
search:index:{lang} | STRING (JSON) | None | BuildSearchIndex handler | Search handler | 12 keys (~500 KB each)
docs:registry:{file} | STRING (JSON) | None | Doc registry handler | Doc registry endpoints | ~37 keys
docs:registry:index | SET | None | Doc registry handler | Doc registry list | 1 key
docs:pending:translation | SORTED SET | None | Doc publish handler | translate-pending command | 1 key
docs:session:log | LIST | None | session-close handler | Admin endpoint | 1 key
report:text:{YYYY-MM-DD} | STRING | 30 days | Sync Job (from S3) | Digest Lambda (newsletter) | ~30 keys

4.3 Class C — Coordination (Self-Healing)

Key Pattern | Type | TTL | Writer | Reader
cluster:stats:{NodeName} | STRING (JSON) | 30 seconds | Metrics publisher (every 3s) | ClusterStatsHandler
{adaptive:{name}}:successes | STRING (counter) | 60 seconds | Adaptive governor syncLoop | Governor readThrottleState
{adaptive:{name}}:total | STRING (counter) | 60 seconds | Adaptive governor syncLoop | Governor readThrottleState
{adaptive:{name}}:throttle | STRING | 30 seconds | Adaptive governor | All containers (coordinated throttle)
newsletter:lock:{year}-W{week} | STRING | 30 minutes | Digest Lambda (SET NX) | Same (dedup check)
digest:lock:{type}:{date} | STRING | 30 minutes | Digest Lambda (SET NX) | Same (dedup check)
receipt:session:{sessionID} | STRING | 1 hour | Receipt Lambda | Same (dedup check)
usage:counter:{apikey} | HASH | 48 hours | Price handler | LRS real-time stats
sync:last_timestamp | STRING | None | Sync Job | Sync Job (high-water mark)

4.4 Class D — Intelligence (Accumulates from Live Traffic)

Key Pattern | Type | TTL | Writer | Reader
honeypot:ip:{ip} | HASH | None | Honeypot handler | Stats, Bedrock analyzer
honeypot:log | SORTED SET | Trimmed to 7 days | Honeypot handler | Bedrock analyzer, stats
honeypot:autoblock_queue | LIST | None (consumed by processor) | Honeypot handler (LPUSH) | Honeypot processor Lambda (RPOP)
honeypot:blocked_ips | SET | None | Honeypot handler | Stats endpoint
autoops:threats:daily | STRING (JSON) | Overwritten every 5 min | Bedrock analyze Lambda | KCC threat-status
autoops:support:knowledge:b64 | STRING | None | Admin push / Sync Job | Rhema Lambda
autoops:support:gaps | SORTED SET | 30 days (EXPIRE) | Rhema Lambda, Rhema API | Digest Lambda (weekly)
autoops:support:weekly | LIST | 8 days (EXPIRE) | Rhema Lambda | Digest Lambda
autoops:bedrock:spend:daily | STRING (counter) | 24 hours | Translation engine (INCRBY) | Translation submit (cost cap)
kcc:daily | STRING (JSON) | 24 hours | KCC daily-collect | KCC daily-render

4.5 Translation Engine State (Class A — Aurora Backed)

Key Pattern | Type | TTL | Writer | Reader
tx:job:{id} | STRING (JSON) | 24 hours | Translation handlers | Status poll endpoint
tx:active | SET | None | Translation submit/cancel | Queue check, status
tx:history | LIST | None (pruned) | Translation finalize | History endpoint
tx:idempotency:{key} | STRING | 24 hours | Translation submit | Same (dedup check)
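For tooling that audits Valkey contents (for example, counting keys per class after a recovery), the inventory above can be encoded as a prefix table. This is a hypothetical helper: the prefixes are taken from Sections 4.1 to 4.5, the outer braces in `{adaptive:{name}}` are assumed to be a literal hash tag, and keys outside the inventory (such as retired `lang:*` keys) classify as empty.

```go
package main

import "strings"

// familyClass maps key prefixes from the Section 4 inventory tables to their
// dependency class. Exact key names (e.g. app:config) are matched as prefixes.
var familyClass = map[string]string{
	// Class A: performance acceleration (4.1) and translation state (4.5)
	"price:": "A", "apikey:": "A", "app:config": "A", "report:config": "A",
	"report_count:": "A", "errmsg:": "A", "public:site-assets": "A",
	"tx:": "A", "translation:cost_per_chunk": "A",
	// Class B: operational state (4.2)
	"usage_logs:": "B", "usage_log:": "B", "report_usage_logs:": "B",
	"report_usage_log:": "B", "search:index:": "B", "docs:registry:": "B",
	"docs:pending:translation": "B", "docs:session:log": "B", "report:text:": "B",
	// Class C: coordination (4.3)
	"cluster:stats:": "C", "{adaptive:": "C", "newsletter:lock:": "C",
	"digest:lock:": "C", "receipt:session:": "C", "usage:counter:": "C",
	"sync:last_timestamp": "C",
	// Class D: intelligence (4.4)
	"honeypot:": "D", "autoops:": "D", "kcc:daily": "D",
}

// recoveryPath summarizes the Recovery column of the class table in Section 3.
var recoveryPath = map[string]string{
	"A": "automatic: falls through to Aurora/S3/live source",
	"B": "BeastReconciler (sync job) rebuild",
	"C": "self-healing via TTL expiry or next write",
	"D": "re-accumulates from live traffic",
}

// classify returns the class of the longest matching prefix, or "" for keys
// that are not in the inventory. Longest match keeps usage_logs: and
// usage_log: (and similar pairs) from shadowing each other.
func classify(key string) string {
	best, bestLen := "", 0
	for prefix, class := range familyClass {
		if strings.HasPrefix(key, prefix) && len(prefix) > bestLen {
			best, bestLen = class, len(prefix)
		}
	}
	return best
}
```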

5. Graceful Degradation Matrix

What happens to each feature when Valkey is unavailable:

Feature | Without Valkey | User Impact | Risk
Price API (/price, /prices) | Bypasses L2 cache, serves from sync.Map (L1) or live exchange (L3) | None: same data, +50ms latency on cache miss | None
API Key Validation | Falls through to Aurora query | +5ms per request until local cache warms | None
Application Parameters | Falls through to Aurora | None: same data | None
LRS Usage Reports | Returns 503 with clear message | Reports unavailable; restored on next sync | Critical
LRS Report Usage Detail | Returns 503 | Report history unavailable | Critical
Full-Text Search | Returns stale in-memory cache (5-min window), then empty | Search broken until rebuild | High
Error Messages (i18n) | Falls through to English → hardcoded fallback | Non-English users see English errors | Low
Cluster Stats | Nodes missing from response | Admin dashboard shows partial data | Low
Adaptive Governor | Falls back to per-container local counters | No cluster-wide coordination; each container throttles independently | Medium
Honeypot System | Stops accumulating hit data; auto-block queue stalls | Scanners not blocked until Valkey returns (existing WAF blocks persist) | Medium
Translation Real-Time Progress | Status polls return 404; fall back to Aurora for completed jobs | No live progress bar; check back when done | Low
Document Registry | Admin doc management endpoints return empty | Doc workflow broken, no data loss (S3 is source) | Medium
Newsletter Dedup Locks | Lock check fails → proceeds without dedup | Possible duplicate email send (unlikely) | Low
Rhema Knowledge Base | Rhema operates without context (hardcoded fallback) | Support responses less accurate | Medium
Bedrock Spend Cap | Counter resets to 0, so the cap loses memory | Translation could temporarily exceed $600/day soft cap | Medium
Webhook Delivery | No Valkey dependency (reads from local sync.Map) | None | None

Key Insight: The price API, webhook delivery, API key validation, and application parameters — the four pillars of the revenue path — are ALL resilient to Valkey loss. The platform continues to serve customers. Only internal tooling and reporting degrade.
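The Adaptive Governor row above can be illustrated with a sketch that always maintains per-container counters and prefers the cluster-wide view only when the shared counters can be read. The `counterStore` interface, the `memStore` fake, and the ratio-based throttle rule are assumptions for illustration; the real governor's syncLoop and readThrottleState logic are not reproduced here.

```go
package main

import "errors"

// errValkeyDown signals that the shared counters are unreachable.
var errValkeyDown = errors.New("valkey unavailable")

// counts is a success/total pair, as in {adaptive:{name}}:successes / :total.
type counts struct{ successes, total int64 }

// counterStore is a HYPOTHETICAL interface over the shared counter keys;
// it is not the go-redis API.
type counterStore interface {
	Add(name string, success bool) error
	Read(name string) (counts, error)
}

// governor always keeps per-container counts and also writes to the shared
// store when one is configured and reachable.
type governor struct {
	name   string
	shared counterStore // nil when no Valkey client is configured
	local  counts
}

// Record counts one outcome. Shared writes are best effort: a failure is
// ignored here because ShouldThrottle falls back to the local view anyway.
func (g *governor) Record(success bool) {
	g.local.total++
	if success {
		g.local.successes++
	}
	if g.shared != nil {
		_ = g.shared.Add(g.name, success)
	}
}

// ShouldThrottle reports whether the success ratio is below minRatio and
// which view ("cluster" or "local") the decision was based on.
func (g *governor) ShouldThrottle(minRatio float64) (bool, string) {
	view, scope := g.local, "local"
	if g.shared != nil {
		if c, err := g.shared.Read(g.name); err == nil {
			view, scope = c, "cluster"
		}
	}
	if view.total == 0 {
		return false, scope
	}
	return float64(view.successes)/float64(view.total) < minRatio, scope
}

// memStore is an in-memory fake for the example; down simulates an outage.
type memStore struct {
	down bool
	data map[string]counts
}

func (m *memStore) Add(name string, success bool) error {
	if m.down {
		return errValkeyDown
	}
	c := m.data[name]
	c.total++
	if success {
		c.successes++
	}
	m.data[name] = c
	return nil
}

func (m *memStore) Read(name string) (counts, error) {
	if m.down {
		return counts{}, errValkeyDown
	}
	return m.data[name], nil
}
```

The local view is always maintained, so losing Valkey degrades coordination (each container decides alone) rather than disabling throttling.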

6. BeastReconciler — Valkey State Reconciler

BeastReconciler (trinity-beast-sync-job, 1 AM EST nightly) is the single recovery mechanism for Valkey. It is a first-class member of the ECS cluster alongside BeastMain, BeastMirror, BeastLRS, BeastWebhook, and BeastTranslate. After a cold start (new node, failover, or recovery), running BeastReconciler once restores all rebuildable operational state (every Class A and Class B key family).

Diagram 6.1: BeastReconciler — Valkey State Reconciler Flow

flowchart TD
    Start([BeastReconciler Start]) --> CheckFirst{"First run?<br/>(usage_logs:index missing?)"}
    CheckFirst -->|"Yes - Cold Start"| Full["Full Historical Load<br/>93 days from Aurora"]
    CheckFirst -->|"No - Incremental"| Inc["Incremental Load<br/>since last_timestamp"]
    Full --> Prune["Prune Old Data<br/>remove entries > 93 days"]
    Inc --> Prune
    Prune --> RUL["Sync Report Usage Logs<br/>Aurora to Valkey"]
    RUL --> Keys["Sync API Keys<br/>Aurora to Valkey hashes"]
    Keys --> Params["Sync App Params<br/>Aurora to app:config hash"]
    Params --> ErrMsg["Sync Error Messages<br/>Aurora to errmsg:* strings"]
    ErrMsg --> TxCost["Sync Translation Costs<br/>Calculate averages"]
    TxCost --> TxParams["Sync Translation Params<br/>Aurora to tx:params hash"]
    TxParams --> Rhema["Sync Rhema Knowledge<br/>S3 to autoops:support:knowledge:b64"]
    Rhema --> DocReg["Sync Doc Registry<br/>S3 listing to docs:registry:*"]
    DocReg --> Reports["Generate Report Text<br/>S3 HTML to report:text:*"]
    Reports --> Search["Rebuild Search Index<br/>CloudFront to search:index:*"]
    Search --> Health["Verify Valkey Health<br/>PING + DBSIZE + baseline check"]
    Health --> End([Complete])

    style Full fill:#7f1d1d,stroke:#fca5a5,color:#e2e8f0
    style Inc fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style Rhema fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
    style DocReg fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
    style Health fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
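The branch at the top of Diagram 6.1 and the resulting query window can be sketched as two pure functions. Parsing FORCE_FULL_SYNC with `strconv.ParseBool`, representing `sync:last_timestamp` as Unix seconds, and clamping an old or missing high-water mark to the 93-day window are assumptions, not confirmed BeastReconciler behavior.

```go
package main

import (
	"strconv"
	"time"
)

type syncMode string

const (
	modeFull        syncMode = "full"
	modeIncremental syncMode = "incremental"
)

// retentionWindow is the usage-log history kept in Valkey (93 days).
const retentionWindow = 93 * 24 * time.Hour

// chooseMode mirrors the decision in Diagram 6.1. forceFlag is the raw value
// of FORCE_FULL_SYNC (e.g. from os.Getenv); indexExists is the result of
// EXISTS usage_logs:index.
func chooseMode(forceFlag string, indexExists bool) syncMode {
	if force, err := strconv.ParseBool(forceFlag); err == nil && force {
		return modeFull
	}
	if !indexExists {
		return modeFull // cold start: nothing to be incremental against
	}
	return modeIncremental
}

// loadFromUnix returns the lower time bound for the Aurora query. A full load
// covers the retention window; an incremental load starts at the high-water
// mark, widened to the window start if the mark is missing or too old.
func loadFromUnix(mode syncMode, nowUnix, highWaterUnix int64) int64 {
	windowStart := nowUnix - int64(retentionWindow/time.Second)
	if mode == modeFull || highWaterUnix < windowStart {
		return windowStart
	}
	return highWaterUnix
}
```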

6.1 BeastReconciler Responsibilities — Current (12)

# | Function | Source | Target Keys
1 | syncHistorical / syncIncremental | Aurora usage_logs | usage_logs:index, usage_logs:api_key:*, usage_logs:asset:*, usage_log:*
2 | pruneOldData | Valkey (time-based removal) | All usage_logs:* keys
3 | syncReportUsageLogs | Aurora report_usage_logs | report_usage_logs:*, report_usage_log:*
4 | syncAPIKeys | Aurora api_keys | apikey:* hashes
5 | syncAppParams | Aurora application_parameters | app:config hash
6 | syncErrorMessages | Aurora error_messages | errmsg:{lang}:{key}
7 | syncTranslationCostPerChunk | Aurora (computed averages) | translation:cost_per_chunk*
8 | syncTranslationParams | Aurora translation_parameters | tx:params
9 | resetMonthlyCounts (1st of month) | Aurora (zero counters) | Flush report_count:*
10 | generateDailyReportText | S3 daily-reports/ | report:text:{YYYY-MM-DD}
11 | rebuildSearchIndex | CloudFront docs (all languages) | search:index:{lang}
12 | pruneReportManifest | S3 lifecycle | S3 objects (not Valkey)
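pruneOldData is described only as time-based removal. Assuming the `usage_logs:*` sorted sets are scored by Unix timestamp in seconds, the trim maps onto one ZREMRANGEBYSCORE per index with an exclusive upper bound, plus deletion of the matching `usage_log:{uuid}` hashes. The helper names below are hypothetical.

```go
package main

import (
	"sort"
	"strconv"
)

const retentionDays = 93

// cutoffUnix is the oldest score still inside the retention window,
// ASSUMING index scores are Unix timestamps in seconds.
func cutoffUnix(nowUnix int64) int64 {
	return nowUnix - retentionDays*24*60*60
}

// zremArgs builds a ZREMRANGEBYSCORE command that removes every member scored
// strictly before the cutoff; the "(" prefix makes the upper bound exclusive.
func zremArgs(indexKey string, nowUnix int64) []string {
	return []string{"ZREMRANGEBYSCORE", indexKey, "-inf", "(" + strconv.FormatInt(cutoffUnix(nowUnix), 10)}
}

// expiredHashKeys lists the usage_log:{uuid} hashes whose index entries fall
// outside the window, so they can be deleted alongside the index trim.
// members maps uuid -> score (for example, read back with ZRANGEBYSCORE first).
func expiredHashKeys(members map[string]int64, nowUnix int64) []string {
	cutoff := cutoffUnix(nowUnix)
	var keys []string
	for uuid, score := range members {
		if score < cutoff {
			keys = append(keys, "usage_log:"+uuid)
		}
	}
	sort.Strings(keys) // deterministic order for logging
	return keys
}
```

Reading the expired members before trimming matters: once ZREMRANGEBYSCORE runs, the index no longer says which hashes to delete.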

6.2 BeastReconciler Responsibilities — New (Recovery Expansion)

# | Function | Source | Target Keys | Purpose
13 | syncRhemaKnowledge | S3 rhema/knowledge-base.txt | autoops:support:knowledge:b64 | Ensures Rhema always has her context, even after cold start
14 | syncDocRegistry | S3 docs/ listing + file metadata | docs:registry:*, docs:registry:index | Rebuilds the document lifecycle registry from the S3 source of truth
15 | verifyValkeyHealth | Valkey (PING, DBSIZE, INFO) | Logs only | Post-sync validation: confirms key count meets baseline, flags anomalies

Recovery Guarantee: After a full BeastReconciler run (FORCE_FULL_SYNC=true), every Class A and Class B key is populated, as is autoops:support:knowledge:b64 (restored from S3 by syncRhemaKnowledge). Class C keys self-heal within 60 seconds of containers resuming operation. The remaining Class D keys accumulate from live traffic; no rebuild is needed or possible.

6.3 Manual Recovery Commands

For situations where BeastReconciler hasn't run yet, or you need immediate recovery of specific subsystems:

# Trigger BeastReconciler immediately (doesn't wait for 1 AM)
aws ecs run-task --cluster trinity-beast-fargate-cluster \
  --task-definition trinity-beast-sync-job \
  --overrides '{"containerOverrides":[{"name":"sync-container","environment":[{"name":"FORCE_FULL_SYNC","value":"true"}]}]}' \
  --network-configuration '...' --region us-east-2

# Rebuild search index only (fast — ~30 seconds)
bash scripts/kcc.sh build-search

# Rebuild doc registry from S3 (future command)
bash scripts/kcc.sh rebuild-doc-registry

# Verify Valkey health and key count
bash scripts/kcc.sh valkey-health
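The `--overrides` argument in the first command is JSON with a fixed shape (`containerOverrides[].name` and `environment[].name`/`value`). If the reconciler is triggered from Go tooling rather than the shell, marshaling typed structs avoids shell-quoting mistakes. `forceFullSyncOverrides` is a hypothetical helper; it produces the same payload as the command above.

```go
package main

import "encoding/json"

// kv is one environment variable override.
type kv struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// containerOverride targets one container in the task definition.
type containerOverride struct {
	Name        string `json:"name"`
	Environment []kv   `json:"environment,omitempty"`
}

// taskOverride is the top-level --overrides document.
type taskOverride struct {
	ContainerOverrides []containerOverride `json:"containerOverrides"`
}

// forceFullSyncOverrides builds the --overrides payload that sets
// FORCE_FULL_SYNC=true on the named container.
func forceFullSyncOverrides(container string) (string, error) {
	b, err := json.Marshal(taskOverride{
		ContainerOverrides: []containerOverride{{
			Name:        container,
			Environment: []kv{{Name: "FORCE_FULL_SYNC", Value: "true"}},
		}},
	})
	return string(b), err
}
```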

7. Disaster Recovery Runbook

Diagram 7.1: Cold-Start Recovery Sequence

sequenceDiagram
    participant Op as Operator
    participant ECS as ECS Containers
    participant BR as BeastReconciler
    participant VK as Valkey - New Node
    participant Aurora as Aurora
    participant S3 as S3

    Note over VK: Valkey node replaced - cold start

    Op->>BR: Trigger FORCE_FULL_SYNC=true
    BR->>Aurora: SELECT * FROM usage_logs (93 days)
    BR->>VK: Batch HSET + ZADD (39K+ entries)
    BR->>Aurora: SELECT * FROM api_keys
    BR->>VK: HSET apikey:* (all active keys)
    BR->>Aurora: SELECT * FROM application_parameters
    BR->>VK: HSET app:config
    BR->>Aurora: SELECT * FROM error_messages
    BR->>VK: SET errmsg:*
    BR->>S3: Read rhema/knowledge-base.txt
    BR->>VK: SET autoops:support:knowledge:b64
    BR->>S3: ListObjects docs/
    BR->>VK: SET docs:registry:* + SADD index
    BR->>S3: Read daily-reports/ (30 days)
    BR->>VK: SET report:text:*
    BR->>ECS: POST /admin/build-search-index
    ECS->>VK: SET search:index:* (12 languages)
    BR->>VK: PING + DBSIZE (verify)

    Note over VK: Full operational state restored

    ECS->>VK: Containers auto-resume writes
        

7.1 Scenario: Valkey Node Replacement

When ElastiCache replaces the node (maintenance, failure, or manual action):

  1. Immediate effect: All ECS containers detect the connection failure. RedisClient calls time out (3s) and return errors. The application falls through to Aurora/sync.Map for all Class A keys.
  2. Auto-recovery (containers): go-redis has MaxRetries: 3 and auto-reconnects. Once the new node is available, existing connections fail but new ones succeed. No container restart needed.
  3. Data recovery: Run BeastReconciler with FORCE_FULL_SYNC=true. Duration: ~4 minutes for 39K usage logs + all supporting data.
  4. Verification: bash scripts/kcc.sh valkey-health — confirms DBSIZE meets baseline (~42,000 keys).

7.2 Scenario: Valkey Temporarily Unreachable (Network/VPC Issue)

  1. During outage: Price API continues normally. LRS reports return 503. Search returns empty or stale cache. Honeypot stops accumulating.
  2. On reconnect: Everything auto-resumes. No sync needed — data accumulated during the outage is in Aurora (via SQS → queued-writer). Next nightly sync backfills the gap in Valkey.
  3. Optional: If LRS reports are urgently needed, trigger an incremental BeastReconciler run (no FORCE_FULL_SYNC needed — it detects existing high-water mark).

7.3 Scenario: Valkey Data Corruption (Partial Key Loss)

  1. Detect: DBSIZE below baseline, or specific subsystem returning unexpected errors.
  2. Diagnose: Check which key families are missing: EXISTS usage_logs:index, EXISTS app:config, DBSIZE.
  3. Fix: Run targeted recovery or full BeastReconciler run depending on scope. BeastReconciler is idempotent — running it on a partially populated Valkey is safe.
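Steps 2 and 3 can be combined into a small decision helper. The sentinel key chosen per family, the 35,000-key floor (borrowed from the baseline table in Section 8.3), and the rule "one missing family means targeted, anything worse means full" are assumptions; because BeastReconciler is idempotent, a full run is always a safe default.

```go
package main

import "sort"

// familySentinels maps a key family to one key whose absence means the family
// is gone. The choice of sentinel keys is an ASSUMPTION based on Section 4.
var familySentinels = map[string]string{
	"application params": "app:config",
	"doc registry":       "docs:registry:index",
	"Rhema knowledge":    "autoops:support:knowledge:b64",
	"report params":      "report:config",
	"translation params": "tx:params",
	"usage logs":         "usage_logs:index",
}

// minHealthyKeys is the "data loss" threshold from the baseline table.
const minHealthyKeys = 35000

// missingFamilies runs an EXISTS-style check per sentinel. exists is a
// hypothetical callback to be wired to the real client.
func missingFamilies(exists func(key string) bool) []string {
	var missing []string
	for family, key := range familySentinels {
		if !exists(key) {
			missing = append(missing, family)
		}
	}
	sort.Strings(missing)
	return missing
}

// recommend returns "none", "targeted", or "full". Targeted recovery is only
// a shortcut for a single missing family on an otherwise healthy node.
func recommend(missing []string, dbsize int64) string {
	switch {
	case len(missing) == 0 && dbsize >= minHealthyKeys:
		return "none"
	case len(missing) == 1 && dbsize >= minHealthyKeys:
		return "targeted"
	default:
		return "full"
	}
}
```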

7.4 Recovery Time Objectives

Scenario | Time to Full Recovery | Customer Impact During Recovery
Node replacement (planned) | ~5 minutes (new node + sync) | None (price API unaffected)
Network blip (< 60s) | Instant (auto-reconnect) | None
Extended outage (> 5 min) | Immediate on return + sync for gap fill | LRS reports unavailable
Full data loss (cold start) | ~4 minutes (BeastReconciler full run) | LRS + search unavailable until reconciliation completes

8. Health Monitoring & Alerting

8.1 CloudWatch Alarms (Active)

Alarm | Metric | Threshold | Action
Trinity-Beast-ElastiCache-CPU-High | CPUUtilization | > 80% for 5 min | SNS → AutoOps notify
Trinity-Beast-ElastiCache-Memory-High | DatabaseMemoryUsagePercentage | > 80% for 5 min | SNS → AutoOps notify

8.2 Application-Level Health Checks

8.3 Baseline Metrics (Healthy State)

Metric | Expected Range | Concern If
DBSIZE (total keys) | 40,000 – 45,000 | < 35,000 (data loss) or > 60,000 (leak)
Memory Usage | < 1% of 52 GB | > 5% (unexpected growth)
CPU | 2–5% | > 20% (hot key or pipeline issue)
Hit Rate | 90–95% | < 80% (cold cache or miss pattern)
Connected Clients | ~1,500 (5 containers × 300) | < 500 (containers down) or > 2,000 (leak)
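The Concern If column can be checked mechanically, for example by an extension of `kcc.sh valkey-health`. Metric names and the sample format below are hypothetical, collecting the values (DBSIZE, INFO, CloudWatch) is out of scope, and any sample values used with it are illustrative rather than measured.

```go
package main

import "math"

// band holds the "Concern If" thresholds: a value below minOK or above maxOK
// is a concern. Infinite bounds mean "no limit on that side".
type band struct {
	metric       string
	minOK, maxOK float64
}

var baseline = []band{
	{"dbsize", 35000, 60000},
	{"memory_pct", math.Inf(-1), 5},
	{"cpu_pct", math.Inf(-1), 20},
	{"hit_rate_pct", 80, math.Inf(1)},
	{"connected_clients", 500, 2000},
}

// concerns compares one sample (metric name -> value) against the baseline
// and returns one message per metric that is missing or out of range.
func concerns(sample map[string]float64) []string {
	var out []string
	for _, b := range baseline {
		v, ok := sample[b.metric]
		switch {
		case !ok:
			out = append(out, b.metric+": no data")
		case v < b.minOK:
			out = append(out, b.metric+": below baseline")
		case v > b.maxOK:
			out = append(out, b.metric+": above baseline")
		}
	}
	return out
}
```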

9. Validation Results — First Production Run

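June 3, 2026: BeastReconciler validated in production. The run completed in 2.36 seconds with zero errors, leaving 43,065 keys in Valkey (DBSIZE). All 15 responsibilities executed successfully. The full 93-day usage-log load from Aurora ran within that time; several other steps were no-ops because their keys already existed, and the search index rebuild continues asynchronously after the job returns. The reconciliation step of a cold-start recovery therefore takes seconds, and node replacement plus container reconnection dominate total recovery time.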

9.1 Test Conditions

9.2 Results by Function

Function | Result
syncHistorical | 40,571 usage logs loaded (93 days from Aurora)
pruneOldData | 0 pruned (all within retention window)
syncReportUsageLogs | 0 new (high-water mark current)
syncAPIKeys | 7 active keys written to Valkey hashes
syncAppParams | 86 parameters → app:config hash
syncErrorMessages | 252 messages → errmsg:{lang}:{key}
syncTranslationCostPerChunk | 40 params → tx:params. Cost averages: Haiku $0.0327/chunk (1 pair, clamped to floor), Sonnet 4.6 $0.0570/chunk (230 pairs). Blended: $0.0569/chunk.
syncTranslationLogs | 243 translation jobs synced with full indexing (global, per-key, per-doc, per-lang, per-model)
generateDailyReportText | 25 reports already in Valkey (idempotent, no rework)
pruneReportManifest | 166 entries, all within 30-day window
rebuildSearchIndex | Accepted (HTTP 202; builds asynchronously in background)
syncRhemaKnowledge | 37,260 bytes read from S3, written to autoops:support:knowledge:b64
syncDocRegistry | 42 documents scanned from S3 docs/ prefix. All 42 already had registry entries (idempotent).
verifyValkeyHealth | HEALTHY; DBSIZE: 43,065 keys; Memory: 158.44 MB (0.3% of 52 GB capacity)
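The blended cost per chunk is consistent with a pair-weighted average of the two model averages. Assuming that is how it is computed (the formula itself is not stated here), with n the number of pairs and c the per-chunk cost for each model:

```latex
\bar{c}
= \frac{n_{\text{Haiku}}\,c_{\text{Haiku}} + n_{\text{Sonnet}}\,c_{\text{Sonnet}}}{n_{\text{Haiku}} + n_{\text{Sonnet}}}
= \frac{1 \times 0.0327 + 230 \times 0.0570}{1 + 230}
= \frac{13.1427}{231}
\approx 0.0569
```

With only one Haiku pair, the blend is effectively the Sonnet 4.6 average.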

9.3 Performance

Total Duration: 2.36s
Keys Reconciled: 43,065
Errors: 0
Memory Used: 158 MB
Capacity Used: 0.3%
Functions Passed: 15 / 15

9.4 Key Observations

Conclusion: BeastReconciler is production-ready. A complete Valkey node loss and replacement can be recovered in under 5 minutes (container startup + 2.4s reconciliation). The price API is unaffected throughout — zero customer impact during cache-layer disaster recovery.

10. Retired Key Families

Key families that are being phased out or have been removed:

Key Pattern | Status | Reason | Retirement Date
lang:{code} | Deprecated | Being replaced by pre-rendered translated pages (same folder pattern as doc library: /{lang}/page.html). The i18n JSON API becomes unnecessary when every page exists as a static translated file. Language selection becomes a simple path prefix redirect based on localStorage('cpmp-lang'). | Pending (web page translation batches 3-5)
i18n:job:{uuid} | Deprecated | Part of the JSON i18n system being retired alongside lang:{code}. | Pending
search:index (legacy, no lang suffix) | Removed | Replaced by per-language indexes search:index:{lang}. | May 2026

Architecture Decision (June 3, 2026): The lang:* keys and the /public/lang/{code} API endpoint will be retired entirely once all 33 web pages are translated via the translation engine and deployed to language subfolders on S3. At that point, language selection routes users directly to pre-rendered static HTML — no API calls, no Valkey reads, no runtime text swapping. The JSON i18n system was a bridge; the translated pages are the destination.