Streamlined performance validation — 7 targeted tests covering Public and Partner access paths with real-time 24-counter telemetry and cluster-wide aggregation.
The v4.1 plan had 12 tests across 3 phases — LPO Only, LRS Only, and Combined — with each phase run across 4 topologies (TCP Direct, TCP ALB, UDP Direct, UDP NLB). That meant 4 infrastructure reconfigurations and roughly 4.5 hours of testing.
The v5.0 plan eliminates redundancy by recognizing three facts:
- Production runs LPO and LRS combined in a single APP_REPORT_SERVER container, so testing them in isolation duplicates work.
- There are only two real access paths — Public (rate limited) and Partner (unrestricted) — and each maps to a specific protocol and topology.
- All production-topology tests share one configuration, so only one infrastructure reconfiguration is needed.
Result: 7 targeted tests in 2 phases. ~2.5 hours instead of 4.5. Every test answers a specific question about a real access path. Zero redundancy.
| Characteristic | Public Subscriber | AWS Partner |
|---|---|---|
| Protocol | TCP (HTTPS via ALB) or UDP (via NLB) | TCP (PrivateLink) or UDP (VPC Peering / NLB) |
| Rate Limiting | Enforced — QPS + burst + monthly cap | Bypassed — zero rate limiting, zero caps |
| TLS | ALB terminates TLS (adds ~1-2ms) | PrivateLink or direct — no TLS overhead |
| Billing Checks | Monthly usage validated per request | Skipped entirely |
| API Key | Required — identifies account for billing | Required — identifies partner for tracking and analytics |
| Connection | Public internet → CloudFront → ALB | AWS backbone → PrivateLink / VPC Peering |
Partner API Keys: The exchanges do not require partners to authenticate — but we require an API key for every partner. This is not a restriction; it is visibility. You cannot manage what you cannot measure. A partner key with zero rate limits and zero billing still gives us per-partner usage tracking, analytics, and the ability to identify issues. That is good engineering.
| Component | Specification | Role |
|---|---|---|
| ECS Fargate | 3 × 8 vCPU / 32 GB RAM (APP_REPORT_SERVER) | Application containers — LPO + LRS combined |
| ElastiCache | cache.r7g.2xlarge, Valkey 7.2, TLS, 52 GB | Price cache, usage log indexes, cluster stats |
| Aurora Serverless v2 | PostgreSQL 17.7, Optimized I/O, 2–18 ACU | Source of truth — API keys, usage logs, parameters |
| ALB | Trinity-Beast-TCP-ALB | TCP load balancing (443 → 8080/9090) |
| NLB | Trinity-Beast-UDP-NLB | UDP load balancing (2679/2680) |
| Stress Client | c6i.metal or equivalent (96 vCPU) | Load generator — same region (us-east-2) |
Hot Path: Price requests are served from an in-process sync.Map populated by 6 persistent WebSocket feeds (Coinbase, Gemini, Kraken, Gate.io, Bybit, OKX). Zero network calls on the hot path. ElastiCache is the second layer (sub-millisecond). REST API fallback is the third layer (cache miss only). Under stress testing with 300s cache TTL, 99%+ of requests hit the sync.Map.
Each test follows a 13-level progressive load ramp. Both concurrency and request volume increase at every level, which reveals the exact concurrency threshold where performance degrades.
| Level | Requests | Concurrent | Purpose |
|---|---|---|---|
| 1 | 300 | 30 | Warm-up — connection pool initialization |
| 2 | 900 | 90 | Light load — verify cold-start fix |
| 3 | 3,000 | 300 | Moderate load — baseline throughput |
| 4 | 9,000 | 600 | Sustained load — batch pipeline under pressure |
| 5 | 30,000 | 900 | High load — entering sweet spot |
| 6 | 90,000 | 1,500 | Heavy load — peak throughput zone |
| 7 | 300,000 | 3,000 | Extreme load — maximum RPS target |
| 8 | 600,000 | 6,000 | Overload — testing graceful degradation |
| 9 | 900,000 | 9,000 | Severe overload — success rate threshold |
| 10 | 1,500,000 | 12,000 | Breaking point — where failures begin |
| 11 | 3,000,000 | 15,000 | Recovery test — can the system stabilize? |
| 12 | 6,000,000 | 18,000 | Endurance — sustained extreme load |
| 13 | 9,000,000 | 21,000 | Absolute ceiling — maximum concurrent connections |
Success Criteria: 100% success rate through level 9 (9,000 concurrent). Graceful degradation above that. p50 latency under 10ms for cache hits. p99 under 300ms through the sweet spot (levels 5–8). Zero 5xx errors through level 9.
Every container runs 24 atomic.Int64 counters that track the complete request lifecycle. The /admin/stress-stats endpoint returns a per-container snapshot. The /admin/cluster-stats endpoint reads all 3 snapshots from ElastiCache in a single pipeline call — one round-trip, sub-millisecond, and guaranteed to include all 3 containers.
- tcp_rps, udp_rps, total_rps — real-time requests per second by protocol
- syncmap_hit_pct, cache_hit_pct — three-layer visibility; sync.Map hits should be 99%+ at 300s TTL
- udp_drop_pct, packets_received vs packets_sent — packet-loss visibility
- bg_drop_pct, submitted vs completed — housekeeping saturation
- rows_queued — SQS messages sent (usage-log entries queued for the Lambda consumer)
- db_open, db_in_use, db_wait_count — connection pool utilization
```mermaid
graph LR
    subgraph Containers
        M["BeastMain<br/>24 counters"] -->|"every 3s"| EC
        Mi["BeastMirror<br/>24 counters"] -->|"every 3s"| EC
        L["BeastLRS<br/>24 counters"] -->|"every 3s"| EC
    end
    EC[("ElastiCache<br/>cluster:stats:*<br/>TTL 30s")]
    EC -->|"1 pipeline read"| CS[/"/admin/cluster-stats"/]
    CS --> TBCC["Command Center<br/>Cluster Health"]
    CS --> CW["CloudWatch<br/>6 Metrics"]
    style M fill:#334155,stroke:#64748b,color:#94a3b8
    style Mi fill:#334155,stroke:#64748b,color:#94a3b8
    style L fill:#334155,stroke:#64748b,color:#94a3b8
    style EC fill:#5a3a3a,stroke:#8a5a5a,color:#e2c8c8
    style CS fill:#2d5a4a,stroke:#4a9a7a,color:#cbd5e1
    style TBCC fill:#2d4a6f,stroke:#4a7ab5,color:#cbd5e1
    style CW fill:#3d3a5c,stroke:#6b6399,color:#cbd5e1
```
The Tuning Loop: Poll /admin/cluster-stats every 3 seconds during testing. If udp_drop_pct exceeds 1%, increase worker pool capacity. If bg_drop_pct exceeds 5%, relax flush intervals. If db_in_use approaches db_max_open, the connection pool is saturating. Each metric points to a specific application parameter — most adjustable without a restart via /admin/system-mode.
| Test | Protocol | Access Path | Topology | What It Measures |
|---|---|---|---|---|
| **Phase 1 — Single Container Ceiling (Direct, No Load Balancer)** | | | | |
| 1a | TCP | — | Direct to 1 node | Absolute single-container TCP ceiling |
| 1b | UDP | — | Direct to 1 node | Absolute single-container UDP ceiling (v6 engine) |
| **Phase 2 — Production Topology (3 Nodes, APP_REPORT_SERVER, Load Balanced)** | | | | |
| 2a | TCP via ALB | Public | 3 nodes, distributed AZs | Production TCP throughput (subscribers, rate limited) |
| 2b | UDP via NLB | Partner | 3 nodes, distributed AZs | Production UDP throughput (partners, no rate limit) |
| 2c | UDP via NLB | Public | 3 nodes, distributed AZs | Production UDP throughput (subscribers, rate limited) |
| 2d | TCP via ALB | Combined | 3 nodes, distributed AZs | LPO + LRS simultaneous load (resource contention) |
| 2e | TCP via ALB | Endurance | 3 nodes, distributed AZs | 30-minute sustained load at 80% of ceiling |
```mermaid
graph TD
    START["Launch Stress Client<br/>96 vCPU"] --> P1["Phase 1: Single Container"]
    P1 --> T1A["1a: TCP Direct<br/>Raw TCP ceiling"]
    P1 --> T1B["1b: UDP Direct<br/>v6 engine ceiling"]
    T1A --> RECONFIG["Reconfigure:<br/>Distribute AZs<br/>Enable Governor"]
    T1B --> RECONFIG
    RECONFIG --> P2["Phase 2: Production Topology"]
    P2 --> T2A["2a: TCP ALB<br/>Public subscribers"]
    P2 --> T2B["2b: UDP NLB<br/>Partner — no rate limit"]
    P2 --> T2C["2c: UDP NLB<br/>Public — rate limited"]
    T2A --> T2D["2d: Combined<br/>LPO + LRS simultaneous"]
    T2B --> T2D
    T2C --> T2D
    T2D --> T2E["2e: Endurance<br/>30 min at 80% ceiling"]
    T2E --> RESTORE["Restore Production<br/>fresh-price profile"]
    style START fill:#4a5568,stroke:#718096,color:#e2e8f0
    style P1 fill:#5a4a2d,stroke:#9a7a4a,color:#cbd5e1
    style P2 fill:#2d4a6f,stroke:#4a7ab5,color:#cbd5e1
    style RECONFIG fill:#3d3a5c,stroke:#6b6399,color:#cbd5e1
    style RESTORE fill:#2d5a4a,stroke:#4a9a7a,color:#cbd5e1
    style T1A fill:#334155,stroke:#64748b,color:#94a3b8
    style T1B fill:#334155,stroke:#64748b,color:#94a3b8
    style T2A fill:#334155,stroke:#64748b,color:#94a3b8
    style T2B fill:#334155,stroke:#64748b,color:#94a3b8
    style T2C fill:#334155,stroke:#64748b,color:#94a3b8
    style T2D fill:#334155,stroke:#64748b,color:#94a3b8
    style T2E fill:#334155,stroke:#64748b,color:#94a3b8
```
Isolates a single container to find the raw per-node ceiling for each protocol. No load balancer, no TLS termination, no cross-AZ latency. Governor disabled. Stress client connects directly to the container IP. This establishes the baseline that Phase 2 builds on.
Measures the absolute single-container TCP ceiling. Stress client connects directly to the container IP on port 8080, bypassing the ALB. This is the purest throughput test — no load balancer overhead, no TLS termination.
Previous result (v5): 132K RPS at 3,000 concurrent, 100% success through 9,000 concurrent, p50=4.9ms. This is the benchmark to beat with the v6 binary.
Key metrics to watch: tcp_rps, syncmap_hit_pct (should be 99%+), db_in_use (should stay well below db_max_open), batch_flush_errors (should be 0).
The most anticipated test. v5 peaked at 44K RPS — bottlenecked by json.Marshal and single-socket WriteToUDP serialization. v6 introduces zero-alloc response building and multi-socket architecture. Target: 80K+ RPS.
Key metrics to watch: udp_rps, udp_drop_pct (the critical number — above 1% means workers can't keep up), bg_drop_pct (housekeeping saturation), CPU% in Container Insights.
All 3 containers running APP_REPORT_SERVER (LPO + LRS), distributed across AZs (2a/2b/2c), governor enabled. This is the real-world configuration. Every test in this phase answers a question about how the system performs under production conditions.
Production TCP throughput for public subscribers. Traffic flows through the ALB with TLS termination, rate limiting active, monthly cap checks enforced. This is the number that goes on the marketing page — what a subscriber actually experiences.
Key metrics to watch: total_rps across all 3 nodes (via /admin/cluster-stats), rate_limit_hits (should be 0 with stress tier key), ALB TargetResponseTime p99.
Production UDP throughput for AWS Partners. Partner API keys bypass all rate limiting, monthly caps, and billing checks — both in the TCP and UDP handlers. NLB operates at Layer 4 with near-zero added latency. This measures the fastest possible path through The Trinity Beast.
Key metrics to watch: udp_rps cluster-wide, udp_drop_pct per node, rate_limit_hits (should be exactly 0 — partner keys skip the limiter).
Same infrastructure as 2b, but with a public-tier API key. Rate limiting and monthly cap checks are active on every packet. This measures the exact overhead of rate limiting on the UDP path — compare udp_rps against test 2b to quantify the cost.
Key metrics to watch: udp_rps (compare to 2b), rate_limit_hits (should be 0 with stress-tier QPS), the delta between 2b and 2c reveals the per-packet cost of rate limit checks.
The resource contention test. Two stress clients run simultaneously — one hammering /price (LPO) and one hammering /reports/usage (LRS). Both services compete for the same CPU, memory, DB connections, and cache pool. The gap between this number and test 2a reveals exactly how much LRS overhead costs under load.
Key metrics to watch: tcp_rps + lrs_requests (both should be tracked), db_in_use (combined workload may approach db_max_open=180), batch_flush_errors.
Sustained load at 80% of the ceiling found in test 2a. Runs for 30 continuous minutes. This tests for memory leaks, connection pool exhaustion, goroutine accumulation, cache TTL edge cases, and Aurora ACU scaling behavior over time. If the system is stable at 80% for 30 minutes, it's production-ready.
Key metrics to watch: Memory% trend in Container Insights (should be flat, not climbing), db_wait_count (should stay at 0), Aurora ACU (should stabilize, not keep climbing), errors_5xx (must remain 0 for the full 30 minutes).
Streamlined from 16 profiles to 8. Each test uses a profile specifically optimized for its protocol and topology. Profiles are applied instantly via /admin/system-mode?mode=<name>.
| Profile | Test | Batch | uBat | Flush ms | DBOpen | DBIdle | CPool | CIdle | CRdMs |
|---|---|---|---|---|---|---|---|---|---|
| stress-tcp-direct | 1a | 300 | 500 | 2,000 | 150 | 150 | 2,997 | 999 | 500 |
| stress-udp-direct | 1b | 100 | 100 | 3,000 | 150 | 150 | 2,997 | 999 | 500 |
| stress-tcp-alb | 2a, 2e | 300 | 500 | 500 | 150 | 150 | 1,998 | 666 | 500 |
| stress-udp-nlb | 2b, 2c | 100 | 100 | 1,500 | 150 | 150 | 1,998 | 666 | 500 |
| stress-combined-alb | 2d | 300 | 500 | 500 | 180 | 180 | 1,998 | 666 | 500 |
All stress profiles share: QPS=100K, Burst=100K, TTL=300s, log_level=error, config_poll=300, cache_max_retries=1, cache_dial_ms=500, cache_write_ms=500.
Five data sources monitored simultaneously during each test. No X-Ray (adds latency). All monitoring is passive — zero impact on the system under test.
| Source | What to Watch | Red Flag |
|---|---|---|
| Container Insights | CPU%, Memory%, NetworkTx/Rx per container | CPU > 90%, Memory climbing (leak), Rx >> Tx (dropping packets) |
| Aurora Performance Insights | ACU usage, active sessions, top SQL, wait events | ACU > 10, sessions = max_open, IO:DataFileRead |
| ElastiCache Metrics | EngineCPU, CurrConnections, CacheHits/Misses | EngineCPU > 60% (single-threaded), Misses spiking |
| ALB/NLB Metrics | TargetResponseTime, 5xx count, ActiveConnections | Any 5xx, ResponseTime p99 > 500ms, UnhealthyHostCount > 0 |
| /admin/cluster-stats | All 24 counters aggregated across 3 nodes | syncmap_hit < 99%, udp_drop > 1%, bg_drop > 5%, flush_errors > 0 |
New in v5.0: The /admin/cluster-stats endpoint reads all 3 container snapshots from ElastiCache in a single pipeline call. No more polling individual containers through the ALB. One call, sub-millisecond, and guaranteed to return all 3 nodes together.
| Step | Command / Action |
|---|---|
| Apply profile | /admin/system-mode?mode=<profile> |
| Reset metrics | /admin/stress-reset |
| Verify reset | /admin/cluster-stats — confirm all counters are zero across all 3 nodes |
| Governor setting | Disabled for Phase 1 (direct), enabled for Phase 2 (production) |
| Trim usage_logs | Keep under 100K rows to avoid sync interference |
| Open dashboards | Container Insights, Aurora PI, ElastiCache, ALB/NLB metrics |
| Step | Action |
|---|---|
| Restore AZs | Main=2a, Mirror=2b, LRS=2c |
| Apply production profile | /admin/system-mode?mode=fresh-price |
| Re-enable governor | adaptive_enabled=true |
| Trim usage_logs | Remove stress test rows |
| Terminate stress client | EC2 instance termination |
| Record results | Update Performance Report document |
The v6 engine addresses every bottleneck identified during v5 testing. These are code-level changes targeting the CPU-bound UDP hot path.
| Optimization | v5 | v6 | Expected Impact |
|---|---|---|---|
| Response serialization | json.Marshal — reflection, interface boxing | buildUDPResponse() — direct byte append, pooled buffers | ~70% faster, zero heap allocations |
| Socket architecture | Single shared net.UDPConn | One socket per reader goroutine | 3× write parallelism |
| Worker pools | Shared across all readers | Per-socket pool with dedicated channel | Zero cross-socket contention |
| Buffer management | sync.Pool for read buffers only | sync.Pool for both read and response buffers | Reduced GC pressure |
| Rate limiting | Not enforced on UDP | Full rate limiting + monthly limits on UDP | Production-ready UDP security |
Total estimated time: ~2.5 hours including one infrastructure reconfiguration between phases.
| Block | Activity | Duration |
|---|---|---|
| Setup | Launch stress client EC2 (96 vCPU), consolidate AZs to 2a, apply stress profile, open dashboards | 20 min |
| **Phase 1 — Single Container Ceiling** | | |
| 1a | TCP Direct — single-container TCP ceiling | 15 min |
| 1b | UDP Direct — v6 engine UDP ceiling | 15 min |
| — | Reconfigure: distribute AZs (a/b/c), enable governor, restart for ALB/NLB pool sizes | 10 min |
| **Phase 2 — Production Topology** | | |
| 2a | TCP via ALB — public subscriber throughput | 15 min |
| 2b | UDP via NLB — partner access (no rate limit) | 15 min |
| 2c | UDP via NLB — public subscriber (rate limited) | 15 min |
| 2d | Combined — LPO + LRS simultaneous load | 20 min |
| 2e | Endurance — 30 min sustained at 80% ceiling | 35 min |
| **Restore Production** | | |
| — | Restore AZs, fresh-price profile, terminate stress client, record results | 15 min |
| **Total Estimated Time** | | **~2.5 hours** |
Only 1 infrastructure reconfiguration — between Phase 1 (direct, single AZ) and Phase 2 (distributed, 3 AZs). All Phase 2 tests share the same topology. Compare this to v4.1 which required 4 reconfigurations across 12 tests.