The Trinity Beast — Performance Optimization Guide

Connection pooling, three-tier cache strategies, GC tuning, rate limiter configuration, ALB settings, and cost optimization.

Version: v15 Region: us-east-2 Updated: April 2026

Current Status Overview

✅ Already Optimized

Network & Load Balancing

  • v3.9.3 ALB Connection Tuning: 60s idle timeout, 120s keep-alive, 10s deregistration delay, LOR routing on both target groups, invalid header rejection
  • v3.9.3 NLB Connection Tuning: Cross-zone load balancing enabled, 10s deregistration delay on both UDP target groups, healthy threshold reduced to 2 (1 min recovery vs 2.5 min)
  • CORS: Enabled with minimal overhead

Price Feed Architecture

  • 6x WebSocket Price Feeds: Coinbase, Gemini, Kraken, Gate.io, Bybit, OKX — persistent push-based connections; prices for 150 prewarmed assets arrive before requests do
  • Per-Container WebSocket Independence: Each container runs its own 6 WS connections, local-only sync.Map writes (no ElastiCache hammering)
  • 3x REST Fallbacks: Gemini → Coinbase → Kraken with health tracking (only used if all WebSocket feeds are stale)
  • AWS Backbone Priority: WebSocket feeds checked first (0ms, in-memory), then AWS-hosted REST (Gemini/Coinbase), then internet (Kraken) as a last resort (see the lookup sketch after this list)
  • Response-First Architecture: Background logging and metrics — response sent before any write operations
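
How the WebSocket-first lookup fits together, as a minimal sketch. The names (priceCache, onTrade, getPrice) and the 30-second staleness window are illustrative assumptions, not the production identifiers:

package pricefeed

import (
    "context"
    "errors"
    "sync"
    "time"
)

type cachedPrice struct {
    Price     float64
    UpdatedAt time.Time
}

// Written only by this container's local WebSocket reader goroutines.
var priceCache sync.Map // symbol -> cachedPrice

// onTrade is called by each exchange's WebSocket reader on every trade message.
func onTrade(symbol string, price float64) {
    priceCache.Store(symbol, cachedPrice{Price: price, UpdatedAt: time.Now()})
}

// getPrice is the request-path lookup: in-memory first, REST only if stale.
func getPrice(ctx context.Context, symbol string) (float64, error) {
    if v, ok := priceCache.Load(symbol); ok {
        p := v.(cachedPrice)
        if time.Since(p.UpdatedAt) < 30*time.Second { // staleness window is an assumption
            return p.Price, nil
        }
    }
    // The REST fallback chain (Gemini → Coinbase → Kraken) would run here.
    return 0, errors.New("no fresh price; REST fallback omitted in this sketch")
}

Because each container's six WebSocket readers write only to their own sync.Map, the hot path never touches ElastiCache for prices.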

Compute & Runtime

  • Fargate Tasks: 8 vCPU / 32 GB each — all 3 running APP_REPORT_SERVER across 3 AZs
  • Go Runtime: Using all CPUs with runtime.GOMAXPROCS(runtime.NumCPU())
  • Garbage Collection: GOGC=300 (configurable via env var, up from 200)
  • v3.3 Background Worker Pool: 999 slots (up from 500)
  • v3.3 System Mode Toggle: Demo/Performance/Debug profiles via /admin/system-mode

Cache & Data Layer

  • ElastiCache cache.r7g.2xlarge: 52.8 GB cache memory, 400K+ ops/sec capacity, single node, no replica
  • v3.9.3 ElastiCache Pipelining: All 6 sequential HGetAll loops (4 LRS + 2 UDP) replaced with single-round-trip pipelines via PipelineHGetAll()
  • ElastiCache API Key Cache: 3-layer lookup (sync.Map → ElastiCache → Aurora) with write-through
  • ElastiCache App Config: Application parameters read from ElastiCache first, Aurora fallback
  • Shared Rate Limiting: Atomic Lua script in ElastiCache — all 3 containers share rate limit counters (see the sketch after this list)
  • Real-time Usage Counters: HINCRBY in ElastiCache on every request for instant LRS stats
  • ElastiCache Connection Pooling: 300 pool size, 60 min idle connections
  • Aurora I/O-Optimized storage: unlimited IOPS, no per-I/O charges, 40% cost savings, 2–18 ACU
  • Database Connection Pooling: Configurable via app params (150 open / 75 idle per container)
  • v3.3 Micro-Batch Aurora Write Smoothing: 300 rows / 100ms (configurable via app params)
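
A minimal sketch of the shared rate-limit counter, assuming a fixed-window INCR + EXPIRE Lua script; the production script, key names, and window shape may differ:

package ratelimit

import (
    "context"

    "github.com/redis/go-redis/v9"
)

// One Lua script makes the increment-and-expire atomic, so all three
// containers can share a single counter per API key in ElastiCache.
var rateLimitScript = redis.NewScript(`
local current = redis.call("INCR", KEYS[1])
if current == 1 then
  redis.call("EXPIRE", KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
  return 0
end
return 1
`)

// Allow reports whether apiKey is still under `limit` requests per `windowSec` seconds.
func Allow(ctx context.Context, rdb *redis.Client, apiKey string, limit, windowSec int) (bool, error) {
    res, err := rateLimitScript.Run(ctx, rdb, []string{"ratelimit:" + apiKey}, limit, windowSec).Int()
    if err != nil {
        return false, err
    }
    return res == 1, nil
}

Each request calls Allow with the key's limit; because the script executes atomically inside ElastiCache, every container sees the same counter.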

UDP Protocol (v8 Engine)

  • v8 SO_REUSEPORT: 8 sockets per protocol — per-socket kernel receive queues eliminate the single-buffer bottleneck (see the socket setup sketch after this list)
  • v8 recvmmsg Batch Reads: 32 datagrams per syscall (~32× reduction in read syscalls)
  • v8 Pre-Serialized Response Cache: sync.Map of pre-built byte slices (~2× faster for cache hits)
  • v8 32 MB Socket Buffers: Per socket (up from 8 MB in v3.3)
  • v8 1,024 Concurrent Handlers: 8 SO_REUSEPORT sockets × 128 workers per socket
  • UDP 3-Tier Cache: sync.Map → ElastiCache → REST (matches TCP handler)
  • v3.3 Compiled Go Stress Test Client: cmd/stress/ in mono repo
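
For reference, a minimal sketch of binding one of the SO_REUSEPORT sockets with 32 MB kernel buffers, assuming net.ListenConfig and golang.org/x/sys/unix on Linux. The recvmmsg batch-read loop and the 128-worker fan-out per socket are not shown:

package udpserver

import (
    "context"
    "net"
    "syscall"

    "golang.org/x/sys/unix"
)

// openReusePortSocket binds one of N UDP sockets to the same address.
// SO_REUSEPORT gives each socket its own kernel receive queue, which is the
// mechanism the v8 engine relies on; the address and buffer sizes here are
// illustrative.
func openReusePortSocket(ctx context.Context, addr string) (*net.UDPConn, error) {
    lc := net.ListenConfig{
        Control: func(network, address string, c syscall.RawConn) error {
            var sockErr error
            if err := c.Control(func(fd uintptr) {
                sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
            }); err != nil {
                return err
            }
            return sockErr
        },
    }
    pc, err := lc.ListenPacket(ctx, "udp", addr)
    if err != nil {
        return nil, err
    }
    conn := pc.(*net.UDPConn)
    // 32 MB kernel buffers per socket, matching the v8 configuration above.
    if err := conn.SetReadBuffer(32 << 20); err != nil {
        return nil, err
    }
    if err := conn.SetWriteBuffer(32 << 20); err != nil {
        return nil, err
    }
    return conn, nil
}

Calling this N times on the same address yields N independently queued sockets, each typically owned by its own pool of reader workers.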

Current Performance Metrics

Metric | Value
TCP Peak (Direct) | 369,600 req/sec
Combined Sustained | 746,374 req/sec
TCP Avg Latency | 0.3ms
UDP Avg Latency | 0.2ms
Cache Hit Rate | 99%+
WebSocket Feeds | 6 Active
ElastiCache Pool | 300 conn
Aurora ACU | 2–18

Implemented in v3.3 ✅ Shipped

The following optimizations were implemented and validated during the v3.3 stress test session. Each change was tested under sustained load with the compiled Go stress client.

Optimization | Before (v3.0) | After (v3.3) | Impact
Container CPU | 2 vCPU / 8 GB | 8 vCPU / 32 GB | 4x throughput, no CPU saturation
Aurora ACU ceiling | 6 | 18 | Supports 193K req/sec
GC tuning | GOGC=200 | GOGC=300 | Fewer GC pauses under load
Worker pool | 500 slots | 999 slots | More background work capacity
ElastiCache pool | 50 connections | 300 connections, 60 min idle | No pool exhaustion under load
UDP readers | 1 per socket | 3 per socket | Parallel packet intake
UDP buffers | OS default (~200KB) | 8MB read + 8MB write | No packet loss at high throughput
UDP cache | No ElastiCache tier | Full 3-tier (sync.Map → ElastiCache → REST) | Matches TCP cache architecture
Batch writes | 500 rows / 10s (bursty) | 300 rows / 100ms micro-batch (smooth) | Aurora ACU spikes eliminated
Test client | Python (GIL-bound, ~200 req/sec UDP) | Compiled Go (487K+ req/sec UDP) | Accurate server benchmarking
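
The batch-writes row above follows the usual buffered-writer pattern: flush when the buffer hits the row cap or when the interval ticks, whichever comes first. A minimal sketch with assumed names (runMicroBatcher, Row, flush); the production version reads its 300-row / 100ms settings from app params:

package batcher

import (
    "context"
    "time"
)

// Row stands in for one usage-log record; the real schema differs.
type Row struct{ /* ... */ }

// runMicroBatcher drains rows from ch and flushes to Aurora whenever the
// buffer reaches maxRows or every flushEvery, keeping writes small and
// steady instead of bursty.
func runMicroBatcher(ctx context.Context, ch <-chan Row, maxRows int, flushEvery time.Duration, flush func([]Row)) {
    buf := make([]Row, 0, maxRows)
    ticker := time.NewTicker(flushEvery)
    defer ticker.Stop()

    emit := func() {
        if len(buf) == 0 {
            return
        }
        flush(buf)
        buf = make([]Row, 0, maxRows)
    }

    for {
        select {
        case <-ctx.Done():
            emit() // final drain on shutdown
            return
        case r := <-ch:
            buf = append(buf, r)
            if len(buf) >= maxRows {
                emit()
            }
        case <-ticker.C:
            emit()
        }
    }
}

Started once per container, e.g. go runMicroBatcher(ctx, rowCh, 300, 100*time.Millisecond, insertBatch), where rowCh and insertBatch are hypothetical names for the caller's channel and Aurora insert.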

Remaining Optimization Opportunities 🚀 Potential Improvements

1. ALB Connection Settings ✅ DEPLOYED

ALB optimized for connection reuse, faster deregistration, and security hardening. Deployed April 26, 2026.

Setting | Before | After | Impact
Idle timeout | 300s (5 min) | 60s | Frees connection slots 5x faster
Client keep-alive | 3600s (1 hr) | 120s | Clients reconnect every 2 min instead of hoarding connections
Deregistration delay (both TGs) | 30s | 10s | Deploys drain 20s faster per service
LRS routing algorithm | round_robin | least_outstanding_requests | Smarter load distribution, matches LPO
Drop invalid headers | disabled | enabled | Security hardening — malformed headers rejected at ALB

2. ElastiCache Pipelining ✅ DEPLOYED

All sequential HGetAll loops replaced with single-round-trip pipelines across 6 handler locations. Deployed April 26, 2026.

// PipelineHGetAll — one round trip instead of N sequential calls
pipe := client.Pipeline()
cmds := make([]*redis.MapStringStringCmd, len(ids))
for i, id := range ids {
    cmds[i] = pipe.HGetAll(ctx, fmt.Sprintf("usage_log:%s", id))
}
// Exec sends every queued HGetAll in a single round trip
if _, err := pipe.Exec(ctx); err != nil {
    return nil, err
}
// cmds[i].Val() now holds the hash for ids[i]

Locations pipelined: LRS Usage Report, LRS Summary Report, LRS Report Usage Detail, LRS Report Usage Summary, UDP Summary, UDP Usage — all 6 sequential loops converted.

Impact: A report returning 50 rows now makes 1 ElastiCache round trip instead of 50. 30-40% latency reduction on LRS reports.

3. Prewarm Optimization SUPERSEDED

This optimization was designed for the REST polling era. It no longer applies — all 150 assets are now served by 6 persistent WebSocket feeds that push prices in real-time.

Exchange | Assets | Feed Type | Latency
Coinbase | BTC, ETH, SOL, DOGE, XRP, LINK, DOT, LTC, AVAX, UNI, PEPE, XLM | WebSocket (push) | 0ms (in-memory)
Gemini | AAVE, ADA, MATIC, ATOM, NEAR, ARB, MKR, CRV, GRT, FIL, SHIB, BAT | WebSocket (push) | 0ms (in-memory)
Kraken | NANO, SC, LSK, KAVA, BICO, RARI, OCEAN, CFG, CQT, ALGO, FET, FLOW | WebSocket (push) | 0ms (in-memory)
Gate.io | BNB, TRX, APT, SEI, INJ, OP, SUI, VET, HBAR, SAND, MANA, FTM | WebSocket (push) | 0ms (in-memory)
Bybit | TON, WLD, APE, BLUR, IMX, ENS, LDO, SNX, COMP, 1INCH, SUSHI, GALA | WebSocket (push) | 0ms (in-memory)
OKX | KAS, TIA, JUP, STRK, PYTH, W, ZRO, PENDLE, ONDO, RENDER, WIF, FLOKI | WebSocket (push) | 0ms (in-memory)

Why it's obsolete: The original proposal called for tiered REST polling intervals (top assets every 5 min, mid every 15 min, low every 30 min) and staggered timing across containers. With 6 WebSocket feeds pushing every trade in real-time, prices arrive before requests — there's nothing to poll and nothing to stagger. PrewarmCache() runs once at startup as a bootstrap, then WebSocket feeds take over permanently. Natural staggering already occurs because each container's 6 WebSocket connections establish at slightly different times during startup.

4. Aurora Scaling Headroom FUTURE

Monitor Aurora ACU usage and adjust max capacity if needed. Current range is 2–18 ACU.

Current Load | ACU Range | Action
Consistently under 5 ACU | 2–18 ACU | ✅ Current — right-sized
Spiking to 18 ACU | 2–32 ACU | ⚠️ Increase max to 32
Sustained at 18 ACU | 2–48 ACU | 🚨 Increase max to 48

Monitor: CloudWatch metric ServerlessDatabaseCapacity
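
A quick way to pull that metric programmatically, assuming aws-sdk-go-v2 and a placeholder cluster identifier (the console or AWS CLI works just as well):

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/cloudwatch"
    "github.com/aws/aws-sdk-go-v2/service/cloudwatch/types"
)

func main() {
    ctx := context.Background()
    cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-east-2"))
    if err != nil {
        panic(err)
    }
    cw := cloudwatch.NewFromConfig(cfg)

    // Average and peak ACU over the last hour in 5-minute buckets.
    out, err := cw.GetMetricStatistics(ctx, &cloudwatch.GetMetricStatisticsInput{
        Namespace:  aws.String("AWS/RDS"),
        MetricName: aws.String("ServerlessDatabaseCapacity"),
        Dimensions: []types.Dimension{{
            Name:  aws.String("DBClusterIdentifier"),
            Value: aws.String("my-aurora-cluster"), // placeholder: use the real cluster ID
        }},
        StartTime:  aws.Time(time.Now().Add(-1 * time.Hour)),
        EndTime:    aws.Time(time.Now()),
        Period:     aws.Int32(300),
        Statistics: []types.Statistic{types.StatisticAverage, types.StatisticMaximum},
    })
    if err != nil {
        panic(err)
    }
    for _, dp := range out.Datapoints {
        fmt.Printf("%s avg=%.1f max=%.1f ACU\n", dp.Timestamp.Format(time.RFC3339), *dp.Average, *dp.Maximum)
    }
}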

5. Task Count Scaling FUTURE

Scale ECS tasks horizontally when traffic increases. Costs reflect 8 vCPU / 32 GB containers.

Traffic Level | Main Tasks | Mirror Tasks | LRS Tasks | Monthly Cost
Current (Low) | 1 | 1 | 1 | $430
Medium (50K QPS) | 2 | 2 | 1 | $670
High (100K QPS) | 3 | 2 | 2 | $970
Very High (200K QPS) | 5 | 3 | 2 | $1,390

Trigger: When CPU > 70% or latency > 100ms consistently

6. ElastiCache Scaling FUTURE

Current node is cache.r7g.2xlarge (52.8 GB). ElastiCache is a pure cache layer — Aurora is the source of truth.

Node Type | Memory | Throughput | Monthly Cost
cache.r7g.2xlarge (current) | 52.8 GB | 400K ops/sec | $637
cache.r7g.4xlarge | 105 GB | 800K ops/sec | ~$1,274
cache.r7g.2xlarge + replica | 52.8 GB × 2 | 400K ops/sec + read replica | ~$1,274

Trigger: When memory > 80% or CPU > 70% consistently

🎯 Recommended Priority

Immediate

All done ✅ — DB pooling, ElastiCache pooling, batch writes, GC tuning, UDP optimizations, worker pool, and system mode toggle all shipped in v3.3.

Short Term (Next 1-2 Weeks)

  1. Monitor v3.3 Metrics - CloudWatch dashboards for Aurora ACU, ElastiCache CPU/memory, ALB latency under real traffic
  2. Tune SQS Pipeline Params - Adjust sqs_flush_ms and sqs_buffer_size via app params if queue depth patterns change
  3. ALB Connection Settings — ✅ Deployed April 26, 2026
  4. ElastiCache Pipelining — ✅ Deployed April 26, 2026

Long Term (Based on Metrics)

  1. Prewarm Strategy — Superseded by 6 real-time WebSocket feeds (150 assets, 0ms latency)
  2. Horizontal Scaling - Add tasks when traffic increases
  3. ElastiCache Upgrade - Move to cache.r7g.4xlarge (or add a replica) when sustained ops/sec approaches 100K

Monitoring & Metrics

Key CloudWatch Metrics to Watch

Aurora Serverless v2

  • ServerlessDatabaseCapacity - Current ACU usage (target: 2-10 ACU normal, up to 18 under stress)
  • DatabaseConnections - Active connections (target: < 450)
  • ReadLatency / WriteLatency - Query performance (target: < 5ms)

ElastiCache

  • CPUUtilization - CPU usage (target: < 70%)
  • DatabaseMemoryUsagePercentage - Memory usage (target: < 80%)
  • CacheHitRate - Cache effectiveness (target: > 85%)
  • NetworkBytesIn / NetworkBytesOut - Throughput

ECS Fargate

  • CPUUtilization - Task CPU usage (target: < 70%)
  • MemoryUtilization - Task memory usage (target: < 85%)

Application Load Balancer

  • TargetResponseTime - Backend latency (target: < 50ms)
  • RequestCount - Traffic volume
  • HealthyHostCount - Available targets (target: = desired count)
  • HTTPCode_Target_5XX_Count - Backend errors (target: 0)

Performance Bottleneck Analysis

Symptom | Likely Cause | Solution
High latency (> 100ms) | All WebSocket feeds down, REST fallback active | Check WS connections in logs, verify Gemini/Coinbase WS endpoints
Low cache hit rate (< 95%) | WebSocket feeds disconnected or stale | Check GEMINI-WS/COINBASE-WS logs, verify network connectivity
High CPU on ECS tasks | Too many concurrent requests | Scale horizontally (add more tasks)
High memory on ECS tasks | Memory leak or large response caching | Review code for leaks, increase task memory
Aurora ACU spiking to max | Heavy database queries or connections | Optimize queries, add connection pooling, increase max ACU
Aurora ACU spiking | SQS consumer Lambda batch size too large or too frequent | Adjust Lambda batch size or batching window in the SQS event source mapping
ElastiCache CPU high | Too many cache operations | Pipelining deployed ✅ — upgrade node type if still high
ElastiCache memory high | Too much cached data | Reduce cache TTL or upgrade node type
ALB 5xx errors | Backend tasks unhealthy or overloaded | Check task logs, scale horizontally

Conclusion

Current Assessment

The Trinity Beast Infrastructure v4.7 is battle-tested at scale. Run 17 validated:

  • 746,374 combined RPS sustained for 30 minutes — 1.34 billion requests with zero degradation
  • 369,600 TCP req/sec and 487,900 UDP req/sec (direct) — 100% success through all 13 concurrency levels
  • 0.3ms TCP avg latency, 0.2ms UDP avg latency
  • 943× improvement from v1.0 baseline across 17 test runs in 19 days
  • 8 vCPU / 32 GB containers — scales from 3 (production) to 9 (proven at scale)
  • 2–18 ACU Aurora range — right-sized with micro-batch write smoothing
  • 6 persistent WebSocket price feeds (Coinbase, Gemini, Kraken, Gate.io, Bybit, OKX) — 150 prewarmed assets
  • 99%+ cache hit rate — virtually every request served from memory
  • ElastiCache-backed API key validation, shared rate limiting, and real-time usage counters
  • v8 UDP engine: SO_REUSEPORT, recvmmsg batch reads, pre-serialized response cache

Recommendation: The system is production-ready and stress-tested well beyond expected traffic. A 3-year Compute Savings Plan is recommended to lock in cost savings on the 8 vCPU / 32 GB Fargate tasks. The remaining optimization opportunities (prewarm strategy, horizontal scaling) are for future scaling — not critical for current operations.

Run 17 eliminated every bottleneck found during stress testing. v4.7 added the v8 UDP engine (SO_REUSEPORT, recvmmsg), dedicated health servers, and 6-exchange WebSocket feeds — the remaining items are future-proofing for horizontal scale.