Connection pooling, three-tier cache strategies, GC tuning, rate limiter configuration, ALB settings, and cost optimization.
The following optimizations were implemented and validated during the v3.3 stress test session. Each change was tested under sustained load with the compiled Go stress client.
| Optimization | Before (v3.0) | After (v3.3) | Impact |
|---|---|---|---|
| Container CPU | 2 vCPU / 8 GB | 8 vCPU / 32 GB | 4x throughput, no CPU saturation |
| Aurora ACU ceiling | 6 | 18 | Supports 193K req/sec |
| GC tuning | GOGC=200 | GOGC=300 | Fewer GC pauses under load |
| Worker pool | 500 slots | 999 slots | More background work capacity |
| ElastiCache pool | 50 connections | 300 connections, 60 min idle | No pool exhaustion under load |
| UDP readers | 1 per socket | 3 per socket | Parallel packet intake |
| UDP buffers | OS default (~200KB) | 8MB read + 8MB write | No packet loss at high throughput |
| UDP cache | No ElastiCache tier | Full 3-tier (sync.Map → ElastiCache → REST) | Matches TCP cache architecture |
| Batch writes | 500 rows / 10s (bursty) | 300 rows / 100ms micro-batch (smooth) | Aurora ACU spikes eliminated |
| Test client | Python (GIL-bound, ~200 req/sec UDP) | Compiled Go (487K+ req/sec UDP) | Accurate server benchmarking |
ALB optimized for connection reuse, faster deregistration, and security hardening. Deployed April 26, 2026.
| Setting | Before | After | Impact |
|---|---|---|---|
| Idle timeout | 300s (5 min) | 60s | Frees connection slots 5x faster |
| Client keep-alive | 3600s (1 hr) | 120s | Clients reconnect every 2 min instead of hoarding |
| Deregistration delay (both TGs) | 30s | 10s | Deploys drain 20s faster per service |
| LRS routing algorithm | round_robin | least_outstanding_requests | Smarter load distribution, matches LPO |
| Drop invalid headers | disabled | enabled | Security hardening — malformed headers rejected at ALB |
All sequential HGetAll loops replaced with single-round-trip pipelines across 6 handler locations. Deployed April 26, 2026.
// PipelineHGetAll — one round trip instead of N sequential calls
pipe := client.Pipeline()
cmds := make([]*redis.MapStringStringCmd, len(ids))
for i, id := range ids {
cmds[i] = pipe.HGetAll(ctx, fmt.Sprintf("usage_log:%s", id))
}
pipe.Exec(ctx) // single round trip for all N hashes
Locations pipelined: LRS Usage Report, LRS Summary Report, LRS Report Usage Detail, LRS Report Usage Summary, UDP Summary, UDP Usage — all 6 sequential loops converted.
Impact: A report returning 50 rows now makes 1 ElastiCache round trip instead of 50. 30-40% latency reduction on LRS reports.
This optimization was designed for the REST polling era. It no longer applies — all 150 assets are now served by 6 persistent WebSocket feeds that push prices in real-time.
| Exchange | Assets | Feed Type | Latency |
|---|---|---|---|
| Coinbase | BTC, ETH, SOL, DOGE, XRP, LINK, DOT, LTC, AVAX, UNI, PEPE, XLM | WebSocket (push) | 0ms (in-memory) |
| Gemini | AAVE, ADA, MATIC, ATOM, NEAR, ARB, MKR, CRV, GRT, FIL, SHIB, BAT | WebSocket (push) | 0ms (in-memory) |
| Kraken | NANO, SC, LSK, KAVA, BICO, RARI, OCEAN, CFG, CQT, ALGO, FET, FLOW | WebSocket (push) | 0ms (in-memory) |
| Gate.io | BNB, TRX, APT, SEI, INJ, OP, SUI, VET, HBAR, SAND, MANA, FTM | WebSocket (push) | 0ms (in-memory) |
| Bybit | TON, WLD, APE, BLUR, IMX, ENS, LDO, SNX, COMP, 1INCH, SUSHI, GALA | WebSocket (push) | 0ms (in-memory) |
| OKX | KAS, TIA, JUP, STRK, PYTH, W, ZRO, PENDLE, ONDO, RENDER, WIF, FLOKI | WebSocket (push) | 0ms (in-memory) |
Why it's obsolete: The original proposal called for tiered REST polling intervals (top assets every 5 min, mid every 15 min, low every 30 min) and staggered timing across containers. With 6 WebSocket feeds pushing every trade in real-time, prices arrive before requests — there's nothing to poll and nothing to stagger. PrewarmCache() runs once at startup as a bootstrap, then WebSocket feeds take over permanently. Natural staggering already occurs because each container's 6 WebSocket connections establish at slightly different times during startup.
Monitor Aurora ACU usage and adjust max capacity if needed. Current range is 2–18 ACU.
| Current Load | ACU Range | Action |
|---|---|---|
| Consistently under 5 ACU | 2–18 ACU | ✅ Current — right-sized |
| Spiking to 18 ACU | 2–32 ACU | ⚠️ Increase max to 32 |
| Sustained at 18 ACU | 2–48 ACU | 🚨 Increase max to 48 |
Monitor: CloudWatch metric ServerlessDatabaseCapacity
Scale ECS tasks horizontally when traffic increases. Costs reflect 8 vCPU / 32 GB containers.
| Traffic Level | Main Tasks | Mirror Tasks | LRS Tasks | Monthly Cost |
|---|---|---|---|---|
| Current (Low) | 1 | 1 | 1 | $430 |
| Medium (50K QPS) | 2 | 2 | 1 | $670 |
| High (100K QPS) | 3 | 2 | 2 | $970 |
| Very High (200K QPS) | 5 | 3 | 2 | $1,390 |
Trigger: When CPU > 70% or latency > 100ms consistently
Current node is cache.r7g.2xlarge (52.8 GB). ElastiCache is a pure cache layer — Aurora is the source of truth.
| Node Type | Memory | Throughput | Monthly Cost |
|---|---|---|---|
| cache.r7g.2xlarge (current) | 52.8 GB | 400K ops/sec | $637 |
| cache.r7g.4xlarge | 105 GB | 800K ops/sec | ~$1,274 |
| cache.r7g.2xlarge + replica | 52.8 GB × 2 | 400K ops/sec + read replica | ~$1,274 |
Trigger: When memory > 80% or CPU > 70% consistently
All done ✅ — DB pooling, ElastiCache pooling, batch writes, GC tuning, UDP optimizations, worker pool, and system mode toggle all shipped in v3.3.
ServerlessDatabaseCapacity - Current ACU usage (target: 2-10 ACU normal, up to 18 under stress)DatabaseConnections - Active connections (target: < 450)ReadLatency / WriteLatency - Query performance (target: < 5ms)CPUUtilization - CPU usage (target: < 70%)DatabaseMemoryUsagePercentage - Memory usage (target: < 80%)CacheHitRate - Cache effectiveness (target: > 85%)NetworkBytesIn / NetworkBytesOut - ThroughputCPUUtilization - Task CPU usage (target: < 70%)MemoryUtilization - Task memory usage (target: < 85%)TargetResponseTime - Backend latency (target: < 50ms)RequestCount - Traffic volumeHealthyHostCount - Available targets (target: = desired count)HTTPCode_Target_5XX_Count - Backend errors (target: 0)| Symptom | Likely Cause | Solution |
|---|---|---|
| High latency (> 100ms) | All WebSocket feeds down, REST fallback active | Check WS connections in logs, verify Gemini/Coinbase WS endpoints |
| Low cache hit rate (< 95%) | WebSocket feeds disconnected or stale | Check GEMINI-WS/COINBASE-WS logs, verify network connectivity |
| High CPU on ECS tasks | Too many concurrent requests | Scale horizontally (add more tasks) |
| High memory on ECS tasks | Memory leak or large response caching | Review code for leaks, increase task memory |
| Aurora ACU spiking to max | Heavy database queries or connections | Optimize queries, add connection pooling, increase max ACU |
| Aurora ACU spiking | SQS consumer Lambda batch size too large or too frequent | Adjust Lambda batch size or batching window in the SQS event source mapping |
| ElastiCache CPU high | Too many cache operations | Pipelining deployed ✅ — upgrade node type if still high |
| ElastiCache memory high | Too much cached data | Reduce cache TTL or upgrade node type |
| ALB 5xx errors | Backend tasks unhealthy or overloaded | Check task logs, scale horizontally |
The Trinity Beast Infrastructure v4.7 is battle-tested at scale. Run 17 validated:
Recommendation: The system is production-ready and stress-tested well beyond expected traffic. A 3-year Compute Savings Plan is recommended to lock in cost savings on the 8 vCPU / 32 GB Fargate tasks. The remaining optimization opportunities (prewarm strategy, horizontal scaling) are for future scaling — not critical for current operations.
Run 17 eliminated every bottleneck found during stress testing. v4.7 added the v8 UDP engine (SO_REUSEPORT, recvmmsg), dedicated health servers, and 6-exchange WebSocket feeds — the remaining items are future-proofing for horizontal scale.