The Trinity Beast Infrastructure — CloudWatch Dashboard & Alarm Notifications

Monitoring, Alerting, and Operational Visibility
April 2026 · Region: us-east-2 · 4 Dashboards · 14 Alarms · 10 Log Groups

1. Overview

The Trinity Beast Infrastructure (TBI) uses Amazon CloudWatch as its centralized monitoring and alerting platform. This guide documents every dashboard, alarm, log group, and notification channel deployed across the system.

Dashboards: 4 · Alarms: 14 · Log Groups: 10 · Retention: 30 days

2. Dashboards

Four CloudWatch dashboards provide layered visibility — from real-time application metrics to executive cost summaries.

Dashboard                               Purpose
Trinity-Beast-Application-Dashboard     Primary ops dashboard — LPO, LRS, AWS infra, Lambda, logs
Trinity-Beast-Master-Dashboard          Comprehensive view across all services
Trinity-Beast-Cost-Detailed-Dashboard   Detailed cost breakdown by service
Trinity-Beast-Cost-Executive-Dashboard  Executive cost summary

3. Application Dashboard — Widget Reference

The Trinity-Beast-Application-Dashboard is the primary operational dashboard. It contains widgets organized into six sections covering every layer of the stack.

LPO Section

LPO Widgets (7)

Widget                         Type
LPO Requests (per minute)      Metric — line graph
Cache Hit Rate (%)             Metric — gauge / number
Avg Latency (ms)               Metric — line graph
Cache Hits vs Misses           Metric — stacked area
Requests by Asset              Metric — bar chart
Requests by Source (Exchange)  Metric — bar chart
Errors & Source Failovers      Metric — line graph

LRS Section

LRS Widgets (4)

Widget                   Type
LRS Total Requests       Metric — line graph
LRS Avg Latency (ms)     Metric — line graph
LRS Output Format Usage  Metric — bar chart
LRS Errors               Metric — line graph

AWS Infrastructure Section

Infrastructure Widgets (6)

Widget                            Type
ECS CPU Utilization (%)           Metric — line graph
ECS Memory Utilization (%)        Metric — line graph
ALB Response Time & Errors        Metric — line graph
ElastiCache CPU & Cache Hit Rate  Metric — line graph
ElastiCache Memory Usage (%)      Metric — gauge / number
Aurora Serverless Capacity (ACU)  Metric — line graph

Container Logs Section

Log Widgets (4)

Widget                     Type
LPO — Main Service Logs    Log query
LRS — Report Service Logs  Log query
Mirror Service Logs        Log query
Sync Job Logs              Log query

Lambda Section

Lambda Widgets (7)

Widget                            Type
Lambda Invocations                Metric — line graph
Lambda Errors                     Metric — line graph
Lambda Duration (ms)              Metric — line graph
Throttles & Concurrency           Metric — line graph
Receipts by Handler Type          Log widget
Recent Receipts — Handler Detail  Log widget
Receipt Lambda Logs               Log query

CloudTrail & VPC Section

Audit & Network Widgets (3)

Widget                                          Type
CloudTrail — Errors & Access Denied             Log query
CloudTrail — ECS & Infrastructure Changes       Log query
VPC Flow Logs — Rejected Traffic (Trinity VPC)  Log query
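
Dashboards like the one cataloged above are defined as a JSON widget array passed to CloudWatch's PutDashboard API. The following is a minimal sketch of a single widget definition; the layout coordinates and sizing are illustrative assumptions, while the namespace and metric name follow the custom metrics in section 8.

```python
import json

# Hypothetical sketch of the "LPO Requests (per minute)" line-graph widget as it
# could appear in the dashboard body. Coordinates and sizing are illustrative.
widget = {
    "type": "metric",
    "x": 0, "y": 0, "width": 12, "height": 6,
    "properties": {
        "title": "LPO Requests (per minute)",
        "region": "us-east-2",                      # region from this guide
        "view": "timeSeries",
        "stat": "Sum",
        "period": 60,                               # one data point per minute
        "metrics": [["TrinityBeast/LPO", "Requests"]],
    },
}

dashboard_body = json.dumps({"widgets": [widget]})
# A boto3 CloudWatch client would then apply it with:
#   cloudwatch.put_dashboard(
#       DashboardName="Trinity-Beast-Application-Dashboard",
#       DashboardBody=dashboard_body)
```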

4. Cost Dashboards

Two dedicated cost dashboards provide financial visibility into the Trinity Beast Infrastructure spend.

Trinity-Beast-Cost-Detailed-Dashboard

Shows a per-service cost breakdown including ECS Fargate, Aurora Serverless, ElastiCache, Lambda, S3, CloudWatch, NAT Gateway, and data transfer. Each service is displayed with daily and monthly cost trends, making it easy to identify which component is driving spend.

Trinity-Beast-Cost-Executive-Dashboard

Provides a high-level monthly cost summary with total spend, month-over-month trends, and projected costs. Designed for stakeholders who need a quick financial snapshot without per-service granularity.

5. CloudWatch Alarms

14 alarms monitor critical infrastructure metrics. All alarms publish to the Trinity-Beast-Critical-Alerts SNS topic, triggering both email and SMS notifications simultaneously.

Load Balancers (2 Alarms)

ALB & NLB Health (all OK)
Alarm Name Metric Namespace Threshold Period Eval Periods State
Trinity-Beast-ALB-UnhealthyTargets UnHealthyHostCount AWS/ApplicationELB >= 1 60s 3 OK
Trinity-Beast-NLB-UnhealthyTargets UnHealthyHostCount AWS/NetworkELB >= 1 60s 3 OK

ECS Services (6 Alarms)

ECS CPU & Task Count (all OK)
Alarm Name Metric Namespace Threshold Period Eval Periods State Notes
Trinity-Beast-ECS-CPU-High CPUUtilization AWS/ECS (main-service) > 80% 300s 2 OK
Trinity-Beast-ECS-CPU-High-Mirror CPUUtilization AWS/ECS (mirror-service) > 80% 300s 2 OK
Trinity-Beast-ECS-CPU-High-LRS CPUUtilization AWS/ECS (lrs-service) > 80% 300s 2 OK
Trinity-Beast-Main-Service-Count-Low RunningTaskCount ECS/ContainerInsights (main) < 1 300s 2 OK TreatMissing: breaching
Trinity-Beast-Mirror-Service-Count-Low RunningTaskCount ECS/ContainerInsights (mirror) < 1 300s 2 OK TreatMissing: breaching
Trinity-Beast-LRS-Service-Count-Low RunningTaskCount ECS/ContainerInsights (lrs) < 1 300s 2 OK TreatMissing: breaching
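
The service-count alarms above hinge on one non-default setting: TreatMissingData. A minimal sketch of the PutMetricAlarm parameters for one of them; the Container Insights dimensions and the account id in the topic ARN are assumptions, the rest mirrors the table above.

```python
# Sketch of the parameters behind Trinity-Beast-Main-Service-Count-Low. The
# cluster/service dimensions and the SNS account id are assumptions.
alarm_params = {
    "AlarmName": "Trinity-Beast-Main-Service-Count-Low",
    "Namespace": "ECS/ContainerInsights",
    "MetricName": "RunningTaskCount",
    "Dimensions": [
        {"Name": "ClusterName", "Value": "trinity-beast-fargate-cluster"},
        {"Name": "ServiceName", "Value": "main-service"},
    ],
    "Statistic": "Average",
    "Period": 300,
    "EvaluationPeriods": 2,
    "Threshold": 1,
    "ComparisonOperator": "LessThanThreshold",
    # No running tasks also means no data points -- treating missing data as
    # breaching lets a fully stopped service still trip the alarm.
    "TreatMissingData": "breaching",
    "AlarmActions": [
        "arn:aws:sns:us-east-2:123456789012:Trinity-Beast-Critical-Alerts",
    ],
}
# cloudwatch.put_metric_alarm(**alarm_params)
```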

Aurora (2 Alarms)

Aurora Serverless v2 (all OK)
Alarm Name Metric Namespace Threshold Period Eval Periods State
Trinity-Beast-Aurora-CPU-High CPUUtilization AWS/RDS (trinity-beast-aurora-cluster) > 80% 300s 2 OK
Trinity-Beast-Aurora-Connections-High DatabaseConnections AWS/RDS (trinity-beast-aurora-cluster) > 80 300s 2 OK

ElastiCache (3 Alarms)

ElastiCache for Valkey (mixed state)
Alarm Name Metric Namespace Threshold Period Eval Periods State
Trinity-Beast-ElastiCache-CPU-High CPUUtilization AWS/ElastiCache > 80% 300s 2 OK
Trinity-Beast-ElastiCache-Memory-High DatabaseMemoryUsagePercentage AWS/ElastiCache > 85% 300s 2 OK/ALARM
Trinity-Beast-ElastiCache-Connections-High CurrConnections AWS/ElastiCache > 1000 300s 2 OK

S3 (1 Alarm)

S3 Bucket Size (OK)
Alarm Name Metric Namespace Threshold Period Eval Periods State
Trinity-Beast-S3-Size-Unusual-Growth BucketSizeBytes AWS/S3 > 10 GB 86400s 1 OK

6. SNS Notifications

All 14 CloudWatch alarms route to a single SNS topic that delivers alerts through two channels simultaneously.

Trinity-Beast-Critical-Alerts SNS Topic

Topic Name:      Trinity-Beast-Critical-Alerts
Subscriptions:   2
Alarms Attached: 14
Delivery:        Simultaneous

Protocol  Endpoint             Status
Email     Admin@CPMP-Site.org  Confirmed
SMS       +16156128200         Confirmed

Behavior: When any of the 14 alarms transitions to ALARM state, both email and SMS notifications fire simultaneously. There is no escalation chain — both channels receive every alert.
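
The fan-out above comes from two plain SNS subscriptions on the same topic; SNS delivers each alarm message to every confirmed subscription at once. A minimal sketch of the two Subscribe calls (the account id in the topic ARN is a placeholder; the endpoints come from this guide):

```python
# Sketch of the two Subscribe calls behind the table above. The account id in
# the topic ARN is a placeholder.
topic_arn = "arn:aws:sns:us-east-2:123456789012:Trinity-Beast-Critical-Alerts"
subscriptions = [
    {"TopicArn": topic_arn, "Protocol": "email", "Endpoint": "Admin@CPMP-Site.org"},
    {"TopicArn": topic_arn, "Protocol": "sms", "Endpoint": "+16156128200"},
]
# for sub in subscriptions:
#     sns.subscribe(**sub)  # the email endpoint requires a one-time confirmation
```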

7. CloudWatch Log Groups

10 log groups capture output from every service layer. All groups are configured with a 30-day retention policy.

Log Group Retention Source
/aws/ecs/trinity-beast 30 days All 3 ECS services (LPO, Mirror, LRS)
/aws/ecs/trinity-beast-sync 30 days Nightly sync job
/ecs/trinity-beast-lpo 30 days Legacy LPO logs
/ecs/trinity-beast-main-task-container-def 30 days Legacy main task logs
/aws/lambda/trinity-beast-receipt 30 days Receipt Lambda
/aws/vpc/trinity-beast-flowlogs 30 days VPC Flow Logs
/aws/cloudtrail/trinity-beast 30 days CloudTrail audit logs
/aws/codebuild/trinity-beast-build 30 days CodeBuild logs
/aws/ecs/containerinsights/trinity-beast-fargate-cluster/performance 30 days Container Insights
RDSOSMetrics 30 days Aurora OS metrics
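
A uniform retention policy like this tends to drift as new groups appear. One way to enforce it, sketched with the group names from the table above:

```python
# Sketch: apply the 30-day retention policy to every log group in the table.
log_groups = [
    "/aws/ecs/trinity-beast",
    "/aws/ecs/trinity-beast-sync",
    "/ecs/trinity-beast-lpo",
    "/ecs/trinity-beast-main-task-container-def",
    "/aws/lambda/trinity-beast-receipt",
    "/aws/vpc/trinity-beast-flowlogs",
    "/aws/cloudtrail/trinity-beast",
    "/aws/codebuild/trinity-beast-build",
    "/aws/ecs/containerinsights/trinity-beast-fargate-cluster/performance",
    "RDSOSMetrics",
]
# for name in log_groups:
#     logs.put_retention_policy(logGroupName=name, retentionInDays=30)
```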

8. Custom Metrics (TrinityBeast Namespace)

The application publishes custom metrics to two CloudWatch namespaces, providing business-level observability beyond standard AWS metrics.

TrinityBeast/LPO Custom Namespace

Metrics published by the Live Price Oracle service:

Metric           Description
Requests         Total LPO requests received
CacheHits        Requests served from ElastiCache
CacheMisses      Requests requiring an upstream source fetch
Errors           Failed requests (all error types)
SourceFailovers  Times a primary source failed and the secondary was used
AvgLatency       Average response time in milliseconds

TrinityBeast/LRS Custom Namespace

Metrics published by the Live Report Service:

Metric                Description
Requests              Total LRS report requests
AvgLatency            Average report generation time in milliseconds
Errors                Failed report generations
MonthlyLimitExceeded  Requests rejected due to monthly quota
DailyLimitExceeded    Requests rejected due to daily quota
AddOnRequests         Requests using add-on quota beyond the base plan
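
Metrics in both namespaces reach CloudWatch through the PutMetricData API. A minimal sketch of how the LPO service could publish one batch; the metric names match the table above, the values are illustrative.

```python
import datetime

# Illustrative batch of TrinityBeast/LPO data points; the values are made up.
now = datetime.datetime.now(datetime.timezone.utc)
metric_data = [
    {"MetricName": "Requests",   "Value": 1,    "Unit": "Count",        "Timestamp": now},
    {"MetricName": "CacheHits",  "Value": 1,    "Unit": "Count",        "Timestamp": now},
    {"MetricName": "AvgLatency", "Value": 12.5, "Unit": "Milliseconds", "Timestamp": now},
]
# cloudwatch.put_metric_data(Namespace="TrinityBeast/LPO", MetricData=metric_data)
```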

9. Alarm Response Playbook

When an alarm fires, use the following runbooks to diagnose and resolve the issue. Each category includes the most common root causes and recommended actions.

ALB/NLB Unhealthy Targets (Critical)

Alarms: Trinity-Beast-ALB-UnhealthyTargets, Trinity-Beast-NLB-UnhealthyTargets

What it means: One or more ECS tasks are failing health checks from the load balancer.

  1. Check ECS service health in the console — are tasks running or in a crash loop?
  2. Review container logs in /aws/ecs/trinity-beast for startup errors or OOM kills
  3. Verify target group health check path and expected response code
  4. Check if a recent deployment introduced a breaking change
  5. If tasks are running but unhealthy, check application health endpoint directly
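
The log review in step 2 can be expressed as a Logs Insights query against the shared ECS log group. A sketch; the error patterns are assumptions about what the services emit.

```python
import time

# Sketch of a Logs Insights query scanning the last hour of ECS logs for
# startup errors and OOM kills. The match patterns are assumptions.
query = """
fields @timestamp, @logStream, @message
| filter @message like /(?i)(error|panic|fatal|OOM|out of memory)/
| sort @timestamp desc
| limit 50
""".strip()

now = int(time.time())
params = {
    "logGroupName": "/aws/ecs/trinity-beast",
    "startTime": now - 3600,    # last hour
    "endTime": now,
    "queryString": query,
}
# query_id = logs.start_query(**params)["queryId"]
# results  = logs.get_query_results(queryId=query_id)["results"]
```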

ECS CPU High (Warning)

Alarms: Trinity-Beast-ECS-CPU-High, Trinity-Beast-ECS-CPU-High-Mirror, Trinity-Beast-ECS-CPU-High-LRS

What it means: An ECS service is consuming more than 80% CPU over a sustained period.

  1. Check for a traffic spike — correlate with LPO/LRS request metrics on the Application Dashboard
  2. Consider scaling the service — increase desired task count or adjust auto-scaling thresholds
  3. Check for runaway goroutines or infinite loops in recent deployments
  4. Review Container Insights for per-task CPU breakdown
  5. If sustained, evaluate whether the task CPU allocation (vCPU) needs to be increased
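
Step 2's quick mitigation is a desired-count bump on the hot service. A sketch, where the cluster and service names are assumptions based on this guide:

```python
# Sketch: raise the desired task count for the overloaded service. The cluster
# and service names are assumptions; the count is illustrative.
scale_params = {
    "cluster": "trinity-beast-fargate-cluster",
    "service": "main-service",
    "desiredCount": 2,  # size this to the observed traffic, not a fixed number
}
# ecs.update_service(**scale_params)
```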

Service Count Low (Critical)

Alarms: Trinity-Beast-Main-Service-Count-Low, Trinity-Beast-Mirror-Service-Count-Low, Trinity-Beast-LRS-Service-Count-Low

What it means: A container has crashed and no tasks are running for the service. These alarms use TreatMissing: breaching, so missing data also triggers the alarm.

  1. Check ECS service events for task stopped reasons (OOM, exit code, health check failure)
  2. Review container logs for the last running task — look for panic, fatal, or OOM messages
  3. Check if the ECR image exists and is pullable (image pull failures)
  4. Verify the task execution role has required permissions
  5. Manually start a new task if the service is not recovering automatically
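
Steps 1 and 5 can be sketched against the ECS API: read the recent service events for stop reasons, then force a fresh deployment if the service is not self-healing. The cluster and service names are assumptions.

```python
# Sketch: pull recent service events (step 1), then force a new deployment
# (step 5). Names are assumptions based on this guide.
describe_params = {
    "cluster": "trinity-beast-fargate-cluster",
    "services": ["main-service"],
}
# events = ecs.describe_services(**describe_params)["services"][0]["events"]
# for event in events[:10]:
#     print(event["createdAt"], event["message"])  # look for OOM / exit codes
#
# ecs.update_service(cluster="trinity-beast-fargate-cluster",
#                    service="main-service",
#                    forceNewDeployment=True)
```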

Aurora CPU High (Warning)

Alarm: Trinity-Beast-Aurora-CPU-High

What it means: The Aurora Serverless v2 cluster is consuming more than 80% CPU.

  1. Check for slow queries — use Performance Insights or pg_stat_statements
  2. Verify ACU scaling — is the cluster at max ACU and still under pressure?
  3. Check if the nightly sync job is running and creating batch write pressure
  4. Look for missing indexes on frequently queried columns
  5. Consider increasing the max ACU limit if load is legitimate

Aurora Connections High (Warning)

Alarm: Trinity-Beast-Aurora-Connections-High

What it means: More than 80 active database connections — approaching the connection limit.

  1. Check connection pool settings in the application — are pools sized correctly?
  2. Look for connection leaks — connections opened but never returned to the pool
  3. Verify that the sync job and Lambda are not opening excessive connections
  4. Consider using RDS Proxy if connection pressure is persistent
  5. Check if a recent deployment changed pool configuration

ElastiCache CPU / Memory / Connections (Warning)

Alarms: Trinity-Beast-ElastiCache-CPU-High, Trinity-Beast-ElastiCache-Memory-High, Trinity-Beast-ElastiCache-Connections-High

What it means: The ElastiCache cluster is under resource pressure — CPU, memory, or connection count is elevated.

  1. Check for a cache stampede — many cache misses causing simultaneous upstream fetches
  2. Review key eviction metrics — if memory is full, keys are being evicted prematurely
  3. Check connection pool settings in the LPO service — are connections being reused properly?
  4. Look for large keys or hot keys that may be causing uneven load
  5. If memory is consistently high, consider scaling to a larger node type or adding shards
  6. Review TTL settings — are cached items living too long and consuming memory?

S3 Unusual Size Growth (Low Priority)

Alarm: Trinity-Beast-S3-Size-Unusual-Growth

What it means: The S3 bucket has exceeded 10 GB, which may indicate unexpected data accumulation.

  1. Check for unexpected uploads — review S3 access logs or CloudTrail for PutObject events
  2. Look for log file accumulation — are old log exports or reports piling up?
  3. Verify lifecycle policies are in place to expire or transition old objects
  4. Check if the LRS report output is being stored without cleanup
  5. Review bucket versioning — old versions may be consuming space
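
The lifecycle check in step 3 can be closed out with a rule like the following sketch; the rule id, prefix, bucket name, and day counts are placeholder assumptions.

```python
# Sketch of a lifecycle configuration that expires aged objects and prunes
# noncurrent versions. All names and day counts are placeholders.
lifecycle = {
    "Rules": [
        {
            "ID": "expire-old-reports",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},           # whole bucket
            "Expiration": {"Days": 90},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}
# s3.put_bucket_lifecycle_configuration(
#     Bucket="trinity-beast-bucket",  # placeholder bucket name
#     LifecycleConfiguration=lifecycle)
```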