The Trinity Beast Infrastructure — Unified Logs & Observability

Centralized Logging Architecture & Operational Intelligence
June 2026 · Region: us-east-2 · 1 Unified Bucket · 30 Log Groups · 90-Day Retention · < $0.50/mo

1. The Problem — Why This Was Needed

Before June 4, 2026, the Trinity Beast Infrastructure had a logging blind spot. Application code logged to CloudWatch — that part worked. But the surrounding infrastructure was largely invisible:

❌ Before — Scattered & Dark
  • ALB access logs: disabled — no record of individual HTTP requests to the API
  • CloudFront access logs: disabled — no record of website visitor requests
  • WAF request logs: disabled — only sampled CloudWatch metrics (block counts per rule, no request detail)
  • S3 access logs: disabled — no audit of who accessed deployed files
  • Aurora server logs: not exported — slow queries only visible via RDS console download
  • Valkey slow/engine logs: not exported — no visibility into cache performance issues
  • CloudWatch retention: infinite on 18 log groups — unbounded cost growth
  • No single place to look — context switching between console, S3, CloudWatch, API calls
✅ After — Unified & Complete
  • ALB logs: every API request → S3 (latency, status, target, client IP)
  • CloudFront logs: every website hit → S3 (edge location, cache status, bytes)
  • WAF logs: every API request → S3 (full headers, rule match, action, country)
  • S3 access logs: every object operation → S3 (requester, operation, key)
  • Aurora logs: slow queries + errors → CloudWatch (instantly searchable)
  • Valkey logs: slow commands + engine events → CloudWatch
  • All 30 log groups: explicit 90-day retention (controlled cost)
  • One S3 bucket, one lifecycle policy, one search path
The core insight: You cannot learn from a system you cannot observe. Every HTTP request, every database query, every cache miss, every WAF decision — these are the signals that reveal performance bottlenecks, security threats, usage patterns, and optimization opportunities. Without centralized logging, these signals evaporate the moment they occur. With it, they become searchable institutional memory.

What We Couldn't Do Before

What We Can Do Now

2. Architecture — The Unified Logging Model

The logging architecture splits into three tiers based on query pattern and retention needs:

Diagram 2.1 — Unified Logging Architecture — Three-Tier Model
flowchart TB
    %% Sources
    subgraph sources["Log Sources"]
        direction LR
        ALB["ALB<br/>Trinity-Beast-TCP-ALB"]
        CF["CloudFront<br/>E110PRKEIYQVLL"]
        WAF["WAF<br/>trinity-beast-api-waf"]
        S3SRC["S3 Website<br/>trinity-beast-website-east2"]
        ECS["ECS Containers<br/>Main · Mirror · LRS · Webhook"]
        LAMBDA["Lambda Functions<br/>14 functions"]
        AURORA["Aurora PostgreSQL<br/>Writer + Reader"]
        VALKEY["Valkey 7.2<br/>cache.r7g.2xlarge"]
        VPC["VPC Flow Logs<br/>2 VPCs"]
        CT["CloudTrail<br/>Multi-region"]
    end

    %% Tier 1 - S3
    subgraph tier1["TIER 1 — S3: Long-Term Archive + Bulk Query"]
        direction TB
        BUCKET["aws-waf-logs-trinity-beast"]
        WAF_PREFIX["AWSLogs/.../WAFLogs/<br/>JSON · gzipped · every 5 min"]
        ALB_PREFIX["alb/<br/>space-delimited · gzipped · every 5 min"]
        CF_PREFIX["cloudfront/<br/>tab-delimited · gzipped · every few min"]
        S3_PREFIX["s3-access/<br/>space-delimited · best effort"]
        LIFECYCLE["Lifecycle: Standard 0-90d → Glacier IR 90-365d → Delete"]
    end

    %% Tier 2 - CloudWatch
    subgraph tier2["TIER 2 — CloudWatch: Real-Time + Instant Search"]
        direction TB
        CW_ECS["/aws/ecs/trinity-beast<br/>4 containers · 1 group · UME self-identifies"]
        CW_LAMBDA["/aws/lambda/*<br/>14 function log groups"]
        CW_AURORA["/aws/rds/cluster/.../postgresql<br/>Slow queries · Lock waits · Errors"]
        CW_VALKEY["/aws/elasticache/.../slow-log + engine-log<br/>Slow commands · Engine events"]
        CW_VPC["/aws/vpc/trinity-beast-flowlogs<br/>Both VPCs · Accept/Reject"]
        CW_CT["/aws/cloudtrail/trinity-beast<br/>30-day window · S3 has full archive"]
        CW_INSIGHTS["Container Insights + RDSOSMetrics"]
        RETENTION["All groups: 90-day retention"]
    end

    %% Tier 3 - Application State
    subgraph tier3["TIER 3 — Valkey + Aurora: Application-Level State"]
        direction TB
        VK_OPS["autoops:actions:log · threats:daily<br/>Autonomous actions · Threat summaries"]
        VK_HONEY["honeypot:log · honeypot:ip:*<br/>Trap hits · Per-IP forensics"]
        VK_TX["tx:job:* · tx:history<br/>Translation job progress"]
        VK_REPORT["report:text:* · docs:session:log<br/>Daily reports · Session history"]
        AU_EVENTS["translation_job_events<br/>usage_logs · support_tickets"]
        AU_CRON["cron.job_run_details<br/>pg_cron execution history"]
    end

    %% Connections - Tier 1
    WAF -->|"full request logs"| WAF_PREFIX
    ALB -->|"access logs"| ALB_PREFIX
    CF -->|"standard logs"| CF_PREFIX
    S3SRC -->|"server access"| S3_PREFIX
    WAF_PREFIX --> BUCKET
    ALB_PREFIX --> BUCKET
    CF_PREFIX --> BUCKET
    S3_PREFIX --> BUCKET
    BUCKET --> LIFECYCLE

    %% Connections - Tier 2
    ECS -->|"stdout/stderr"| CW_ECS
    LAMBDA -->|"execution logs"| CW_LAMBDA
    AURORA -->|"pg logs export"| CW_AURORA
    VALKEY -->|"slow + engine"| CW_VALKEY
    VPC -->|"flow logs"| CW_VPC
    CT -->|"API events"| CW_CT
    ECS -->|"metrics"| CW_INSIGHTS
    AURORA -->|"OS metrics"| CW_INSIGHTS

    %% Connections - Tier 3
    ECS -->|"app writes"| VK_OPS
    ECS -->|"honeypot hits"| VK_HONEY
    ECS -->|"job state"| VK_TX
    ECS -->|"reports"| VK_REPORT
    ECS -->|"audit rows"| AU_EVENTS
    AURORA -->|"cron results"| AU_CRON

    %% Also CloudTrail to S3
    CT -->|"archive"| BUCKET

    %% Styling
    classDef s3Style fill:#1a365d,stroke:#60a5fa,color:#e2e8f0
    classDef cwStyle fill:#14532d,stroke:#10b981,color:#e2e8f0
    classDef appStyle fill:#4c1d95,stroke:#a78bfa,color:#e2e8f0
    classDef sourceStyle fill:#1e293b,stroke:#FF9900,color:#e2e8f0
    classDef bucketStyle fill:#0f172a,stroke:#FF9900,color:#FF9900
    class WAF_PREFIX,ALB_PREFIX,CF_PREFIX,S3_PREFIX,LIFECYCLE s3Style
    class CW_ECS,CW_LAMBDA,CW_AURORA,CW_VALKEY,CW_VPC,CW_CT,CW_INSIGHTS,RETENTION cwStyle
    class VK_OPS,VK_HONEY,VK_TX,VK_REPORT,AU_EVENTS,AU_CRON appStyle
    class ALB,CF,WAF,S3SRC,ECS,LAMBDA,AURORA,VALKEY,VPC,CT sourceStyle
    class BUCKET bucketStyle

Request Lifecycle — What Gets Logged Where

A single API request generates log entries across multiple tiers. This diagram traces a request from edge to database, showing where each log event lands:

Diagram 2.2 — Request Lifecycle — Log Entry Points Across Tiers
sequenceDiagram
    participant Client as Client (Browser/SDK)
    participant CF as CloudFront
    participant WAF as WAF
    participant ALB as ALB
    participant ECS as ECS Container
    participant Aurora as Aurora
    participant Valkey as Valkey
    participant S3Log as S3 Logs Bucket

    Note over Client,S3Log: A single GET /price?asset=BTC request

    Client->>CF: HTTPS request
    Note right of CF: 📝 CloudFront log → S3<br/>cloudfront/ prefix<br/>(edge, IP, URI, cache hit/miss)
    CF->>WAF: Forward to origin
    Note right of WAF: 📝 WAF log → S3<br/>AWSLogs/.../WAFLogs/<br/>(IP, headers, rule match, ALLOW)
    WAF->>ALB: Passed rules
    Note right of ALB: 📝 ALB access log → S3<br/>alb/ prefix<br/>(latency, status, target IP)
    ALB->>ECS: Route to healthy target
    Note right of ECS: 📝 Container log → CloudWatch<br/>/aws/ecs/trinity-beast<br/>(UME: cluster_node, endpoint, api_key_id)
    ECS->>Valkey: Check price cache
    Note right of Valkey: (no log unless slow command)
    ECS->>Aurora: Query if cache miss
    Note right of Aurora: 📝 If >1s → PostgreSQL log<br/>/aws/rds/.../postgresql
    ECS-->>Client: UME Response (200)
    Note over Client,S3Log: Result: 4-6 log entries for ONE request across 3-4 destinations

Design Decisions

Why one S3 bucket for all log types?

Single lifecycle policy. Single IAM scope. Single place to point Athena. One less thing to remember. The bucket is named aws-waf-logs-trinity-beast because WAF enforces a naming convention (aws-waf-logs-* prefix required). Rather than fight it, we made the WAF-compliant name the home for everything.

Why WAF logs on S3 instead of CloudWatch?

Cost. CloudWatch Logs ingestion is $0.50/GB. S3 Standard is $0.023/GB/month for storage + $0 for delivery. At ~50 MB/day of WAF logs, S3 saves ~$0.70/month vs CloudWatch. More importantly, S3 logs are Athena-queryable — we can run SQL across millions of WAF events without building a pipeline.

Why container logs don't need separate groups

All four LPO containers (Main, Mirror, LRS, Webhook) run the same binary and log the same way. Every structured log line carries UME fields: cluster_node (which container), agent_profile_arn (which actor), endpoint (what path), ip_address (who called). There is zero ambiguity about origin. Keeping them in one log group means one search covers the entire cluster — no need to check each node separately when hunting for an error or tracing a request.

Why keep CloudWatch for application logs?

Instant search. When a Lambda errors or a container crashes, you need the answer in seconds, not minutes spent downloading from S3. CloudWatch's filter-log-events searches application logs server-side and typically returns matches within seconds. The tradeoff is worth it: application logs are low-volume and high-urgency; infrastructure logs are high-volume and low-urgency.

3. S3 Unified Logs Bucket

Bucket Name: aws-waf-logs-trinity-beast
Region: us-east-2
Lifecycle: 365d
Encryption: SSE-S3
ACL Mode: BucketOwnerPreferred
Log Sources: 4

3.1 Prefix Structure

Prefix | Source | Format | Delivery
AWSLogs/211998422884/WAFLogs/us-east-2/trinity-beast-api-waf/ | WAF (ALB) | JSON (gzipped), one event per line | Every 5 minutes
alb/AWSLogs/211998422884/elasticloadbalancing/us-east-2/ | ALB | Space-delimited (gzipped) | Every 5 minutes
cloudfront/ | CloudFront | Tab-delimited (gzipped), W3C extended format | Every few minutes
s3-access/ | S3 Server Access | Space-delimited (not compressed) | Best-effort (minutes)

3.2 What Each Log Contains

WAF Logs (richest security signal)

ALB Access Logs

CloudFront Standard Logs

S3 Server Access Logs

3.3 Bucket Policy

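The bucket policy grants PutObject through four statements covering three principals: the log delivery service (delivery.logs.amazonaws.com), the us-east-2 Elastic Load Balancing account, and the S3 logging service (logging.s3.amazonaws.com):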

Principal | Access | Path
delivery.logs.amazonaws.com | PutObject | AWSLogs/211998422884/* (WAF)
arn:aws:iam::033677994240:root | PutObject | alb/* (ALB, us-east-2 ELB account)
delivery.logs.amazonaws.com | PutObject | alb/*, cloudfront/*
logging.s3.amazonaws.com | PutObject | s3-access/*

3.4 Lifecycle Rules

Age | Storage Class | Cost/GB/Month | Access Speed
0–90 days | S3 Standard | $0.023 | Instant (milliseconds)
90–365 days | Glacier Instant Retrieval | $0.004 | Instant (milliseconds)
>365 days | Deleted | n/a | n/a
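The lifecycle table corresponds to a single S3 lifecycle rule. A sketch of that configuration, assuming the rule covers the whole bucket (empty prefix filter); the rule ID and function names are hypothetical, and the deployed rule may be written differently. GLACIER_IR is the storage-class value the S3 API uses for Glacier Instant Retrieval.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the unified bucket's lifecycle rule:
# S3 Standard for 0-90 days, Glacier Instant Retrieval for 90-365 days, then delete.

# Print the lifecycle configuration JSON (defaults: transition 90, expire 365).
lifecycle_json() {
  local transition_days=${1:-90} expire_days=${2:-365}
  cat <<EOF
{
  "Rules": [
    {
      "ID": "unified-logs-90d-glacier-ir-365d-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": ${transition_days}, "StorageClass": "GLACIER_IR" }
      ],
      "Expiration": { "Days": ${expire_days} }
    }
  ]
}
EOF
}

# Apply it to the bucket (calls AWS; not executed by this sketch).
apply_lifecycle() {
  aws s3api put-bucket-lifecycle-configuration \
    --bucket aws-waf-logs-trinity-beast --region us-east-2 \
    --lifecycle-configuration "$(lifecycle_json "$@")"
}
```

put-bucket-lifecycle-configuration replaces the bucket's entire lifecycle configuration, so compare against aws s3api get-bucket-lifecycle-configuration before applying.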

4. CloudWatch Log Groups (Application Logs)

All application-level logs live in CloudWatch for instant search. Every group has an explicit retention setting (90 days; 30 days for the CloudTrail group in 4.3), so nothing grows without bound.

4.1 ECS Container Logs

Already unified. All four LPO/LRS/Webhook containers share a single log group (/aws/ecs/trinity-beast). There is no need to separate them: every log line self-identifies via the Unified Messaging Envelope (UME). cluster_node tells you which container (BeastMain, BeastMirror, BeastLRS, BeastWebhook), agent_profile_arn tells you which actor produced it, and endpoint tells you what was being served. Stream prefixes only determine stream names (the awslogs driver names each stream prefix/container-name/task-id); for searching, you filter on UME fields, not stream names.
Service | Log Group | Stream Prefix | UME cluster_node
BeastMain | /aws/ecs/trinity-beast | main/ | BeastMain
BeastMirror | /aws/ecs/trinity-beast | mirror/ | BeastMirror
BeastLRS | /aws/ecs/trinity-beast | lrs/ | BeastLRS
BeastWebhook | /aws/ecs/trinity-beast | webhook/ | BeastWebhook
BeastTranslate | /ecs/tbi-translate-worker | service/tbi-translate-worker/ | n/a
BeastReconciler | /aws/ecs/trinity-beast-sync | sync/trinity-beast-sync-job/ | n/a

The translate worker and sync job have separate log groups because they are different binaries (Python and Go respectively) with different lifecycles — they are not LPO containers. But the four LPO/LRS nodes are the same binary running in parallel, differentiated only by SERVER_TYPE and CLUSTER_NODE environment variables. UME makes separation unnecessary — search across all four at once and filter by the fields that matter.

Diagram 4.1 — Container Log Unification via UME Self-Identification
flowchart LR
    subgraph containers["4 Containers — Same Binary, Same Log Group"]
        M["BeastMain<br/>APP_REPORT_SERVER"]
        R["BeastMirror<br/>APP_REPORT_SERVER"]
        L["BeastLRS<br/>APP_REPORT_SERVER"]
        W["BeastWebhook<br/>WEBHOOK_SERVER"]
    end
    subgraph loggroup["/aws/ecs/trinity-beast"]
        LOG["Unified Stream<br/>All 4 containers interleaved"]
    end
    subgraph ume["UME Fields — Self-Identification"]
        CN["cluster_node:<br/>BeastMain | BeastMirror | BeastLRS | BeastWebhook"]
        AP["agent_profile_arn:<br/>tbi | webhook-engine | rhema | ..."]
        EP["endpoint:<br/>/price | /reports | /admin/..."]
        IP["ip_address:<br/>client source IP"]
    end
    M --> LOG
    R --> LOG
    L --> LOG
    W --> LOG
    LOG --> CN
    LOG --> AP
    LOG --> EP
    LOG --> IP
    classDef containerStyle fill:#1e293b,stroke:#FF9900,color:#e2e8f0
    classDef logStyle fill:#14532d,stroke:#10b981,color:#e2e8f0
    classDef umeStyle fill:#1e1b4b,stroke:#a78bfa,color:#e2e8f0
    class M,R,L,W containerStyle
    class LOG logStyle
    class CN,AP,EP,IP umeStyle

# One search covers all 4 containers; filter on a UME cluster_node value to isolate one node
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
  --start-time $(date -v-1H +%s)000 --filter-pattern "BeastMain" \
  --region us-east-2 --query 'events[*].message' --output json

# Find errors across all 4 nodes simultaneously
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
  --start-time $(date -v-1H +%s)000 --filter-pattern "ERROR" \
  --region us-east-2 --query 'events[*].message' --output json

# Filter by agent actor (e.g., only webhook-engine responses).
# Terms containing a hyphen must be double-quoted inside the filter pattern.
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
  --start-time $(date -v-1H +%s)000 --filter-pattern '"webhook-engine"' \
  --region us-east-2 --query 'events[*].message' --output json
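The commands above use date -v-1H, which only BSD/macOS date accepts; GNU date rejects -v. A portable sketch with hypothetical helper names: compute the start time from date +%s arithmetic, double-quote filter terms that contain characters other than letters, digits, and underscores (CloudWatch requires this), and build a JSON filter pattern on a single UME field. The JSON form assumes each UME log line is a single JSON object, which this document does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for searching /aws/ecs/trinity-beast portably.

# Print epoch milliseconds for "now minus N seconds".
# Only uses `date +%s`, which both GNU and BSD date support.
epoch_ms_ago() {
  local seconds_ago=$1
  local now
  now=$(date +%s)
  echo $(( (now - seconds_ago) * 1000 ))
}

# Wrap a term in double quotes so CloudWatch matches it literally
# (needed for terms like webhook-engine that contain a hyphen).
cw_term() {
  printf '"%s"' "$1"
}

# JSON filter pattern on one field, e.g. { $.cluster_node = "BeastMain" }.
# ASSUMPTION: each UME log line is a single JSON object.
ume_filter() {
  printf '{ $.%s = "%s" }' "$1" "$2"
}

# Search the shared container log group (calls AWS; default window: 1 hour).
search_cluster() {
  local pattern=$1 seconds_ago=${2:-3600}
  aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
    --start-time "$(epoch_ms_ago "$seconds_ago")" --filter-pattern "$pattern" \
    --region us-east-2 --query 'events[*].message' --output json
}
```

For example, search_cluster "$(cw_term webhook-engine)" searches the last hour, and search_cluster "$(ume_filter cluster_node BeastMain)" 86400 searches the last day for one node.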

4.2 Lambda Function Logs

Function | Log Group | Purpose
trinity-beast-receipt | /aws/lambda/trinity-beast-receipt | Stripe receipt processing
trinity-beast-queued-writer | /aws/lambda/trinity-beast-queued-writer | SQS → Aurora batch inserts
tbi-ops-notify | /aws/lambda/tbi-ops-notify | Formatted SES notifications
tbi-ops-self-heal | /aws/lambda/tbi-ops-self-heal | ECS task restart automation
tbi-ops-waf-action | /aws/lambda/tbi-ops-waf-action | WAF rule management
tbi-ops-honeypot-processor | /aws/lambda/tbi-ops-honeypot-processor | Honeypot queue → WAF blocks
tbi-ops-bedrock-analyze | /aws/lambda/tbi-ops-bedrock-analyze | AI threat correlation
tbi-rhema-support | /aws/lambda/tbi-rhema-support | AI support assistant
tbi-ops-digest | /aws/lambda/tbi-ops-digest | Daily/weekly operational digest
tbi-translate-deploy | /aws/lambda/tbi-translate-deploy | CloudFront invalidation
tbi-translate-finalize | /aws/lambda/tbi-translate-finalize | Search rebuild + notification
tbi-translate-batch-prepare | /aws/lambda/tbi-translate-batch-prepare | Batch JSONL preparation
tbi-translate-batch-submit | /aws/lambda/tbi-translate-batch-submit | Bedrock batch job submission

4.3 Infrastructure Logs

Log Group | Content | Retention
/aws/cloudtrail/trinity-beast | All AWS API calls (multi-region) | 30 days (S3 archive is indefinite)
/aws/vpc/trinity-beast-flowlogs | Network traffic for both VPCs | 90 days
/aws/ecs/containerinsights/.../performance | ECS CPU, memory, network per task | 90 days
RDSOSMetrics | Aurora OS-level metrics (every 30s) | 90 days
/aws/rds/cluster/trinity-beast-aurora-cluster/postgresql | Slow queries (>1s), lock waits, errors | 90 days
/aws/elasticache/trinity-beast-cache/slow-log | Valkey commands exceeding threshold | 90 days
/aws/elasticache/trinity-beast-cache/engine-log | Valkey engine events (startup, failover) | 90 days

5. Valkey Operational Logs

Beyond CloudWatch export, the application writes structured operational data directly to Valkey as keys. These function as real-time logs for the AutoOps system.

Key Pattern | Type | What It Records | TTL
autoops:actions:log | Sorted Set | Every autonomous action (self-heals, WAF blocks, notifications) | Permanent
autoops:threats:daily | String (JSON) | Today's AI-generated threat summary | Overwritten daily
honeypot:log | Sorted Set | Chronological log of all honeypot trap hits | Permanent
honeypot:ip:<ip> | Hash | Per-IP: first_seen, last_seen, hit_count, paths | Permanent
tx:job:<id> | Hash | Translation job state (live progress per pair) | 7 days
report:text:YYYY-MM-DD | String | Plain-text daily report for newsletter | 30 days
docs:session:log | List | Session close entries (date, session, summary) | Permanent

6. Aurora Database Logs

6.1 PostgreSQL Server Logs (CloudWatch)

Enabled June 4, 2026. Aurora exports slow queries and errors to CloudWatch automatically. The instance parameter group controls what gets logged:

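Parameter | Value | Effect
log_min_duration_statement | 1000 | Log statements that run for 1 second (1000 ms) or longer
log_temp_files | 0 | Log every temporary file created (e.g. sorts and hashes spilling to disk)
log_lock_waits | 1 | Log when a session waits longer than deadlock_timeout to acquire a lock
track_io_timing | on | Include I/O timing in EXPLAIN (ANALYZE, BUFFERS) output
idle_in_transaction_session_timeout | 300000 | Terminate sessions left idle inside a transaction for more than 5 min (300000 ms)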
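With log_min_duration_statement = 1000, PostgreSQL writes each slow statement as a line containing "duration: <ms> ms  statement: <sql>" after the log_line_prefix. A sketch that ranks such lines by duration, read from stdin. The function name is hypothetical, and it assumes one statement per line, so multi-line SQL keeps only its first line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rank PostgreSQL slow-statement log lines by duration.
# Input (stdin): raw log lines containing "duration: <ms> ms".
# Output: "<ms> <text after the duration>", slowest first, at most N lines.
slowest_statements() {
  local top=${1:-10}
  awk '
    match($0, /duration: [0-9.]+ ms/) {
      # The match is "duration: " (10 chars) + number + " ms" (3 chars).
      d = substr($0, RSTART + 10, RLENGTH - 13)
      rest = substr($0, RSTART + RLENGTH)
      sub(/^ +/, "", rest)
      print d, rest
    }' | LC_ALL=C sort -rn -k1,1 | head -n "$top"
}
```

To feed it from CloudWatch, ask the CLI for one message per line, e.g. aws logs filter-log-events --log-group-name "/aws/rds/cluster/trinity-beast-aurora-cluster/postgresql" --filter-pattern duration --region us-east-2 --query 'events[].[message]' --output text | slowest_statements 10.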

6.2 pg_cron Execution History

Scheduled job execution history lives in Aurora itself — not CloudWatch. Query via:

SELECT r.jobid, j.jobname, r.start_time, r.end_time, r.status, r.return_message
FROM cron.job_run_details r JOIN cron.job j ON r.jobid = j.jobid
ORDER BY r.start_time DESC LIMIT 20;

6.3 Application Audit Tables

Table | Purpose | Retention
translation_job_events | Every state transition for every translation job | Permanent
usage_logs | Every API call (raw request log) | 93 days in Valkey, permanent in Aurora
support_tickets | Support ticket history | 1095 days (3 years)
cron.job_run_details | pg_cron execution results | ~7 days (pg_cron does not purge this table by itself; this relies on a scheduled cleanup job)

7. Retention & Lifecycle Policy

The rule: 90 days for CloudWatch (multiples-of-3 convention: 3 × 30), except 30 days for the CloudTrail group. 365 days for S3, with a Glacier transition at 90 days. No CloudWatch log group has infinite retention; the only deliberately unbounded stores are the CloudTrail S3 archive and the records marked Permanent in Sections 5 and 6.3.
Tier | Retention | Storage | Applies To
CloudWatch (Application) | 90 days | CloudWatch Logs | All Lambda function groups and ECS container groups
CloudWatch (Infrastructure) | 90 days | CloudWatch Logs | VPC Flow Logs, Container Insights, RDSOSMetrics, Aurora PG, Valkey
CloudWatch (CloudTrail) | 30 days | CloudWatch Logs | API audit (real-time search); S3 archive handles long-term
S3 (Standard) | 0–90 days | S3 Standard | WAF, ALB, CloudFront, S3 access logs
S3 (Glacier IR) | 90–365 days | Glacier Instant Retrieval | Same (automatic transition)
S3 (CloudTrail) | Indefinite | S3 Standard | CloudTrail raw archive (no lifecycle rule)
Valkey | 7–30 days for TTL-managed keys; log keys marked Permanent have no TTL | In-memory | Operational state
Aurora | Permanent | Aurora storage | Usage logs, tickets, job events
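Keeping "no infinite retention" true as new Lambda or ECS groups appear needs a periodic check, because new CloudWatch log groups default to never expiring. A sketch with hypothetical function names: list every group with its retention, select the unset ones, and give them 90 days with put-retention-policy. It relies on the AWS CLI's text output printing an unset retentionInDays as None; groups that already have a value (such as the 30-day CloudTrail group) are left alone.

```bash
#!/usr/bin/env bash
# Hypothetical retention audit for us-east-2 log groups.

# Read "<logGroupName><TAB><retentionInDays|None>" lines; print names with no retention.
groups_without_retention() {
  awk -F'\t' '$2 == "None" || $2 == "" { print $1 }'
}

# List all groups as name/retention pairs (calls AWS; the CLI paginates for us).
list_group_retention() {
  aws logs describe-log-groups --region us-east-2 \
    --query 'logGroups[].[logGroupName, retentionInDays]' --output text
}

# Set 90-day retention on every group that has none (calls AWS).
enforce_default_retention() {
  list_group_retention | groups_without_retention | while IFS= read -r group; do
    echo "setting 90-day retention on $group"
    # </dev/null keeps the aws call from reading the loop's stdin.
    aws logs put-retention-policy --region us-east-2 \
      --log-group-name "$group" --retention-in-days 90 </dev/null
  done
}
```

Running list_group_retention | groups_without_retention on its own is a dry run that only prints the groups that would change.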

8. Querying & Searching Logs

8.1 CloudWatch — Instant Search

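# Errors in one Lambda's log group (last hour); substitute any /aws/lambda/<function-name>.
# date -v-1H is BSD/macOS syntax; with GNU date use $(date -d '1 hour ago' +%s)000 instead.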
aws logs filter-log-events --log-group-name "/aws/lambda/tbi-ops-notify" \
  --start-time $(date -v-1H +%s)000 --filter-pattern "ERROR" \
  --region us-east-2 --query 'events[*].message' --output json

# Slow Aurora queries (last 24h)
aws logs filter-log-events \
  --log-group-name "/aws/rds/cluster/trinity-beast-aurora-cluster/postgresql" \
  --start-time $(date -v-24H +%s)000 --filter-pattern "duration" \
  --region us-east-2 --query 'events[*].message' --output json

# ECS container output for sync job (latest run)
STREAM=$(aws logs describe-log-streams --log-group-name "/aws/ecs/trinity-beast-sync" \
  --order-by LastEventTime --descending --limit 1 \
  --region us-east-2 --query 'logStreams[0].logStreamName' --output text)
aws logs get-log-events --log-group-name "/aws/ecs/trinity-beast-sync" \
  --log-stream-name "$STREAM" --limit 30 \
  --region us-east-2 --query 'events[*].message' --output json

8.2 S3 — Download & Analyze

# WAF: Get recent blocked requests.
# WAF object keys include a minute folder (.../YYYY/MM/dd/HH/mm/), so list the
# current UTC hour recursively and take the most recently written object.
prefix="AWSLogs/211998422884/WAFLogs/us-east-2/trinity-beast-api-waf/$(date -u +%Y/%m/%d/%H)/"
latest=$(aws s3 ls "s3://aws-waf-logs-trinity-beast/$prefix" --recursive \
  --region us-east-2 | sort | tail -1 | awk '{print $4}')
aws s3 cp "s3://aws-waf-logs-trinity-beast/$latest" - --region us-east-2 \
  | gunzip | jq -c 'select(.action=="BLOCK") | {ip: .httpRequest.clientIp, uri: .httpRequest.uri, rule: .terminatingRuleId}'

# ALB: Check latency for a specific endpoint (ALB log paths use UTC dates).
# The quoted request spans fields 13-15, so $7 = target_processing_time and $14 = URL.
aws s3 cp "s3://aws-waf-logs-trinity-beast/alb/AWSLogs/211998422884/\
elasticloadbalancing/us-east-2/$(date -u +%Y/%m/%d)/<file>.log.gz" - \
  --region us-east-2 | gunzip | awk '$14 ~ /\/price/ {print $7, $14}'

# CloudFront: Top requested URIs ($8 = cs-uri-stem; skip the #Version/#Fields header lines)
aws s3 cp "s3://aws-waf-logs-trinity-beast/cloudfront/<file>.gz" - \
  --region us-east-2 | gunzip | awk -F'\t' '!/^#/ {print $8}' | sort | uniq -c | sort -rn | head -20
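ALB access log lines are space-delimited, and the quoted request ("GET <url> HTTP/1.1") splits into three whitespace fields, so field 7 is target_processing_time and field 14 is the URL. A sketch with a hypothetical function name that summarizes request count, mean, and maximum target time per path from decompressed lines on stdin. It drops -1 values (ALB records -1 when no target response was received) and strips the query string so /price?asset=BTC and /price?asset=ETH group together.

```bash
#!/usr/bin/env bash
# Hypothetical helper: per-path latency summary from ALB access log lines (stdin).
# Whitespace fields: $7 = target_processing_time (seconds), $14 = request URL.
# Output per line: <path> <requests> <mean seconds> <max seconds>, sorted by path.
alb_latency_by_path() {
  awk '
    $7 >= 0 {
      url = $14
      sub(/^[a-z0-9]+:\/\/[^\/]+/, "", url)   # drop scheme://host:port
      sub(/\?.*/, "", url)                     # drop query string
      n[url]++
      sum[url] += $7
      if ($7 > max[url]) max[url] = $7
    }
    END {
      for (p in n) printf "%s %d %.3f %.3f\n", p, n[p], sum[p] / n[p], max[p]
    }' | LC_ALL=C sort
}
```

Pipe a decompressed ALB file into it, in the same way as the ALB command above: ... | gunzip | alb_latency_by_path.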

8.3 Valkey — Operational State

# Recent autonomous actions
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
  -d '{"command":"ZREVRANGEBYSCORE autoops:actions:log +inf -inf LIMIT 0 10"}' \
  "$LPO_BASE/admin/valkey"

# Today's threat assessment
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
  -d '{"command":"GET autoops:threats:daily"}' "$LPO_BASE/admin/valkey"

# Honeypot activity (last 10 hits)
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
  -d '{"command":"ZREVRANGEBYSCORE honeypot:log +inf -inf LIMIT 0 10"}' \
  "$LPO_BASE/admin/valkey"

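The commands above page by position (LIMIT 0 10). To ask what AutoOps did in the last 24 hours instead, the score range can be bounded by time. A sketch, assuming the sorted-set scores are Unix epoch seconds (this document does not state what the scores are); the function names are hypothetical and the request reuses the admin endpoint shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for time-windowed reads of Valkey sorted-set logs.
# ASSUMPTION: scores are Unix epoch seconds recorded when each entry was written.

# Build "ZREVRANGEBYSCORE <key> <now> <now - window> LIMIT 0 <count>".
# Optional 4th argument fixes "now" (useful for testing).
window_command() {
  local key=$1 window_secs=$2 count=${3:-50} now=${4:-$(date +%s)}
  echo "ZREVRANGEBYSCORE $key $now $(( now - window_secs )) LIMIT 0 $count"
}

# Send it through the admin endpoint (calls the live API).
# Keys and numbers contain no quotes, so no JSON escaping is done here.
valkey_window() {
  local cmd
  cmd=$(window_command "$@")
  curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
    -d "{\"command\":\"$cmd\"}" "$LPO_BASE/admin/valkey"
}
```

For example, valkey_window autoops:actions:log 86400 would return up to 50 entries from the last day, newest first, if the scores are indeed epoch seconds.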
9. Quick Reference — Where Do I Look?

I Want To See... | Go To
What an ECS container printed | CloudWatch: /aws/ecs/trinity-beast (filter on the UME cluster_node field)
Why a Lambda failed | CloudWatch: /aws/lambda/<function-name> (filter for ERROR)
Every HTTP request to the API | S3: aws-waf-logs-trinity-beast/AWSLogs/.../WAFLogs/
Which WAF rule blocked what | S3: WAF logs → filter for action: "BLOCK"
Per-request latency to the API | S3: aws-waf-logs-trinity-beast/alb/
Every website visitor request | S3: aws-waf-logs-trinity-beast/cloudfront/
Who accessed S3 objects | S3: aws-waf-logs-trinity-beast/s3-access/
Network traffic patterns | CloudWatch: /aws/vpc/trinity-beast-flowlogs
Slow database queries | CloudWatch: /aws/rds/cluster/.../postgresql
Slow Valkey commands | CloudWatch: /aws/elasticache/.../slow-log
Who called what AWS API | CloudWatch: /aws/cloudtrail/trinity-beast
Historical AWS API calls | S3: aws-cloudtrail-logs-211998422884-879cb71c
What AutoOps did | Valkey: autoops:actions:log
Honeypot hits | Valkey: honeypot:log
Translation job trace | Aurora: translation_job_events table
pg_cron job results | Aurora: cron.job_run_details table
Session history & decisions | S3: daily-reports/tbi-ops-* or ~/daily-reports/

10. Cost Analysis

Total Monthly Cost: < $0.50
Daily Volume: ~78 MB
Annual Storage: ~28 GB
S3 Delivery Cost: $0
Source | Destination | Daily Volume | Monthly Cost
WAF full request logs | S3 | ~50 MB | ~$0.04
ALB access logs | S3 | ~5 MB | < $0.01
CloudFront logs | S3 | ~15 MB | ~$0.01
S3 access logs | S3 | ~2 MB | < $0.01
Aurora PostgreSQL logs | CloudWatch | ~5 MB | ~$0.08
Valkey slow/engine logs | CloudWatch | ~1 MB | ~$0.02
Total | | ~78 MB | < $0.50/month
Why so cheap? S3 log delivery from AWS services (WAF, ALB, CloudFront, S3) has zero delivery cost; you only pay for storage. At ~72 MB/day to S3, a full year accumulates ~26 GB. In steady state the most recent 90 days (~6.5 GB) sit in S3 Standard at $0.023/GB (~$0.15/month) and the older ~20 GB sit in Glacier IR at $0.004/GB (~$0.08/month). CloudWatch ingestion ($0.50/GB) is the only real cost driver, and we only send low-volume logs there (Aurora + Valkey = ~6 MB/day = ~$0.10/month).
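The steady-state arithmetic can be reproduced with a small model. A sketch using the prices quoted in this document ($0.023 and $0.004 per GB-month, $0.50 per GB ingested) and decimal units (1 GB = 1000 MB); it ignores request, lifecycle-transition, and CloudWatch storage charges, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical steady-state monthly cost model for the unified logging setup.
# Args: MB/day delivered to S3, MB/day ingested into CloudWatch.
monthly_cost() {
  awk -v s3_mb="$1" -v cw_mb="$2" 'BEGIN {
    std_gb = s3_mb * 90 / 1000          # the latest 90 days sit in S3 Standard
    gir_gb = s3_mb * (365 - 90) / 1000  # days 90-365 sit in Glacier IR
    cw_gb  = cw_mb * 30 / 1000          # one month of CloudWatch ingestion
    std = std_gb * 0.023; gir = gir_gb * 0.004; cw = cw_gb * 0.50
    printf "standard=%.2f glacier_ir=%.2f cloudwatch=%.2f total=%.2f\n",
      std, gir, cw, std + gir + cw
  }'
}
```

With this section's figures, monthly_cost 72 6 prints standard=0.15 glacier_ir=0.08 cloudwatch=0.09 total=0.32, under the < $0.50/month headline.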

11. Athena — SQL Analytics Across All Logs

Amazon Athena is a serverless SQL engine that queries data directly in S3 — no database to provision, no data to move, no ETL. Point it at the log files, define a schema, and run standard SQL. Deployed June 4, 2026.

Workgroup: trinity-beast-analytics
Database: tbi_logs
Tables: 5
Saved Queries: 9
Cost per Query: ~$0.01 for a full-month scan; single-day queries cost far less
Query Results: s3://.../athena-results/

11.1 Tables

Table | Source | Partitioned By | Key Fields
waf_logs | WAF full request logs | year / month / day / hour | action, httprequest.clientip, httprequest.uri, httprequest.country, terminatingruleid, labels
alb_logs | ALB access logs | year / month / day | request_url, elb_status_code, target_processing_time, client_ip, user_agent
cloudfront_logs | CloudFront standard logs | date column | cs_uri_stem, c_ip, sc_status, x_edge_location, x_edge_result_type, time_taken
s3_access_logs | S3 server access logs | (none) | operation, key, remote_ip, http_status, requester
cloudtrail_logs | CloudTrail API events | account / region / year / month / day | eventname, eventsource, sourceipaddress, useridentity, errorcode
Partition projection — all date-partitioned tables use Athena partition projection. This means new data is queryable the instant it arrives in S3. No manual MSCK REPAIR TABLE, no Glue crawlers, no partition maintenance. The schema tells Athena where to look based on the date path pattern.

11.2 Saved Queries (Ready to Run)

Nine pre-built queries are saved in the trinity-beast-analytics workgroup. Run them from the Athena console (the Saved queries tab), or list and fetch them from the CLI with aws athena list-named-queries and aws athena get-named-query.

Query Name | What It Answers
WAF: Blocks by Country (24h) | Top countries generating blocked requests today, grouped by WAF rule
WAF: Top Blocked IPs (24h) | IPs with the most blocks: identify repeat attackers and which rules caught them
WAF: Request Volume by Endpoint (24h) | Traffic distribution across API endpoints: which paths get the most hits
WAF: Trace IP (forensics) | Every request from a specific IP: time, action, URI, method, country. Replace THE_IP with the target
CloudTrail: API Calls by Service (24h) | Top AWS API calls grouped by service: spot unusual activity patterns
CloudTrail: Error Events (24h) | All API calls that returned errors today: permission denials, throttles, service errors
CloudFront: Top Requested Pages (24h) | Most popular website pages by request count, bytes served, and average load time
CloudFront: Traffic by Country (24h) | Website visitors by country using viewer IP geolocation: audience geography and traffic patterns
Cross-Source: IP Correlation | Find IPs that hit BOTH the API (WAF) and website (CloudFront): reconnaissance detection via JOIN

11.3 Example Queries

Security — Who got blocked and why?

-- Top 30 blocked IPs with the rules that caught them
SELECT httprequest.clientip AS ip,
       httprequest.country AS country,
       count(*) AS blocks,
       array_agg(DISTINCT terminatingruleid) AS rules
FROM tbi_logs.waf_logs
WHERE action = 'BLOCK'
  AND year = '2026' AND month = '06' AND day = '04'
GROUP BY httprequest.clientip, httprequest.country
ORDER BY blocks DESC
LIMIT 30;

Performance — P95 latency by endpoint

-- API endpoint latency percentiles from ALB logs
-- (target_processing_time is -1 when no target response was received; exclude those)
SELECT url_extract_path(request_url) AS endpoint,
       count(*) AS requests,
       approx_percentile(target_processing_time, 0.5) AS p50,
       approx_percentile(target_processing_time, 0.95) AS p95,
       approx_percentile(target_processing_time, 0.99) AS p99
FROM tbi_logs.alb_logs
WHERE year = '2026' AND month = '06' AND day = '04'
  AND request_url LIKE '%/price%'
  AND target_processing_time >= 0
GROUP BY url_extract_path(request_url)
ORDER BY requests DESC;

Cross-Source — Reconnaissance detection

-- IPs that hit both the API AND the website (probing behavior)
SELECT w.ip, w.waf_requests, w.blocks, cf.cf_requests
FROM (
  SELECT httprequest.clientip AS ip,
         count(*) AS waf_requests,
         sum(CASE WHEN action='BLOCK' THEN 1 ELSE 0 END) AS blocks
  FROM tbi_logs.waf_logs
  WHERE year = '2026' AND month = '06' AND day = '04'
  GROUP BY httprequest.clientip
) w
JOIN (
  SELECT c_ip AS ip, count(*) AS cf_requests
  FROM tbi_logs.cloudfront_logs
  WHERE date = DATE '2026-06-04'
  GROUP BY c_ip
) cf ON w.ip = cf.ip
ORDER BY w.blocks DESC, w.waf_requests DESC
LIMIT 20;

Forensics — Full trace of a suspicious IP

-- Everything a specific IP did across all API endpoints this month (add AND day = '04' to narrow the scan)
SELECT from_unixtime(timestamp/1000) AS time,
       action,
       httprequest.uri,
       httprequest.httpmethod,
       terminatingruleid,
       httprequest.country
FROM tbi_logs.waf_logs
WHERE httprequest.clientip = '45.148.10.51'
  AND year = '2026' AND month = '06'
ORDER BY timestamp DESC
LIMIT 100;

Infrastructure — What AWS APIs are being called?

-- Top AWS API calls in our account today (spot automation vs manual)
SELECT eventsource, eventname, sourceipaddress, count(*) AS calls
FROM tbi_logs.cloudtrail_logs
WHERE account = '211998422884' AND region = 'us-east-2'
  AND year = '2026' AND month = '06' AND day = '04'
GROUP BY eventsource, eventname, sourceipaddress
ORDER BY calls DESC
LIMIT 30;

11.4 How to Use

From the AWS Console

Open Athena → select workgroup trinity-beast-analytics → database tbi_logs → click "Saved queries" for the 9 pre-built queries, or write your own SQL in the editor. Results appear in seconds.

From the CLI

Submit a query and fetch results:

# Run a query
QID=$(aws athena start-query-execution \
  --query-string "SELECT action, count(*) FROM tbi_logs.waf_logs WHERE year='2026' AND month='06' AND day='04' GROUP BY action" \
  --work-group trinity-beast-analytics \
  --region us-east-2 --query 'QueryExecutionId' --output text)

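# Wait for completion (typically 1-3 seconds): poll until the state leaves QUEUED/RUNNING
while :; do
  STATE=$(aws athena get-query-execution --query-execution-id "$QID" \
    --region us-east-2 --query 'QueryExecution.Status.State' --output text)
  case "$STATE" in QUEUED|RUNNING) sleep 1 ;; *) break ;; esac
done
echo "$STATE"   # SUCCEEDED, FAILED, or CANCELLED

# Get results (the first row holds the column names)
aws athena get-query-results --query-execution-id "$QID" \
  --region us-east-2 --output json | jq -r '.ResultSet.Rows[] | .Data | map(.VarCharValue // "") | join(" | ")'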

11.5 Cost & Limits

Dimension | Value
Price per query | $5 per TB scanned (minimum 10 MB charge per query)
Current daily log volume | ~78 MB/day → ~2.4 GB/month
Cost of scanning full month | ~$0.012
Query cost limit (workgroup) | 1 GB per query (safety guardrail)
Partition benefit | Querying one day scans only that day's folder, not the entire bucket
Compression benefit | Gzipped files mean fewer bytes scanned and a lower cost per query
Cost optimization tip: Always include partition filters (year, month, day) in your WHERE clause. Without them, Athena scans ALL data in the table. With them, it only reads the specific folders you need — reducing cost and execution time by orders of magnitude.
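Partition predicates are easy to get subtly wrong by hand. A sketch of a helper that turns one UTC date into the year/month/day predicate used by waf_logs, alb_logs, and cloudtrail_logs, assuming those partition values are zero-padded strings as in the example queries; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build Athena partition predicates for one UTC day.
# ASSUMPTION: year/month/day partitions are zero-padded strings ('2026', '06', '04').
partition_filter() {
  local day=$1   # expected format: YYYY-MM-DD
  case $day in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "usage: partition_filter YYYY-MM-DD" >&2; return 1 ;;
  esac
  local y=${day%%-*} rest=${day#*-}
  local m=${rest%%-*} d=${rest#*-}
  printf "year = '%s' AND month = '%s' AND day = '%s'" "$y" "$m" "$d"
}
```

For example, QUERY="SELECT action, count(*) FROM tbi_logs.waf_logs WHERE $(partition_filter 2026-06-04) GROUP BY action" can be passed to aws athena start-query-execution --query-string "$QUERY". Unpadded input such as 2026-6-4 is rejected rather than silently matching no partition.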

11.6 Pattern Recognition — What Athena Reveals

The real power isn't in individual queries — it's in combining signals across log sources to surface patterns invisible to any single source. These are the questions Athena answers that nothing else in the stack can:

Pattern | How to Detect | Sources
Coordinated reconnaissance | Same IP appears in CloudFront (scouting docs), WAF (probing API), and gets blocked. The cross-source correlation query surfaces these instantly. | WAF + CloudFront JOIN
Latency regression | ALB logs contain per-request target_processing_time. Compute P95/P99 by endpoint across days to spot creeping degradation before users notice. | ALB logs, GROUP BY day
Geographic traffic shifts | CloudFront logs include the edge location and viewer IP (country comes from IP geolocation, not from a log field). A sudden spike from an unusual region could signal DDoS ramp-up, a bot farm, or a new market discovering the product. | CloudFront logs
Rate limit evasion | WAF logs show high-volume IPs that stay just below the 2000/5min threshold. They are never blocked, but sustained near-threshold volume suggests throttle-aware scraping, possibly distributed across multiple API keys. | WAF logs, windowed aggregation
Incident timeline reconstruction | "What happened between 2:15 and 2:45 AM?" A single query across WAF + CloudTrail + ALB shows who called what, from where, what succeeded, and what failed, in chronological order. | WAF + CloudTrail + ALB
Bot fingerprinting | WAF logs include JA3 TLS fingerprints. Group blocked requests by fingerprint to separate bot frameworks from legitimate browsers; the same fingerprint across 50 IPs is most likely a botnet. | WAF logs, JA3 field
Infrastructure drift detection | CloudTrail error events grouped by service: sudden permission denials or throttling reveal IAM policy drift, resource limits, or configuration changes you didn't make. | CloudTrail errors
Content popularity evolution | CloudFront logs show which docs/pages are hot, which are dead, and how the distribution shifts over time. This informs what to translate first, what to promote, and where to invest. | CloudFront logs, daily aggregation
The multiplier effect: Each table alone is useful. JOINing them is where the real intelligence lives. An IP that appears in only one source is noise. An IP that appears in WAF blocks, CloudFront requests, AND CloudTrail (trying to call AWS APIs from our VPC's NAT gateway IP) is a story — and Athena tells it in one query.

11.7 Future Enhancements