The Trinity Beast Infrastructure — Unified Logs & Observability

1. The Problem — Why This Was Needed

Before June 4, 2026, the Trinity Beast Infrastructure had a logging blind spot. Application code logged to CloudWatch — that part worked. But the surrounding infrastructure was largely invisible:

❌ Before — Scattered & Dark

ALB access logs: disabled — no record of individual HTTP requests to the API
CloudFront access logs: disabled — no record of website visitor requests
WAF request logs: disabled — only sampled CloudWatch metrics (block counts per rule, no request detail)
S3 access logs: disabled — no audit of who accessed deployed files
Aurora server logs: not exported — slow queries only visible via RDS console download
Valkey slow/engine logs: not exported — no visibility into cache performance issues
CloudWatch retention: infinite on 18 log groups — unbounded cost growth
No single place to look — context switching between console, S3, CloudWatch, API calls

✅ After — Unified & Complete

ALB logs: every API request → S3 (latency, status, target, client IP)
CloudFront logs: every website hit → S3 (edge location, cache status, bytes)
WAF logs: every API request → S3 (full headers, rule match, action, country)
S3 access logs: every object operation → S3 (requester, operation, key)
Aurora logs: slow queries + errors → CloudWatch (instantly searchable)
Valkey logs: slow commands + engine events → CloudWatch
All 30 log groups: explicit 90-day retention (controlled cost)
One S3 bucket, one lifecycle policy, one search path

The core insight: You cannot learn from a system you cannot observe. Every HTTP request, every database query, every cache miss, every WAF decision — these are the signals that reveal performance bottlenecks, security threats, usage patterns, and optimization opportunities. Without centralized logging, these signals evaporate the moment they occur. With it, they become searchable institutional memory.

What We Couldn't Do Before

Forensics: "Which requests hit this IP in the last 24 hours?" — required guessing from WAF sample metrics
Performance: "What's the P99 latency for /price endpoints?" — no per-request latency data
Security correlation: "Did this blocked IP also hit CloudFront?" — separate systems, no join path
Capacity planning: "What's our cache miss rate for which commands?" — Valkey slow-log was invisible
Incident review: "What queries were slow during that spike?" — Aurora logs not in CloudWatch, manual RDS download required
Visitor analytics: "Which docs are most popular by edge location?" — CloudFront data was dark

What We Can Do Now

Full request tracing: client → CloudFront → ALB → ECS container (correlated by request ID)
Security forensics across all layers in one S3 bucket (WAF + ALB + CloudFront + S3 access)
Slow query alerting directly from CloudWatch (no more manual downloads)
Cache performance monitoring (Valkey slow-log + engine events)
Athena SQL queries across all S3 log sources (WAF + ALB + CloudFront together)
365-day archive with Glacier transition — incident review up to a year later

2. Architecture — The Unified Logging Model

The logging architecture splits into three tiers based on query pattern and retention needs:

Diagram 2.1 — Unified Logging Architecture — Three-Tier Model

flowchart TB
    %% Sources
    subgraph sources["Log Sources"]
        direction LR
        ALB["ALB
Trinity-Beast-TCP-ALB"]
        CF["CloudFront
E110PRKEIYQVLL"]
        WAF["WAF
trinity-beast-api-waf"]
        S3SRC["S3 Website
trinity-beast-website-east2"]
        ECS["ECS Containers
Main · Mirror · LRS · Webhook"]
        LAMBDA["Lambda Functions
14 functions"]
        AURORA["Aurora PostgreSQL
Writer + Reader"]
        VALKEY["Valkey 7.2
cache.r7g.2xlarge"]
        VPC["VPC Flow Logs
2 VPCs"]
        CT["CloudTrail
Multi-region"]
    end

    %% Tier 1 - S3
    subgraph tier1["TIER 1 — S3: Long-Term Archive + Bulk Query"]
        direction TB
        BUCKET["aws-waf-logs-trinity-beast"]
        WAF_PREFIX["AWSLogs/.../WAFLogs/
JSON · gzipped · every 5 min"]
        ALB_PREFIX["alb/
space-delimited · gzipped · every 5 min"]
        CF_PREFIX["cloudfront/
tab-delimited · gzipped · every few min"]
        S3_PREFIX["s3-access/
space-delimited · best effort"]
        LIFECYCLE["Lifecycle: Standard 0-90d → Glacier IR 90-365d → Delete"]
    end

    %% Tier 2 - CloudWatch
    subgraph tier2["TIER 2 — CloudWatch: Real-Time + Instant Search"]
        direction TB
        CW_ECS["/aws/ecs/trinity-beast
4 containers · 1 group · UME self-identifies"]
        CW_LAMBDA["/aws/lambda/*
14 function log groups"]
        CW_AURORA["/aws/rds/cluster/.../postgresql
Slow queries · Lock waits · Errors"]
        CW_VALKEY["/aws/elasticache/.../slow-log + engine-log
Slow commands · Engine events"]
        CW_VPC["/aws/vpc/trinity-beast-flowlogs
Both VPCs · Accept/Reject"]
        CW_CT["/aws/cloudtrail/trinity-beast
30-day window · S3 has full archive"]
        CW_INSIGHTS["Container Insights + RDSOSMetrics"]
        RETENTION["All groups: 90-day retention"]
    end

    %% Tier 3 - Application State
    subgraph tier3["TIER 3 — Valkey + Aurora: Application-Level State"]
        direction TB
        VK_OPS["autoops:actions:log · threats:daily
Autonomous actions · Threat summaries"]
        VK_HONEY["honeypot:log · honeypot:ip:*
Trap hits · Per-IP forensics"]
        VK_TX["tx:job:* · tx:history
Translation job progress"]
        VK_REPORT["report:text:* · docs:session:log
Daily reports · Session history"]
        AU_EVENTS["translation_job_events
usage_logs · support_tickets"]
        AU_CRON["cron.job_run_details
pg_cron execution history"]
    end

    %% Connections - Tier 1
    WAF -->|"full request logs"| WAF_PREFIX
    ALB -->|"access logs"| ALB_PREFIX
    CF -->|"standard logs"| CF_PREFIX
    S3SRC -->|"server access"| S3_PREFIX
    WAF_PREFIX --> BUCKET
    ALB_PREFIX --> BUCKET
    CF_PREFIX --> BUCKET
    S3_PREFIX --> BUCKET
    BUCKET --> LIFECYCLE

    %% Connections - Tier 2
    ECS -->|"stdout/stderr"| CW_ECS
    LAMBDA -->|"execution logs"| CW_LAMBDA
    AURORA -->|"pg logs export"| CW_AURORA
    VALKEY -->|"slow + engine"| CW_VALKEY
    VPC -->|"flow logs"| CW_VPC
    CT -->|"API events"| CW_CT
    ECS -->|"metrics"| CW_INSIGHTS
    AURORA -->|"OS metrics"| CW_INSIGHTS

    %% Connections - Tier 3
    ECS -->|"app writes"| VK_OPS
    ECS -->|"honeypot hits"| VK_HONEY
    ECS -->|"job state"| VK_TX
    ECS -->|"reports"| VK_REPORT
    ECS -->|"audit rows"| AU_EVENTS
    AURORA -->|"cron results"| AU_CRON

    %% Also CloudTrail to S3
    CT -->|"archive"| BUCKET

    %% Styling
    classDef s3Style fill:#1a365d,stroke:#60a5fa,color:#e2e8f0
    classDef cwStyle fill:#14532d,stroke:#10b981,color:#e2e8f0
    classDef appStyle fill:#4c1d95,stroke:#a78bfa,color:#e2e8f0
    classDef sourceStyle fill:#1e293b,stroke:#FF9900,color:#e2e8f0
    classDef bucketStyle fill:#0f172a,stroke:#FF9900,color:#FF9900

    class WAF_PREFIX,ALB_PREFIX,CF_PREFIX,S3_PREFIX,LIFECYCLE s3Style
    class CW_ECS,CW_LAMBDA,CW_AURORA,CW_VALKEY,CW_VPC,CW_CT,CW_INSIGHTS,RETENTION cwStyle
    class VK_OPS,VK_HONEY,VK_TX,VK_REPORT,AU_EVENTS,AU_CRON appStyle
    class ALB,CF,WAF,S3SRC,ECS,LAMBDA,AURORA,VALKEY,VPC,CT sourceStyle
    class BUCKET bucketStyle

Request Lifecycle — What Gets Logged Where

A single API request generates log entries across multiple tiers. This diagram traces a request from edge to database, showing where each log event lands:

Diagram 2.2 — Request Lifecycle — Log Entry Points Across Tiers

sequenceDiagram
    participant Client as Client (Browser/SDK)
    participant CF as CloudFront
    participant WAF as WAF
    participant ALB as ALB
    participant ECS as ECS Container
    participant Aurora as Aurora
    participant Valkey as Valkey
    participant S3Log as S3 Logs Bucket

    Note over Client,S3Log: A single GET /price?asset=BTC request

    Client->>CF: HTTPS request
    Note right of CF: 📝 CloudFront log → S3
cloudfront/ prefix
(edge, IP, URI, cache hit/miss)

    CF->>WAF: Forward to origin
    Note right of WAF: 📝 WAF log → S3
AWSLogs/.../WAFLogs/
(IP, headers, rule match, ALLOW)

    WAF->>ALB: Passed rules
    Note right of ALB: 📝 ALB access log → S3
alb/ prefix
(latency, status, target IP)

    ALB->>ECS: Route to healthy target
    Note right of ECS: 📝 Container log → CloudWatch
/aws/ecs/trinity-beast
(UME: cluster_node, endpoint, api_key_id)

    ECS->>Valkey: Check price cache
    Note right of Valkey: (no log unless slow command)

    ECS->>Aurora: Query if cache miss
    Note right of Aurora: 📝 If >1s → PostgreSQL log
/aws/rds/.../postgresql

    ECS-->>Client: UME Response (200)

    Note over Client,S3Log: Result: 4-6 log entries for ONE request across 3-4 destinations

Design Decisions

Why one S3 bucket for all log types?

Single lifecycle policy. Single IAM scope. Single place to point Athena. One less thing to remember. The bucket is named aws-waf-logs-trinity-beast because WAF enforces a naming convention (aws-waf-logs-* prefix required). Rather than fight it, we made the WAF-compliant name the home for everything.

Why WAF logs on S3 instead of CloudWatch?

Cost. CloudWatch Logs ingestion is $0.50/GB. S3 Standard is $0.023/GB/month for storage + $0 for delivery. At ~50 MB/day of WAF logs, S3 saves ~$0.70/month vs CloudWatch. More importantly, S3 logs are Athena-queryable — we can run SQL across millions of WAF events without building a pipeline.

Why container logs don't need separate groups

All four LPO containers (Main, Mirror, LRS, Webhook) run the same binary and log the same way. Every structured log line carries UME fields: cluster_node (which container), agent_profile_arn (which actor), endpoint (what path), ip_address (who called). There is zero ambiguity about origin. Keeping them in one log group means one search covers the entire cluster — no need to check each node separately when hunting for an error or tracing a request.

Why keep CloudWatch for application logs?

Instant search. When a Lambda errors or a container crashes, you need the answer in seconds — not minutes downloading from S3. CloudWatch's filter-log-events gives sub-second search across application logs. The tradeoff is worth it: application logs are low-volume and high-urgency; infrastructure logs are high-volume and low-urgency.

3. S3 Unified Logs Bucket

Bucket Name

aws-waf-logs-trinity-beast

Region

us-east-2

Lifecycle

365d

Encryption

SSE-S3

ACL Mode

BucketOwnerPreferred

Log Sources

3.1 Prefix Structure

Prefix	Source	Format	Delivery
`AWSLogs/211998422884/WAFLogs/us-east-2/trinity-beast-api-waf/`	WAF (ALB)	JSON (gzipped), one event per line	Every 5 minutes
`alb/AWSLogs/211998422884/elasticloadbalancing/us-east-2/`	ALB	Space-delimited (gzipped)	Every 5 minutes
`cloudfront/`	CloudFront	Tab-delimited (gzipped), W3C extended format	Every few minutes
`s3-access/`	S3 Server Access	Space-delimited (not compressed)	Best-effort (minutes)

3.2 What Each Log Contains

WAF Logs (richest security signal)

Timestamp, action (ALLOW/BLOCK/COUNT), terminating rule ID
Full HTTP request: client IP, URI, method, protocol, all headers
Country of origin, HTTP version
Rule group evaluations (which rules matched, which terminated)
Labels applied by rules (for custom logic)

ALB Access Logs

Request timestamp, ELB name, client IP:port, target IP:port
Request processing time, target processing time, response processing time
HTTP status code (ELB and target), bytes received/sent
Full request line (method, URI, protocol)
User-Agent, SSL cipher, SSL protocol, target group ARN
Trace ID (for X-Ray correlation)

CloudFront Standard Logs

Date, time, edge location (POP code), bytes served
Client IP, HTTP method, host, URI stem, status code
Referer, User-Agent, URI query string
Result type (Hit/Miss/Error), request ID
Protocol (HTTP/HTTPS), bytes from client, time-taken, time-to-first-byte

S3 Server Access Logs

Bucket owner, bucket name, timestamp, remote IP
Requester (IAM principal), request ID, operation (GET/PUT/LIST)
Key (object path), request-URI, HTTP status
Error code, bytes sent, object size, total time, turn-around time
Referer, User-Agent, version ID

3.3 Bucket Policy

The bucket policy grants write access to four AWS service principals:

Principal	Access	Path
`delivery.logs.amazonaws.com`	PutObject	`AWSLogs/211998422884/*` (WAF)
`arn:aws:iam::033677994240:root`	PutObject	`alb/*` (ALB, us-east-2 ELB account)
`delivery.logs.amazonaws.com`	PutObject	`alb/`, `cloudfront/`
`logging.s3.amazonaws.com`	PutObject	`s3-access/*`

3.4 Lifecycle Rules

Age	Storage Class	Cost/GB/Month	Access Speed
0–90 days	S3 Standard	$0.023	Instant (milliseconds)
90–365 days	Glacier Instant Retrieval	$0.004	Instant (milliseconds)
>365 days	Deleted	—	—

4. CloudWatch Log Groups (Application Logs)

All application-level logs live in CloudWatch for instant search. Every group has explicit 90-day retention — no infinite growth.

4.1 ECS Container Logs

Already unified. All four LPO/LRS/Webhook containers share a single log group (/aws/ecs/trinity-beast). There is no need to separate them — every log line self-identifies via the Unified Messaging Envelope (UME): cluster_node tells you which container (BeastMain, BeastMirror, BeastLRS, BeastWebhook), agent_profile_arn tells you which actor produced it, and endpoint tells you what was being served. Stream prefixes exist for AWS's benefit (task routing), but for searching, you filter on UME fields — not stream names.

Service	Log Group	Stream Prefix	UME `cluster_node`
BeastMain	`/aws/ecs/trinity-beast`	`main/`	`BeastMain`
BeastMirror	`/aws/ecs/trinity-beast`	`mirror/`	`BeastMirror`
BeastLRS	`/aws/ecs/trinity-beast`	`lrs/`	`BeastLRS`
BeastWebhook	`/aws/ecs/trinity-beast`	`webhook/`	`BeastWebhook`
BeastTranslate	`/ecs/tbi-translate-worker`	`service/tbi-translate-worker/`	—
BeastReconciler	`/aws/ecs/trinity-beast-sync`	`sync/trinity-beast-sync-job/`	—

The translate worker and sync job have separate log groups because they are different binaries (Python and Go respectively) with different lifecycles — they are not LPO containers. But the four LPO/LRS nodes are the same binary running in parallel, differentiated only by SERVER_TYPE and CLUSTER_NODE environment variables. UME makes separation unnecessary — search across all four at once and filter by the fields that matter.

Diagram 4.1 — Container Log Unification via UME Self-Identification

flowchart LR
    subgraph containers["4 Containers — Same Binary, Same Log Group"]
        M["BeastMain
APP_REPORT_SERVER"]
        R["BeastMirror
APP_REPORT_SERVER"]
        L["BeastLRS
APP_REPORT_SERVER"]
        W["BeastWebhook
WEBHOOK_SERVER"]
    end

    subgraph loggroup["/aws/ecs/trinity-beast"]
        LOG["Unified Stream
All 4 containers interleaved"]
    end

    subgraph ume["UME Fields — Self-Identification"]
        CN["cluster_node:
BeastMain | BeastMirror | BeastLRS | BeastWebhook"]
        AP["agent_profile_arn:
tbi | webhook-engine | rhema | ..."]
        EP["endpoint:
/price | /reports | /admin/..."]
        IP["ip_address:
client source IP"]
    end

    M --> LOG
    R --> LOG
    L --> LOG
    W --> LOG
    LOG --> CN
    LOG --> AP
    LOG --> EP
    LOG --> IP

    classDef containerStyle fill:#1e293b,stroke:#FF9900,color:#e2e8f0
    classDef logStyle fill:#14532d,stroke:#10b981,color:#e2e8f0
    classDef umeStyle fill:#1e1b4b,stroke:#a78bfa,color:#e2e8f0

    class M,R,L,W containerStyle
    class LOG logStyle
    class CN,AP,EP,IP umeStyle

# Search ALL containers at once — UME self-reports which node answered
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
  --start-time $(date -v-1H +%s)000 --filter-pattern "BeastMain" \
  --region us-east-2 --query 'events[*].message' --output json

# Find errors across all 4 nodes simultaneously
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
  --start-time $(date -v-1H +%s)000 --filter-pattern "ERROR" \
  --region us-east-2 --query 'events[*].message' --output json

# Filter by agent actor (e.g., only webhook-engine responses)
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
  --start-time $(date -v-1H +%s)000 --filter-pattern "webhook-engine" \
  --region us-east-2 --query 'events[*].message' --output json

4.2 Lambda Function Logs

Function	Log Group	Purpose
`trinity-beast-receipt`	`/aws/lambda/trinity-beast-receipt`	Stripe receipt processing
`trinity-beast-queued-writer`	`/aws/lambda/trinity-beast-queued-writer`	SQS → Aurora batch inserts
`tbi-ops-notify`	`/aws/lambda/tbi-ops-notify`	Formatted SES notifications
`tbi-ops-self-heal`	`/aws/lambda/tbi-ops-self-heal`	ECS task restart automation
`tbi-ops-waf-action`	`/aws/lambda/tbi-ops-waf-action`	WAF rule management
`tbi-ops-honeypot-processor`	`/aws/lambda/tbi-ops-honeypot-processor`	Honeypot queue → WAF blocks
`tbi-ops-bedrock-analyze`	`/aws/lambda/tbi-ops-bedrock-analyze`	AI threat correlation
`tbi-rhema-support`	`/aws/lambda/tbi-rhema-support`	AI support assistant
`tbi-ops-digest`	`/aws/lambda/tbi-ops-digest`	Daily/weekly operational digest
`tbi-translate-deploy`	`/aws/lambda/tbi-translate-deploy`	CloudFront invalidation
`tbi-translate-finalize`	`/aws/lambda/tbi-translate-finalize`	Search rebuild + notification
`tbi-translate-batch-prepare`	`/aws/lambda/tbi-translate-batch-prepare`	Batch JSONL preparation
`tbi-translate-batch-submit`	`/aws/lambda/tbi-translate-batch-submit`	Bedrock batch job submission

4.3 Infrastructure Logs

Log Group	Content	Retention
`/aws/cloudtrail/trinity-beast`	All AWS API calls (multi-region)	30 days (S3 archive is indefinite)
`/aws/vpc/trinity-beast-flowlogs`	Network traffic for both VPCs	90 days
`/aws/ecs/containerinsights/.../performance`	ECS CPU, memory, network per task	90 days
`RDSOSMetrics`	Aurora OS-level metrics (every 30s)	90 days
`/aws/rds/cluster/trinity-beast-aurora-cluster/postgresql`	Slow queries (>1s), lock waits, errors	90 days
`/aws/elasticache/trinity-beast-cache/slow-log`	Valkey commands exceeding threshold	90 days
`/aws/elasticache/trinity-beast-cache/engine-log`	Valkey engine events (startup, failover)	90 days

5. Valkey Operational Logs

Beyond CloudWatch export, the application writes structured operational data directly to Valkey as keys. These function as real-time logs for the AutoOps system.

Key Pattern	Type	What It Records	TTL
`autoops:actions:log`	Sorted Set	Every autonomous action (self-heals, WAF blocks, notifications)	Permanent
`autoops:threats:daily`	String (JSON)	Today's AI-generated threat summary	Overwritten daily
`honeypot:log`	Sorted Set	Chronological log of all honeypot trap hits	Permanent
`honeypot:ip:<ip>`	Hash	Per-IP: first_seen, last_seen, hit_count, paths	Permanent
`tx:job:<id>`	Hash	Translation job state (live progress per pair)	7 days
`report:text:YYYY-MM-DD`	String	Plain-text daily report for newsletter	30 days
`docs:session:log`	List	Session close entries (date, session, summary)	Permanent

6. Aurora Database Logs

6.1 PostgreSQL Server Logs (CloudWatch)

Enabled June 4, 2026. Aurora exports slow queries and errors to CloudWatch automatically. The instance parameter group controls what gets logged:

Parameter	Value	Effect
`log_min_duration_statement`	1000	Log queries taking longer than 1 second
`log_temp_files`	0	Log all temp file usage (sorts spilling to disk)
`log_lock_waits`	1	Log lock wait events
`track_io_timing`	on	Include I/O timing in EXPLAIN output
`idle_in_transaction_session_timeout`	300000	Kill idle-in-transaction sessions after 5 min

6.2 pg_cron Execution History

Scheduled job execution history lives in Aurora itself — not CloudWatch. Query via:

SELECT r.jobid, j.jobname, r.start_time, r.end_time, r.status, r.return_message
FROM cron.job_run_details r JOIN cron.job j ON r.jobid = j.jobid
ORDER BY r.start_time DESC LIMIT 20;

6.3 Application Audit Tables

Table	Purpose	Retention
`translation_job_events`	Every state transition for every translation job	Permanent
`usage_logs`	Every API call (raw request log)	93 days in Valkey, permanent in Aurora
`support_tickets`	Support ticket history	1095 days (3 years)
`cron.job_run_details`	pg_cron execution results	~7 days (pg_cron internal cleanup)

7. Retention & Lifecycle Policy

The rule: 90 days for CloudWatch (multiples-of-3 convention: 3 × 30). 365 days for S3 with Glacier transition at 90 days. No infinite-retention log groups exist. Every byte has a known expiration.

Tier	Retention	Storage	Applies To
CloudWatch (Application)	90 days	CloudWatch Logs	All 18 Lambda + 4 ECS groups
CloudWatch (Infrastructure)	90 days	CloudWatch Logs	VPC Flow Logs, Container Insights, RDSOSMetrics, Aurora PG, Valkey
CloudWatch (CloudTrail)	30 days	CloudWatch Logs	API audit (real-time search); S3 archive handles long-term
S3 (Standard)	0–90 days	S3 Standard	WAF, ALB, CloudFront, S3 access logs
S3 (Glacier IR)	90–365 days	Glacier Instant Retrieval	Same — automatic transition
S3 (CloudTrail)	Indefinite	S3 Standard	CloudTrail raw archive (no lifecycle rule)
Valkey	7–30 days	In-memory	Operational state (TTL-managed)
Aurora	Permanent	Aurora storage	Usage logs, tickets, job events

8. Querying & Searching Logs

8.1 CloudWatch — Instant Search

# Find errors in any Lambda (last hour)
aws logs filter-log-events --log-group-name "/aws/lambda/tbi-ops-notify" \
  --start-time $(date -v-1H +%s)000 --filter-pattern "ERROR" \
  --region us-east-2 --query 'events[*].message' --output json

# Slow Aurora queries (last 24h)
aws logs filter-log-events \
  --log-group-name "/aws/rds/cluster/trinity-beast-aurora-cluster/postgresql" \
  --start-time $(date -v-24H +%s)000 --filter-pattern "duration" \
  --region us-east-2 --query 'events[*].message' --output json

# ECS container output for sync job (latest run)
STREAM=$(aws logs describe-log-streams --log-group-name "/aws/ecs/trinity-beast-sync" \
  --order-by LastEventTime --descending --limit 1 \
  --region us-east-2 --query 'logStreams[0].logStreamName' --output text)
aws logs get-log-events --log-group-name "/aws/ecs/trinity-beast-sync" \
  --log-stream-name "$STREAM" --limit 30 \
  --region us-east-2 --query 'events[*].message' --output json

8.2 S3 — Download & Analyze

# WAF: Get recent blocked requests
latest=$(aws s3 ls "s3://aws-waf-logs-trinity-beast/AWSLogs/211998422884/WAFLogs/\
us-east-2/trinity-beast-api-waf/$(date -u +%Y/%m/%d/%H)/" --region us-east-2 | tail -1 | awk '{print $4}')
aws s3 cp "s3://aws-waf-logs-trinity-beast/AWSLogs/211998422884/WAFLogs/\
us-east-2/trinity-beast-api-waf/$(date -u +%Y/%m/%d/%H)/$latest" - \
  --region us-east-2 | gunzip | jq -c 'select(.action=="BLOCK") | {ip: .httpRequest.clientIp, uri: .httpRequest.uri, rule: .terminatingRuleId}'

# ALB: Check latency for a specific endpoint
aws s3 cp "s3://aws-waf-logs-trinity-beast/alb/AWSLogs/211998422884/\
elasticloadbalancing/us-east-2/$(date +%Y/%m/%d)/<file>.log.gz" - \
  --region us-east-2 | gunzip | awk '{print $6, $13}' | grep "/price"

# CloudFront: Top requested URIs
aws s3 cp "s3://aws-waf-logs-trinity-beast/cloudfront/<file>.gz" - \
  --region us-east-2 | gunzip | awk '{print $8}' | sort | uniq -c | sort -rn | head -20

8.3 Valkey — Operational State

# Recent autonomous actions
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
  -d '{"command":"ZREVRANGEBYSCORE autoops:actions:log +inf -inf LIMIT 0 10"}' \
  "$LPO_BASE/admin/valkey"

# Today's threat assessment
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
  -d '{"command":"GET autoops:threats:daily"}' "$LPO_BASE/admin/valkey"

# Honeypot activity (last 10 hits)
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
  -d '{"command":"ZREVRANGEBYSCORE honeypot:log +inf -inf LIMIT 0 10"}' \
  "$LPO_BASE/admin/valkey"

9. Quick Reference — Where Do I Look?

I Want To See...	Go To
What an ECS container printed	CloudWatch: `/aws/ecs/trinity-beast` (filter by stream prefix)
Why a Lambda failed	CloudWatch: `/aws/lambda/<function-name>` (filter for ERROR)
Every HTTP request to the API	S3: `aws-waf-logs-trinity-beast/AWSLogs/.../WAFLogs/`
Which WAF rule blocked what	S3: WAF logs → filter for `action: "BLOCK"`
Per-request latency to the API	S3: `aws-waf-logs-trinity-beast/alb/`
Every website visitor request	S3: `aws-waf-logs-trinity-beast/cloudfront/`
Who accessed S3 objects	S3: `aws-waf-logs-trinity-beast/s3-access/`
Network traffic patterns	CloudWatch: `/aws/vpc/trinity-beast-flowlogs`
Slow database queries	CloudWatch: `/aws/rds/cluster/.../postgresql`
Slow Valkey commands	CloudWatch: `/aws/elasticache/.../slow-log`
Who called what AWS API	CloudWatch: `/aws/cloudtrail/trinity-beast`
Historical AWS API calls	S3: `aws-cloudtrail-logs-211998422884-879cb71c`
What AutoOps did	Valkey: `autoops:actions:log`
Honeypot hits	Valkey: `honeypot:log`
Translation job trace	Aurora: `translation_job_events` table
pg_cron job results	Aurora: `cron.job_run_details` table
Session history & decisions	S3: `daily-reports/tbi-ops-*` or `~/daily-reports/`

10. Cost Analysis

Total Monthly Cost

< $0.50

Daily Volume

~78 MB

Annual Storage

~28 GB

S3 Delivery Cost

Source	Destination	Daily Volume	Monthly Cost
WAF full request logs	S3	~50 MB	~$0.04
ALB access logs	S3	~5 MB	< $0.01
CloudFront logs	S3	~15 MB	~$0.01
S3 access logs	S3	~2 MB	< $0.01
Aurora PostgreSQL logs	CloudWatch	~5 MB	~$0.08
Valkey slow/engine logs	CloudWatch	~1 MB	~$0.02
Total			< $0.50/month

Why so cheap? S3 log delivery from AWS services (WAF, ALB, CloudFront, S3) has zero delivery cost — you only pay for storage. At ~72 MB/day to S3, annual storage is ~26 GB × $0.023/GB = $0.60/year for the first 90 days, then Glacier IR at $0.004/GB for months 3-12. CloudWatch ingestion ($0.50/GB) is the only real cost driver, and we only send low-volume logs there (Aurora + Valkey = ~6 MB/day = ~$0.10/month).

11. Athena — SQL Analytics Across All Logs

Amazon Athena is a serverless SQL engine that queries data directly in S3 — no database to provision, no data to move, no ETL. Point it at the log files, define a schema, and run standard SQL. Deployed June 4, 2026.

Workgroup

trinity-beast-analytics

Database

tbi_logs

Tables

Saved Queries

Cost per Query

~$0.01

Query Results

s3://.../athena-results/

11.1 Tables

Table	Source	Partitioned By	Key Fields
`waf_logs`	WAF full request logs	year / month / day / hour	action, httprequest.clientip, httprequest.uri, httprequest.country, terminatingruleid, labels
`alb_logs`	ALB access logs	year / month / day	request_url, elb_status_code, target_processing_time, client_ip, user_agent
`cloudfront_logs`	CloudFront standard logs	date column	cs_uri_stem, c_ip, sc_status, x_edge_location, x_edge_result_type, time_taken
`s3_access_logs`	S3 server access logs	—	operation, key, remote_ip, http_status, requester
`cloudtrail_logs`	CloudTrail API events	account / region / year / month / day	eventname, eventsource, sourceipaddress, useridentity, errorcode

Partition projection — all date-partitioned tables use Athena partition projection. This means new data is queryable the instant it arrives in S3. No manual MSCK REPAIR TABLE, no Glue crawlers, no partition maintenance. The schema tells Athena where to look based on the date path pattern.

11.2 Saved Queries (Ready to Run)

Nine pre-built queries are saved in the trinity-beast-analytics workgroup. Run them from the Athena console or CLI — just click "Saved queries" in the console.

Query Name	What It Answers
WAF: Blocks by Country (24h)	Top countries generating blocked requests today, grouped by WAF rule
WAF: Top Blocked IPs (24h)	IPs with the most blocks — identify repeat attackers and which rules caught them
WAF: Request Volume by Endpoint (24h)	Traffic distribution across API endpoints — which paths get the most hits
WAF: Trace IP (forensics)	Every request from a specific IP: time, action, URI, method, country. Replace `THE_IP` with the target
CloudTrail: API Calls by Service (24h)	Top AWS API calls grouped by service — spot unusual activity patterns
CloudTrail: Error Events (24h)	All API calls that returned errors today — permission denials, throttles, service errors
CloudFront: Top Requested Pages (24h)	Most popular website pages by request count, bytes served, and average load time
CloudFront: Traffic by Country (24h)	Website visitors by country using viewer IP geolocation — audience geography and traffic patterns
Cross-Source: IP Correlation	Find IPs that hit BOTH the API (WAF) and website (CloudFront) — reconnaissance detection via JOIN

11.3 Example Queries

Security — Who got blocked and why?

-- Top 30 blocked IPs with the rules that caught them
SELECT httprequest.clientip AS ip,
       httprequest.country AS country,
       count(*) AS blocks,
       array_agg(DISTINCT terminatingruleid) AS rules
FROM tbi_logs.waf_logs
WHERE action = 'BLOCK'
  AND year = '2026' AND month = '06' AND day = '04'
GROUP BY httprequest.clientip, httprequest.country
ORDER BY blocks DESC
LIMIT 30;

Performance — P95 latency by endpoint

-- API endpoint latency percentiles from ALB logs
SELECT request_url,
       count(*) AS requests,
       approx_percentile(target_processing_time, 0.5) AS p50,
       approx_percentile(target_processing_time, 0.95) AS p95,
       approx_percentile(target_processing_time, 0.99) AS p99
FROM tbi_logs.alb_logs
WHERE year = '2026' AND month = '06' AND day = '04'
  AND request_url LIKE '%/price%'
GROUP BY request_url
ORDER BY requests DESC;

Cross-Source — Reconnaissance detection

-- IPs that hit both the API AND the website (probing behavior)
SELECT w.ip, w.waf_requests, w.blocks, cf.cf_requests
FROM (
  SELECT httprequest.clientip AS ip,
         count(*) AS waf_requests,
         sum(CASE WHEN action='BLOCK' THEN 1 ELSE 0 END) AS blocks
  FROM tbi_logs.waf_logs
  WHERE year = '2026' AND month = '06' AND day = '04'
  GROUP BY httprequest.clientip
) w
JOIN (
  SELECT c_ip AS ip, count(*) AS cf_requests
  FROM tbi_logs.cloudfront_logs
  WHERE date = DATE '2026-06-04'
  GROUP BY c_ip
) cf ON w.ip = cf.ip
ORDER BY w.blocks DESC, w.waf_requests DESC
LIMIT 20;

Forensics — Full trace of a suspicious IP

-- Everything a specific IP did across all API endpoints today
SELECT from_unixtime(timestamp/1000) AS time,
       action,
       httprequest.uri,
       httprequest.httpmethod,
       terminatingruleid,
       httprequest.country
FROM tbi_logs.waf_logs
WHERE httprequest.clientip = '45.148.10.51'
  AND year = '2026' AND month = '06'
ORDER BY timestamp DESC
LIMIT 100;

Infrastructure — What AWS APIs are being called?

-- Top AWS API calls in our account today (spot automation vs manual)
SELECT eventsource, eventname, sourceipaddress, count(*) AS calls
FROM tbi_logs.cloudtrail_logs
WHERE account = '211998422884' AND region = 'us-east-2'
  AND year = '2026' AND month = '06' AND day = '04'
GROUP BY eventsource, eventname, sourceipaddress
ORDER BY calls DESC
LIMIT 30;

11.4 How to Use

From the AWS Console

Open Athena → select workgroup trinity-beast-analytics → database tbi_logs → click "Saved queries" for the 9 pre-built queries, or write your own SQL in the editor. Results appear in seconds.

From the CLI

Submit a query and fetch results:

# Run a query
QID=$(aws athena start-query-execution \
  --query-string "SELECT action, count(*) FROM tbi_logs.waf_logs WHERE year='2026' AND month='06' AND day='04' GROUP BY action" \
  --work-group trinity-beast-analytics \
  --region us-east-2 --query 'QueryExecutionId' --output text)

# Wait for completion (typically 1-3 seconds)
aws athena get-query-execution --query-execution-id $QID \
  --region us-east-2 --query 'QueryExecution.Status.State' --output text

# Get results
aws athena get-query-results --query-execution-id $QID \
  --region us-east-2 --output json | jq '.ResultSet.Rows[] | .Data | map(.VarCharValue) | join(" | ")'

11.5 Cost & Limits

Dimension	Value
Price per query	$5 per TB scanned (minimum 10 MB charge per query)
Current daily log volume	~78 MB/day → ~2.4 GB/month
Cost of scanning full month	~$0.012
Query cost limit (workgroup)	1 GB per query (safety guardrail)
Partition benefit	Querying one day scans only that day's folder — not the entire bucket
Compression benefit	Gzipped files = less bytes scanned = lower cost per query

Cost optimization tip: Always include partition filters (year, month, day) in your WHERE clause. Without them, Athena scans ALL data in the table. With them, it only reads the specific folders you need — reducing cost and execution time by orders of magnitude.

11.6 Pattern Recognition — What Athena Reveals

The real power isn't in individual queries — it's in combining signals across log sources to surface patterns invisible to any single source. These are the questions Athena answers that nothing else in the stack can:

Pattern	How to Detect	Sources
Coordinated reconnaissance	Same IP appears in CloudFront (scouting docs), WAF (probing API), and gets blocked. The cross-source correlation query surfaces these instantly.	WAF + CloudFront JOIN
Latency regression	ALB logs contain per-request `target_processing_time`. Compute P95/P99 by endpoint across days to spot creeping degradation before users notice.	ALB logs, GROUP BY day
Geographic traffic shifts	CloudFront logs include edge location + viewer country. A sudden spike from an unusual region could signal DDoS ramp-up, bot farm, or a new market discovering the product.	CloudFront logs
Rate limit evasion	WAF logs show high-volume IPs that stay just below the 2000/5min threshold. They're not blocked but they're up to something — distributed scraping across multiple keys.	WAF logs, windowed aggregation
Incident timeline reconstruction	"What happened between 2:15 and 2:45 AM?" — a single query across WAF + CloudTrail + ALB shows: who called what, from where, what succeeded, what failed, in chronological order.	WAF + CloudTrail + ALB
Bot fingerprinting	WAF logs include JA3 TLS fingerprints. Group blocked requests by fingerprint to identify bot frameworks vs. legitimate browsers — the same fingerprint across 50 IPs is a botnet.	WAF logs, JA3 field
Infrastructure drift detection	CloudTrail error events grouped by service — sudden permission denials or throttling reveals IAM policy drift, resource limits, or configuration changes you didn't make.	CloudTrail errors
Content popularity evolution	CloudFront logs show which docs/pages are hot, which are dead, and how the distribution shifts over time. Informs what to translate first, what to promote, where to invest.	CloudFront logs, daily aggregation

The multiplier effect: Each table alone is useful. JOINing them is where the real intelligence lives. An IP that appears in only one source is noise. An IP that appears in WAF blocks, CloudFront requests, AND CloudTrail (trying to call AWS APIs from our VPC's NAT gateway IP) is a story — and Athena tells it in one query.

11.7 Future Enhancements

S3 Tables (Apache Iceberg): If query volume or data size grows, migrate table definitions to S3 Tables for automatic compaction, 3× faster queries, and time-travel capability. Same SQL, same Athena — just faster underneath.
CloudWatch Logs Insights: Already available on all CloudWatch groups — structured query language for real-time pattern analysis on application logs.
Athena Views: Create pre-joined views (WAF + CloudFront by IP, CloudTrail errors by service) for one-click analysis.
Scheduled Queries: Athena can run queries on a schedule and output to S3 — automated daily security reports without any Lambda code.
QuickSight Integration: If visual dashboards are needed, QuickSight connects directly to Athena for interactive BI over all log data.