Before June 4, 2026, the Trinity Beast Infrastructure had a logging blind spot. Application code logged to CloudWatch — that part worked. But the surrounding infrastructure was largely invisible:
/price endpoints?" — no per-request latency dataThe logging architecture splits into three tiers based on query pattern and retention needs:
flowchart TB
%% Sources
subgraph sources["Log Sources"]
direction LR
ALB["ALB
Trinity-Beast-TCP-ALB"]
CF["CloudFront
E110PRKEIYQVLL"]
WAF["WAF
trinity-beast-api-waf"]
S3SRC["S3 Website
trinity-beast-website-east2"]
ECS["ECS Containers
Main · Mirror · LRS · Webhook"]
LAMBDA["Lambda Functions
14 functions"]
AURORA["Aurora PostgreSQL
Writer + Reader"]
VALKEY["Valkey 7.2
cache.r7g.2xlarge"]
VPC["VPC Flow Logs
2 VPCs"]
CT["CloudTrail
Multi-region"]
end
%% Tier 1 - S3
subgraph tier1["TIER 1 — S3: Long-Term Archive + Bulk Query"]
direction TB
BUCKET["aws-waf-logs-trinity-beast"]
WAF_PREFIX["AWSLogs/.../WAFLogs/
JSON · gzipped · every 5 min"]
ALB_PREFIX["alb/
space-delimited · gzipped · every 5 min"]
CF_PREFIX["cloudfront/
tab-delimited · gzipped · every few min"]
S3_PREFIX["s3-access/
space-delimited · best effort"]
LIFECYCLE["Lifecycle: Standard 0-90d → Glacier IR 90-365d → Delete"]
end
%% Tier 2 - CloudWatch
subgraph tier2["TIER 2 — CloudWatch: Real-Time + Instant Search"]
direction TB
CW_ECS["/aws/ecs/trinity-beast
4 containers · 1 group · UME self-identifies"]
CW_LAMBDA["/aws/lambda/*
14 function log groups"]
CW_AURORA["/aws/rds/cluster/.../postgresql
Slow queries · Lock waits · Errors"]
CW_VALKEY["/aws/elasticache/.../slow-log + engine-log
Slow commands · Engine events"]
CW_VPC["/aws/vpc/trinity-beast-flowlogs
Both VPCs · Accept/Reject"]
CW_CT["/aws/cloudtrail/trinity-beast
30-day window · S3 has full archive"]
CW_INSIGHTS["Container Insights + RDSOSMetrics"]
RETENTION["All groups: 90-day retention"]
end
%% Tier 3 - Application State
subgraph tier3["TIER 3 — Valkey + Aurora: Application-Level State"]
direction TB
VK_OPS["autoops:actions:log · threats:daily
Autonomous actions · Threat summaries"]
VK_HONEY["honeypot:log · honeypot:ip:*
Trap hits · Per-IP forensics"]
VK_TX["tx:job:* · tx:history
Translation job progress"]
VK_REPORT["report:text:* · docs:session:log
Daily reports · Session history"]
AU_EVENTS["translation_job_events
usage_logs · support_tickets"]
AU_CRON["cron.job_run_details
pg_cron execution history"]
end
%% Connections - Tier 1
WAF -->|"full request logs"| WAF_PREFIX
ALB -->|"access logs"| ALB_PREFIX
CF -->|"standard logs"| CF_PREFIX
S3SRC -->|"server access"| S3_PREFIX
WAF_PREFIX --> BUCKET
ALB_PREFIX --> BUCKET
CF_PREFIX --> BUCKET
S3_PREFIX --> BUCKET
BUCKET --> LIFECYCLE
%% Connections - Tier 2
ECS -->|"stdout/stderr"| CW_ECS
LAMBDA -->|"execution logs"| CW_LAMBDA
AURORA -->|"pg logs export"| CW_AURORA
VALKEY -->|"slow + engine"| CW_VALKEY
VPC -->|"flow logs"| CW_VPC
CT -->|"API events"| CW_CT
ECS -->|"metrics"| CW_INSIGHTS
AURORA -->|"OS metrics"| CW_INSIGHTS
%% Connections - Tier 3
ECS -->|"app writes"| VK_OPS
ECS -->|"honeypot hits"| VK_HONEY
ECS -->|"job state"| VK_TX
ECS -->|"reports"| VK_REPORT
ECS -->|"audit rows"| AU_EVENTS
AURORA -->|"cron results"| AU_CRON
%% Also CloudTrail to S3
CT -->|"archive"| BUCKET
%% Styling
classDef s3Style fill:#1a365d,stroke:#60a5fa,color:#e2e8f0
classDef cwStyle fill:#14532d,stroke:#10b981,color:#e2e8f0
classDef appStyle fill:#4c1d95,stroke:#a78bfa,color:#e2e8f0
classDef sourceStyle fill:#1e293b,stroke:#FF9900,color:#e2e8f0
classDef bucketStyle fill:#0f172a,stroke:#FF9900,color:#FF9900
class WAF_PREFIX,ALB_PREFIX,CF_PREFIX,S3_PREFIX,LIFECYCLE s3Style
class CW_ECS,CW_LAMBDA,CW_AURORA,CW_VALKEY,CW_VPC,CW_CT,CW_INSIGHTS,RETENTION cwStyle
class VK_OPS,VK_HONEY,VK_TX,VK_REPORT,AU_EVENTS,AU_CRON appStyle
class ALB,CF,WAF,S3SRC,ECS,LAMBDA,AURORA,VALKEY,VPC,CT sourceStyle
class BUCKET bucketStyle
A single API request generates log entries across multiple tiers. This diagram traces a request from edge to database, showing where each log event lands:
sequenceDiagram
participant Client as Client (Browser/SDK)
participant CF as CloudFront
participant WAF as WAF
participant ALB as ALB
participant ECS as ECS Container
participant Aurora as Aurora
participant Valkey as Valkey
participant S3Log as S3 Logs Bucket
Note over Client,S3Log: A single GET /price?asset=BTC request
Client->>CF: HTTPS request
Note right of CF: 📝 CloudFront log → S3
cloudfront/ prefix
(edge, IP, URI, cache hit/miss)
CF->>WAF: Forward to origin
Note right of WAF: 📝 WAF log → S3
AWSLogs/.../WAFLogs/
(IP, headers, rule match, ALLOW)
WAF->>ALB: Passed rules
Note right of ALB: 📝 ALB access log → S3
alb/ prefix
(latency, status, target IP)
ALB->>ECS: Route to healthy target
Note right of ECS: 📝 Container log → CloudWatch
/aws/ecs/trinity-beast
(UME: cluster_node, endpoint, api_key_id)
ECS->>Valkey: Check price cache
Note right of Valkey: (no log unless slow command)
ECS->>Aurora: Query if cache miss
Note right of Aurora: 📝 If >1s → PostgreSQL log
/aws/rds/.../postgresql
ECS-->>Client: UME Response (200)
Note over Client,S3Log: Result: 4-6 log entries for ONE request across 3-4 destinations
Single lifecycle policy. Single IAM scope. Single place to point Athena. One less thing to remember. The bucket is named aws-waf-logs-trinity-beast because WAF enforces a naming convention (aws-waf-logs-* prefix required). Rather than fight it, we made the WAF-compliant name the home for everything.
Cost. CloudWatch Logs ingestion is $0.50/GB. S3 Standard is $0.023/GB/month for storage + $0 for delivery. At ~50 MB/day of WAF logs, S3 saves ~$0.70/month vs CloudWatch. More importantly, S3 logs are Athena-queryable — we can run SQL across millions of WAF events without building a pipeline.
All four LPO containers (Main, Mirror, LRS, Webhook) run the same binary and log the same way. Every structured log line carries UME fields: cluster_node (which container), agent_profile_arn (which actor), endpoint (what path), ip_address (who called). There is zero ambiguity about origin. Keeping them in one log group means one search covers the entire cluster — no need to check each node separately when hunting for an error or tracing a request.
Instant search. When a Lambda errors or a container crashes, you need the answer in seconds — not minutes downloading from S3. CloudWatch's filter-log-events gives sub-second search across application logs. The tradeoff is worth it: application logs are low-volume and high-urgency; infrastructure logs are high-volume and low-urgency.
aws-waf-logs-trinity-beast| Prefix | Source | Format | Delivery |
|---|---|---|---|
AWSLogs/211998422884/WAFLogs/us-east-2/trinity-beast-api-waf/ | WAF (ALB) | JSON (gzipped), one event per line | Every 5 minutes |
alb/AWSLogs/211998422884/elasticloadbalancing/us-east-2/ | ALB | Space-delimited (gzipped) | Every 5 minutes |
cloudfront/ | CloudFront | Tab-delimited (gzipped), W3C extended format | Every few minutes |
s3-access/ | S3 Server Access | Space-delimited (not compressed) | Best-effort (minutes) |
The bucket policy grants write access to four AWS service principals:
| Principal | Access | Path |
|---|---|---|
delivery.logs.amazonaws.com | PutObject | AWSLogs/211998422884/* (WAF) |
arn:aws:iam::033677994240:root | PutObject | alb/* (ALB, us-east-2 ELB account) |
delivery.logs.amazonaws.com | PutObject | alb/*, cloudfront/* |
logging.s3.amazonaws.com | PutObject | s3-access/* |
| Age | Storage Class | Cost/GB/Month | Access Speed |
|---|---|---|---|
| 0–90 days | S3 Standard | $0.023 | Instant (milliseconds) |
| 90–365 days | Glacier Instant Retrieval | $0.004 | Instant (milliseconds) |
| >365 days | Deleted | — | — |
All application-level logs live in CloudWatch for instant search. Every group has explicit 90-day retention — no infinite growth.
/aws/ecs/trinity-beast). There is no need to separate them — every log line self-identifies via the Unified Messaging Envelope (UME): cluster_node tells you which container (BeastMain, BeastMirror, BeastLRS, BeastWebhook), agent_profile_arn tells you which actor produced it, and endpoint tells you what was being served. Stream prefixes exist for AWS's benefit (task routing), but for searching, you filter on UME fields — not stream names.
| Service | Log Group | Stream Prefix | UME cluster_node |
|---|---|---|---|
| BeastMain | /aws/ecs/trinity-beast | main/ | BeastMain |
| BeastMirror | /aws/ecs/trinity-beast | mirror/ | BeastMirror |
| BeastLRS | /aws/ecs/trinity-beast | lrs/ | BeastLRS |
| BeastWebhook | /aws/ecs/trinity-beast | webhook/ | BeastWebhook |
| BeastTranslate | /ecs/tbi-translate-worker | service/tbi-translate-worker/ | — |
| BeastReconciler | /aws/ecs/trinity-beast-sync | sync/trinity-beast-sync-job/ | — |
The translate worker and sync job have separate log groups because they are different binaries (Python and Go respectively) with different lifecycles — they are not LPO containers. But the four LPO/LRS nodes are the same binary running in parallel, differentiated only by SERVER_TYPE and CLUSTER_NODE environment variables. UME makes separation unnecessary — search across all four at once and filter by the fields that matter.
flowchart LR
subgraph containers["4 Containers — Same Binary, Same Log Group"]
M["BeastMain
APP_REPORT_SERVER"]
R["BeastMirror
APP_REPORT_SERVER"]
L["BeastLRS
APP_REPORT_SERVER"]
W["BeastWebhook
WEBHOOK_SERVER"]
end
subgraph loggroup["/aws/ecs/trinity-beast"]
LOG["Unified Stream
All 4 containers interleaved"]
end
subgraph ume["UME Fields — Self-Identification"]
CN["cluster_node:
BeastMain | BeastMirror | BeastLRS | BeastWebhook"]
AP["agent_profile_arn:
tbi | webhook-engine | rhema | ..."]
EP["endpoint:
/price | /reports | /admin/..."]
IP["ip_address:
client source IP"]
end
M --> LOG
R --> LOG
L --> LOG
W --> LOG
LOG --> CN
LOG --> AP
LOG --> EP
LOG --> IP
classDef containerStyle fill:#1e293b,stroke:#FF9900,color:#e2e8f0
classDef logStyle fill:#14532d,stroke:#10b981,color:#e2e8f0
classDef umeStyle fill:#1e1b4b,stroke:#a78bfa,color:#e2e8f0
class M,R,L,W containerStyle
class LOG logStyle
class CN,AP,EP,IP umeStyle
# Search ALL containers at once — UME self-reports which node answered
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
--start-time $(date -v-1H +%s)000 --filter-pattern "BeastMain" \
--region us-east-2 --query 'events[*].message' --output json
# Find errors across all 4 nodes simultaneously
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
--start-time $(date -v-1H +%s)000 --filter-pattern "ERROR" \
--region us-east-2 --query 'events[*].message' --output json
# Filter by agent actor (e.g., only webhook-engine responses)
aws logs filter-log-events --log-group-name "/aws/ecs/trinity-beast" \
--start-time $(date -v-1H +%s)000 --filter-pattern "webhook-engine" \
--region us-east-2 --query 'events[*].message' --output json
| Function | Log Group | Purpose |
|---|---|---|
trinity-beast-receipt | /aws/lambda/trinity-beast-receipt | Stripe receipt processing |
trinity-beast-queued-writer | /aws/lambda/trinity-beast-queued-writer | SQS → Aurora batch inserts |
tbi-ops-notify | /aws/lambda/tbi-ops-notify | Formatted SES notifications |
tbi-ops-self-heal | /aws/lambda/tbi-ops-self-heal | ECS task restart automation |
tbi-ops-waf-action | /aws/lambda/tbi-ops-waf-action | WAF rule management |
tbi-ops-honeypot-processor | /aws/lambda/tbi-ops-honeypot-processor | Honeypot queue → WAF blocks |
tbi-ops-bedrock-analyze | /aws/lambda/tbi-ops-bedrock-analyze | AI threat correlation |
tbi-rhema-support | /aws/lambda/tbi-rhema-support | AI support assistant |
tbi-ops-digest | /aws/lambda/tbi-ops-digest | Daily/weekly operational digest |
tbi-translate-deploy | /aws/lambda/tbi-translate-deploy | CloudFront invalidation |
tbi-translate-finalize | /aws/lambda/tbi-translate-finalize | Search rebuild + notification |
tbi-translate-batch-prepare | /aws/lambda/tbi-translate-batch-prepare | Batch JSONL preparation |
tbi-translate-batch-submit | /aws/lambda/tbi-translate-batch-submit | Bedrock batch job submission |
| Log Group | Content | Retention |
|---|---|---|
/aws/cloudtrail/trinity-beast | All AWS API calls (multi-region) | 30 days (S3 archive is indefinite) |
/aws/vpc/trinity-beast-flowlogs | Network traffic for both VPCs | 90 days |
/aws/ecs/containerinsights/.../performance | ECS CPU, memory, network per task | 90 days |
RDSOSMetrics | Aurora OS-level metrics (every 30s) | 90 days |
/aws/rds/cluster/trinity-beast-aurora-cluster/postgresql | Slow queries (>1s), lock waits, errors | 90 days |
/aws/elasticache/trinity-beast-cache/slow-log | Valkey commands exceeding threshold | 90 days |
/aws/elasticache/trinity-beast-cache/engine-log | Valkey engine events (startup, failover) | 90 days |
Beyond CloudWatch export, the application writes structured operational data directly to Valkey as keys. These function as real-time logs for the AutoOps system.
| Key Pattern | Type | What It Records | TTL |
|---|---|---|---|
autoops:actions:log | Sorted Set | Every autonomous action (self-heals, WAF blocks, notifications) | Permanent |
autoops:threats:daily | String (JSON) | Today's AI-generated threat summary | Overwritten daily |
honeypot:log | Sorted Set | Chronological log of all honeypot trap hits | Permanent |
honeypot:ip:<ip> | Hash | Per-IP: first_seen, last_seen, hit_count, paths | Permanent |
tx:job:<id> | Hash | Translation job state (live progress per pair) | 7 days |
report:text:YYYY-MM-DD | String | Plain-text daily report for newsletter | 30 days |
docs:session:log | List | Session close entries (date, session, summary) | Permanent |
Enabled June 4, 2026. Aurora exports slow queries and errors to CloudWatch automatically. The instance parameter group controls what gets logged:
| Parameter | Value | Effect |
|---|---|---|
log_min_duration_statement | 1000 | Log queries taking longer than 1 second |
log_temp_files | 0 | Log all temp file usage (sorts spilling to disk) |
log_lock_waits | 1 | Log lock wait events |
track_io_timing | on | Include I/O timing in EXPLAIN output |
idle_in_transaction_session_timeout | 300000 | Kill idle-in-transaction sessions after 5 min |
Scheduled job execution history lives in Aurora itself — not CloudWatch. Query via:
SELECT r.jobid, j.jobname, r.start_time, r.end_time, r.status, r.return_message
FROM cron.job_run_details r JOIN cron.job j ON r.jobid = j.jobid
ORDER BY r.start_time DESC LIMIT 20;
| Table | Purpose | Retention |
|---|---|---|
translation_job_events | Every state transition for every translation job | Permanent |
usage_logs | Every API call (raw request log) | 93 days in Valkey, permanent in Aurora |
support_tickets | Support ticket history | 1095 days (3 years) |
cron.job_run_details | pg_cron execution results | ~7 days (pg_cron internal cleanup) |
| Tier | Retention | Storage | Applies To |
|---|---|---|---|
| CloudWatch (Application) | 90 days | CloudWatch Logs | All 18 Lambda + 4 ECS groups |
| CloudWatch (Infrastructure) | 90 days | CloudWatch Logs | VPC Flow Logs, Container Insights, RDSOSMetrics, Aurora PG, Valkey |
| CloudWatch (CloudTrail) | 30 days | CloudWatch Logs | API audit (real-time search); S3 archive handles long-term |
| S3 (Standard) | 0–90 days | S3 Standard | WAF, ALB, CloudFront, S3 access logs |
| S3 (Glacier IR) | 90–365 days | Glacier Instant Retrieval | Same — automatic transition |
| S3 (CloudTrail) | Indefinite | S3 Standard | CloudTrail raw archive (no lifecycle rule) |
| Valkey | 7–30 days | In-memory | Operational state (TTL-managed) |
| Aurora | Permanent | Aurora storage | Usage logs, tickets, job events |
# Find errors in any Lambda (last hour)
aws logs filter-log-events --log-group-name "/aws/lambda/tbi-ops-notify" \
--start-time $(date -v-1H +%s)000 --filter-pattern "ERROR" \
--region us-east-2 --query 'events[*].message' --output json
# Slow Aurora queries (last 24h)
aws logs filter-log-events \
--log-group-name "/aws/rds/cluster/trinity-beast-aurora-cluster/postgresql" \
--start-time $(date -v-24H +%s)000 --filter-pattern "duration" \
--region us-east-2 --query 'events[*].message' --output json
# ECS container output for sync job (latest run)
STREAM=$(aws logs describe-log-streams --log-group-name "/aws/ecs/trinity-beast-sync" \
--order-by LastEventTime --descending --limit 1 \
--region us-east-2 --query 'logStreams[0].logStreamName' --output text)
aws logs get-log-events --log-group-name "/aws/ecs/trinity-beast-sync" \
--log-stream-name "$STREAM" --limit 30 \
--region us-east-2 --query 'events[*].message' --output json
# WAF: Get recent blocked requests
latest=$(aws s3 ls "s3://aws-waf-logs-trinity-beast/AWSLogs/211998422884/WAFLogs/\
us-east-2/trinity-beast-api-waf/$(date -u +%Y/%m/%d/%H)/" --region us-east-2 | tail -1 | awk '{print $4}')
aws s3 cp "s3://aws-waf-logs-trinity-beast/AWSLogs/211998422884/WAFLogs/\
us-east-2/trinity-beast-api-waf/$(date -u +%Y/%m/%d/%H)/$latest" - \
--region us-east-2 | gunzip | jq -c 'select(.action=="BLOCK") | {ip: .httpRequest.clientIp, uri: .httpRequest.uri, rule: .terminatingRuleId}'
# ALB: Check latency for a specific endpoint
aws s3 cp "s3://aws-waf-logs-trinity-beast/alb/AWSLogs/211998422884/\
elasticloadbalancing/us-east-2/$(date +%Y/%m/%d)/<file>.log.gz" - \
--region us-east-2 | gunzip | awk '{print $6, $13}' | grep "/price"
# CloudFront: Top requested URIs
aws s3 cp "s3://aws-waf-logs-trinity-beast/cloudfront/<file>.gz" - \
--region us-east-2 | gunzip | awk '{print $8}' | sort | uniq -c | sort -rn | head -20
# Recent autonomous actions
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
-d '{"command":"ZREVRANGEBYSCORE autoops:actions:log +inf -inf LIMIT 0 10"}' \
"$LPO_BASE/admin/valkey"
# Today's threat assessment
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
-d '{"command":"GET autoops:threats:daily"}' "$LPO_BASE/admin/valkey"
# Honeypot activity (last 10 hits)
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
-d '{"command":"ZREVRANGEBYSCORE honeypot:log +inf -inf LIMIT 0 10"}' \
"$LPO_BASE/admin/valkey"
| I Want To See... | Go To |
|---|---|
| What an ECS container printed | CloudWatch: /aws/ecs/trinity-beast (filter by stream prefix) |
| Why a Lambda failed | CloudWatch: /aws/lambda/<function-name> (filter for ERROR) |
| Every HTTP request to the API | S3: aws-waf-logs-trinity-beast/AWSLogs/.../WAFLogs/ |
| Which WAF rule blocked what | S3: WAF logs → filter for action: "BLOCK" |
| Per-request latency to the API | S3: aws-waf-logs-trinity-beast/alb/ |
| Every website visitor request | S3: aws-waf-logs-trinity-beast/cloudfront/ |
| Who accessed S3 objects | S3: aws-waf-logs-trinity-beast/s3-access/ |
| Network traffic patterns | CloudWatch: /aws/vpc/trinity-beast-flowlogs |
| Slow database queries | CloudWatch: /aws/rds/cluster/.../postgresql |
| Slow Valkey commands | CloudWatch: /aws/elasticache/.../slow-log |
| Who called what AWS API | CloudWatch: /aws/cloudtrail/trinity-beast |
| Historical AWS API calls | S3: aws-cloudtrail-logs-211998422884-879cb71c |
| What AutoOps did | Valkey: autoops:actions:log |
| Honeypot hits | Valkey: honeypot:log |
| Translation job trace | Aurora: translation_job_events table |
| pg_cron job results | Aurora: cron.job_run_details table |
| Session history & decisions | S3: daily-reports/tbi-ops-* or ~/daily-reports/ |
| Source | Destination | Daily Volume | Monthly Cost |
|---|---|---|---|
| WAF full request logs | S3 | ~50 MB | ~$0.04 |
| ALB access logs | S3 | ~5 MB | < $0.01 |
| CloudFront logs | S3 | ~15 MB | ~$0.01 |
| S3 access logs | S3 | ~2 MB | < $0.01 |
| Aurora PostgreSQL logs | CloudWatch | ~5 MB | ~$0.08 |
| Valkey slow/engine logs | CloudWatch | ~1 MB | ~$0.02 |
| Total | < $0.50/month | ||
Amazon Athena is a serverless SQL engine that queries data directly in S3 — no database to provision, no data to move, no ETL. Point it at the log files, define a schema, and run standard SQL. Deployed June 4, 2026.
trinity-beast-analyticstbi_logss3://.../athena-results/| Table | Source | Partitioned By | Key Fields |
|---|---|---|---|
waf_logs | WAF full request logs | year / month / day / hour | action, httprequest.clientip, httprequest.uri, httprequest.country, terminatingruleid, labels |
alb_logs | ALB access logs | year / month / day | request_url, elb_status_code, target_processing_time, client_ip, user_agent |
cloudfront_logs | CloudFront standard logs | date column | cs_uri_stem, c_ip, sc_status, x_edge_location, x_edge_result_type, time_taken |
s3_access_logs | S3 server access logs | — | operation, key, remote_ip, http_status, requester |
cloudtrail_logs | CloudTrail API events | account / region / year / month / day | eventname, eventsource, sourceipaddress, useridentity, errorcode |
MSCK REPAIR TABLE, no Glue crawlers, no partition maintenance. The schema tells Athena where to look based on the date path pattern.
Nine pre-built queries are saved in the trinity-beast-analytics workgroup. Run them from the Athena console or CLI — just click "Saved queries" in the console.
| Query Name | What It Answers |
|---|---|
| WAF: Blocks by Country (24h) | Top countries generating blocked requests today, grouped by WAF rule |
| WAF: Top Blocked IPs (24h) | IPs with the most blocks — identify repeat attackers and which rules caught them |
| WAF: Request Volume by Endpoint (24h) | Traffic distribution across API endpoints — which paths get the most hits |
| WAF: Trace IP (forensics) | Every request from a specific IP: time, action, URI, method, country. Replace THE_IP with the target |
| CloudTrail: API Calls by Service (24h) | Top AWS API calls grouped by service — spot unusual activity patterns |
| CloudTrail: Error Events (24h) | All API calls that returned errors today — permission denials, throttles, service errors |
| CloudFront: Top Requested Pages (24h) | Most popular website pages by request count, bytes served, and average load time |
| CloudFront: Traffic by Country (24h) | Website visitors by country using viewer IP geolocation — audience geography and traffic patterns |
| Cross-Source: IP Correlation | Find IPs that hit BOTH the API (WAF) and website (CloudFront) — reconnaissance detection via JOIN |
-- Top 30 blocked IPs with the rules that caught them
SELECT httprequest.clientip AS ip,
httprequest.country AS country,
count(*) AS blocks,
array_agg(DISTINCT terminatingruleid) AS rules
FROM tbi_logs.waf_logs
WHERE action = 'BLOCK'
AND year = '2026' AND month = '06' AND day = '04'
GROUP BY httprequest.clientip, httprequest.country
ORDER BY blocks DESC
LIMIT 30;
-- API endpoint latency percentiles from ALB logs
SELECT request_url,
count(*) AS requests,
approx_percentile(target_processing_time, 0.5) AS p50,
approx_percentile(target_processing_time, 0.95) AS p95,
approx_percentile(target_processing_time, 0.99) AS p99
FROM tbi_logs.alb_logs
WHERE year = '2026' AND month = '06' AND day = '04'
AND request_url LIKE '%/price%'
GROUP BY request_url
ORDER BY requests DESC;
-- IPs that hit both the API AND the website (probing behavior)
SELECT w.ip, w.waf_requests, w.blocks, cf.cf_requests
FROM (
SELECT httprequest.clientip AS ip,
count(*) AS waf_requests,
sum(CASE WHEN action='BLOCK' THEN 1 ELSE 0 END) AS blocks
FROM tbi_logs.waf_logs
WHERE year = '2026' AND month = '06' AND day = '04'
GROUP BY httprequest.clientip
) w
JOIN (
SELECT c_ip AS ip, count(*) AS cf_requests
FROM tbi_logs.cloudfront_logs
WHERE date = DATE '2026-06-04'
GROUP BY c_ip
) cf ON w.ip = cf.ip
ORDER BY w.blocks DESC, w.waf_requests DESC
LIMIT 20;
-- Everything a specific IP did across all API endpoints today
SELECT from_unixtime(timestamp/1000) AS time,
action,
httprequest.uri,
httprequest.httpmethod,
terminatingruleid,
httprequest.country
FROM tbi_logs.waf_logs
WHERE httprequest.clientip = '45.148.10.51'
AND year = '2026' AND month = '06'
ORDER BY timestamp DESC
LIMIT 100;
-- Top AWS API calls in our account today (spot automation vs manual)
SELECT eventsource, eventname, sourceipaddress, count(*) AS calls
FROM tbi_logs.cloudtrail_logs
WHERE account = '211998422884' AND region = 'us-east-2'
AND year = '2026' AND month = '06' AND day = '04'
GROUP BY eventsource, eventname, sourceipaddress
ORDER BY calls DESC
LIMIT 30;
Open Athena → select workgroup trinity-beast-analytics → database tbi_logs → click "Saved queries" for the 9 pre-built queries, or write your own SQL in the editor. Results appear in seconds.
Submit a query and fetch results:
# Run a query
QID=$(aws athena start-query-execution \
--query-string "SELECT action, count(*) FROM tbi_logs.waf_logs WHERE year='2026' AND month='06' AND day='04' GROUP BY action" \
--work-group trinity-beast-analytics \
--region us-east-2 --query 'QueryExecutionId' --output text)
# Wait for completion (typically 1-3 seconds)
aws athena get-query-execution --query-execution-id $QID \
--region us-east-2 --query 'QueryExecution.Status.State' --output text
# Get results
aws athena get-query-results --query-execution-id $QID \
--region us-east-2 --output json | jq '.ResultSet.Rows[] | .Data | map(.VarCharValue) | join(" | ")'
| Dimension | Value |
|---|---|
| Price per query | $5 per TB scanned (minimum 10 MB charge per query) |
| Current daily log volume | ~78 MB/day → ~2.4 GB/month |
| Cost of scanning full month | ~$0.012 |
| Query cost limit (workgroup) | 1 GB per query (safety guardrail) |
| Partition benefit | Querying one day scans only that day's folder — not the entire bucket |
| Compression benefit | Gzipped files = less bytes scanned = lower cost per query |
year, month, day) in your WHERE clause. Without them, Athena scans ALL data in the table. With them, it only reads the specific folders you need — reducing cost and execution time by orders of magnitude.
The real power isn't in individual queries — it's in combining signals across log sources to surface patterns invisible to any single source. These are the questions Athena answers that nothing else in the stack can:
| Pattern | How to Detect | Sources |
|---|---|---|
| Coordinated reconnaissance | Same IP appears in CloudFront (scouting docs), WAF (probing API), and gets blocked. The cross-source correlation query surfaces these instantly. | WAF + CloudFront JOIN |
| Latency regression | ALB logs contain per-request target_processing_time. Compute P95/P99 by endpoint across days to spot creeping degradation before users notice. | ALB logs, GROUP BY day |
| Geographic traffic shifts | CloudFront logs include edge location + viewer country. A sudden spike from an unusual region could signal DDoS ramp-up, bot farm, or a new market discovering the product. | CloudFront logs |
| Rate limit evasion | WAF logs show high-volume IPs that stay just below the 2000/5min threshold. They're not blocked but they're up to something — distributed scraping across multiple keys. | WAF logs, windowed aggregation |
| Incident timeline reconstruction | "What happened between 2:15 and 2:45 AM?" — a single query across WAF + CloudTrail + ALB shows: who called what, from where, what succeeded, what failed, in chronological order. | WAF + CloudTrail + ALB |
| Bot fingerprinting | WAF logs include JA3 TLS fingerprints. Group blocked requests by fingerprint to identify bot frameworks vs. legitimate browsers — the same fingerprint across 50 IPs is a botnet. | WAF logs, JA3 field |
| Infrastructure drift detection | CloudTrail error events grouped by service — sudden permission denials or throttling reveals IAM policy drift, resource limits, or configuration changes you didn't make. | CloudTrail errors |
| Content popularity evolution | CloudFront logs show which docs/pages are hot, which are dead, and how the distribution shifts over time. Informs what to translate first, what to promote, where to invest. | CloudFront logs, daily aggregation |