The Trinity Beast — TBTS Data Dictionary

Translation Engine Tables
1. 1. translation_jobs — Translation job orchestration (30 columns)
2. 2. translation_job_events — Translation job audit log (7 columns)
3. 3. translation_quotes — Customer-facing quotes with Stripe payment (32 columns)
4. 4. translation_parameters — Cost, pricing, and operational parameters (6 columns, 69 rows)
5. 5. bedrock_model_catalog — Multi-provider Bedrock model registry (12 columns, 20 rows)
6. 6. translation_tasks — Task Organizer parent-child orchestration (15 columns)
7. 7. language_profiles — Per-language translation configuration with auto-tuning (16 columns)
Reference

1. translation_jobs

Purpose: Translation job orchestration. Tracks every translation batch submitted via the admin API — documents, languages, progress, costs, and Step Function state.

30 columns

Column	Data Type	Default	Nullable	Description
`id` PK	`uuid`	gen_random_uuid()	NO	Internal UUID (auto-generated)
`job_id`	`varchar(64)`	—	NO	Public job identifier (unique, used in API responses)
`task_id` FK → translation_tasks.task_id	`varchar(64)`	—	YES	Parent task identifier (NULL for standalone jobs submitted via `POST /admin/translate`). Partial index on non-NULL values.
`state`	`varchar(20)`	queued	NO	Job state: queued, running, completed, failed, cancelled
`submitted_at`	`timestamptz`	now()	NO	When the job was submitted
`started_at`	`timestamptz`	—	`YES`	When processing began
`completed_at`	`timestamptz`	—	`YES`	When processing finished
`submitted_by`	`varchar(50)`	admin-api	`YES`	Who submitted: admin-api, retry, cli
`docs`	`jsonb`	—	NO	Array of document filenames to translate (JSONB)
`langs`	`jsonb`	—	NO	Array of target language codes (JSONB)
`options`	`jsonb`	{}	`YES`	Job options as JSON (chunk size overrides, etc.)
`progress`	`jsonb`	{}	`YES`	Per-pair progress tracking as JSON
`errors`	`jsonb`	[]	`YES`	Array of error details for failed pairs (JSONB)
`bedrock_cost_usd`	`numeric`	0	`YES`	Total Bedrock API cost for this job in USD
`bedrock_invocations`	`integer`	0	`YES`	Total Bedrock API calls made
`total_chunks`	`integer`	0	`YES`	Total document chunks processed
`total_pairs`	`integer`	0	`YES`	Total doc-language pairs in this job
`succeeded_pairs`	`integer`	0	`YES`	Number of pairs that completed successfully
`failed_pairs`	`integer`	0	`YES`	Number of pairs that failed
`elapsed_seconds`	`integer`	—	`YES`	Total job duration in seconds
`step_function_arn`	`text`	—	`YES`	ARN of the Step Function execution
`retry_of`	`varchar(64)`	—	`YES`	Job ID this is a retry of (nullable)
`idempotency_key`	`varchar(128)`	—	`YES`	Client-provided idempotency key
`cloudfront_invalidation_ids`	`jsonb`	[]	`YES`	Array of CloudFront invalidation IDs (JSONB)
`search_index_rebuilt`	`boolean`	false	`YES`	Whether the search index was rebuilt after this job
`notification_sent`	`boolean`	false	`YES`	Whether completion notification was sent
`created_at`	`timestamptz`	now()	NO	Record creation timestamp
`updated_at`	`timestamptz`	now()	NO	Last state change timestamp
`reason`	`text`	—	`YES`	Cancellation or failure reason
`model`	`varchar`	claude-sonnet-4.6	`YES`	Bedrock model used for translation (e.g., claude-sonnet-4.6, claude-haiku-3.5, claude-opus-4)

Indexes

idx_translation_jobs_state — on state
idx_translation_jobs_submitted_at — on submitted_at DESC
translation_jobs_job_id_key — UNIQUE on job_id
translation_jobs_pkey — UNIQUE on id

2. translation_job_events

Purpose: Translation job audit log. Granular event stream for each job — per-pair start, success, failure, retry, and completion events.

7 columns

Column	Data Type	Default	Nullable	Description
`id` PK	`bigint`	auto-increment	NO	Auto-incrementing event identifier
`job_id`	`varchar(64)`	—	NO	Parent job identifier (matches translation_jobs.job_id)
`event_type`	`varchar(30)`	—	NO	Event type: pair_start, pair_success, pair_failed, job_start, job_complete, etc.
`doc`	`varchar(200)`	—	`YES`	Document filename (for pair-level events)
`lang`	`varchar(5)`	—	`YES`	Target language code (for pair-level events)
`detail`	`jsonb`	{}	`YES`	Event details as JSON (error messages, timing, chunk counts)
`created_at`	`timestamptz`	now()	NO	Event timestamp

Indexes

idx_translation_job_events_created_at — on created_at DESC
idx_translation_job_events_job_id — on job_id
translation_job_events_pkey — UNIQUE on id

3. translation_quotes

Purpose: Customer-facing translation quotes with Stripe payment integration. When a customer requests a document translation via the public API, a quote is generated with cost breakdown (per-chunk, per-pair, infrastructure, markup). The customer accepts and pays via Stripe; on payment confirmation the quote transitions to accepted and triggers the translation job.

32 columns

Column	Data Type	Default	Nullable	Description
`id` PK	`bigint`	auto-increment	NO	Auto-incrementing identifier
`quote_id`	`varchar(64)`	—	NO	Public-facing quote identifier (ULID)
`api_key_id`	`varchar(128)`	—	NO	API key that requested the quote
`doc_url`	`text`	—	NO	URL of the document to translate
`doc_name`	`varchar(255)`	—	NO	Document filename
`doc_size_bytes`	`integer`	—	NO	Document size in bytes
`estimated_chunks`	`integer`	—	NO	Estimated number of translation chunks
`difficulty`	`varchar(21)`	standard	NO	Document difficulty: standard, technical, complex
`code_blocks`	`integer`	0	NO	Number of code blocks detected in the document
`diagrams`	`integer`	0	NO	Number of Mermaid diagrams detected
`langs`	`jsonb`	—	NO	Target languages as JSON array
`lang_count`	`integer`	—	NO	Number of target languages
`cost_per_chunk`	`numeric(8,4)`	—	NO	Bedrock cost per chunk (USD)
`cost_per_pair`	`numeric(8,4)`	—	NO	Total cost per doc-language pair (USD)
`infra_per_pair`	`numeric(8,4)`	—	NO	Infrastructure cost per pair (S3, CloudFront, compute)
`subtotal`	`numeric(10,4)`	—	NO	Subtotal before markup (USD)
`markup_pct`	`integer`	—	NO	Markup percentage applied
`total_price`	`numeric(10,4)`	—	NO	Final price charged to customer (USD)
`model`	`varchar(50)`	claude-sonnet-4.6	`YES`	AI model used for translation
`state`	`varchar(21)`	pending	NO	Quote lifecycle: pending, accepted, declined, expired, refunded
`expires_at`	`timestamptz`	—	NO	Quote expiration (24 hours from creation)
`accepted_at`	`timestamptz`	—	`YES`	When the customer accepted and paid
`declined_at`	`timestamptz`	—	`YES`	When the customer declined
`stripe_payment_intent_id`	`varchar(128)`	—	`YES`	Stripe PaymentIntent ID for this quote
`stripe_charge_id`	`varchar(128)`	—	`YES`	Stripe Charge ID after successful payment
`job_id`	`varchar(64)`	—	`YES`	Translation job ID created after acceptance
`refunded_at`	`timestamptz`	—	`YES`	When a refund was issued
`refund_amount`	`numeric(10,2)`	NULL	`YES`	Refund amount (USD)
`refund_id`	`varchar(255)`	NULL	`YES`	Stripe Refund ID
`protected_terms`	`jsonb`	[]	`YES`	Customer-supplied brand terms to protect during translation
`created_at`	`timestamptz`	now()	NO	Quote creation timestamp
`updated_at`	`timestamptz`	now()	NO	Last modification timestamp

Indexes

translation_quotes_pkey — UNIQUE on id
translation_quotes_quote_id_key — UNIQUE on quote_id
idx_quotes_api_key — on api_key_id
idx_quotes_expires — on expires_at
idx_quotes_state — on state

4. translation_parameters

Purpose: Dedicated parameter store for the translation engine — separated from the general application_parameters table because translation is a standalone product with its own pricing, cost controls, and operational config. Cached in Valkey as tx:params hash (5-minute TTL, rebuilt nightly by the sync job). Every parameter that governs translation behavior — per-token Bedrock rates, cost caps, markup percentages, batch config, and token estimation factors — lives here. Changes take effect within 5 minutes without a code deploy.

6 columns · 69 rows

Column	Data Type	Default	Nullable	Description
`key` PK	`varchar`	—	NO	Parameter key in dot-notation (e.g., `bedrock_pricing.claude-sonnet-4.6.realtime.input`)
`value`	`text`	—	NO	Parameter value (numeric stored as text for flexibility)
`description`	`text`	''	NO	Human-readable explanation of what this parameter controls
`category`	`varchar`	general	NO	Grouping category: `bedrock_pricing`, `token_estimation`, `cost_controls`, `customer_pricing`, `batch_config`
`updated_at`	`timestamptz`	now()	NO	Last modification timestamp
`updated_by`	`varchar`	system	NO	Who last changed this parameter (admin email, sync-job, kiro)

Category	Rows	What It Controls
`bedrock_pricing`	~30	Per-token rates for each model × mode (realtime input/output, batch input/output, cache read/write). Drives all cost calculations.
`token_estimation`	~12	Tokens-per-byte ratio, output/input multiplier, cache hit rate, speed factors per model. Used to estimate job cost before submission.
`cost_controls`	~9	Daily spend cap ($600), max token limit, max active jobs (3), max queued jobs (12), max docs per request (6).
`customer_pricing`	~9	Markup percentage (30%), infra cost per pair, tier discount factors, quote validity period (24h).
`batch_config`	~6	S3 bucket for batch inference, IAM role ARN, polling interval (33s), JSONL processing config.

5. bedrock_model_catalog

Purpose: Registry of all Bedrock models available to the translation engine. The agent-agnostic architecture reads this catalog to resolve friendly model names to Bedrock model IDs, determine regional availability, select inference types (realtime vs batch), and enforce blocked regions. The lookupModelID() function consults this table — if a model isn't here, it can't be used for translation. Enables adding new models (or blocking broken ones) without code changes.

12 columns · 20 rows

Column	Data Type	Default	Nullable	Description
`id` PK	`integer`	auto-increment	NO	Auto-incrementing identifier
`friendly_name`	`varchar`	—	NO	Human-friendly model name used in API requests (e.g., `claude-sonnet-4.6`, `qwen3-235b`)
`display_name`	`varchar`	—	NO	Marketing display name (e.g., "Claude Sonnet 4.6", "Qwen3 235B")
`bedrock_model_id`	`varchar`	—	NO	Full Bedrock model ARN/ID (e.g., `us.anthropic.claude-sonnet-4-6-20250514-v1:0`)
`region`	`varchar`	—	NO	AWS region where this model is invoked. HARD RULE: `ca-central-1` and `ca-west-1` are NEVER used.
`inference_type`	`varchar`	—	NO	Invocation method: `realtime` (InvokeModel) or `batch` (CreateModelInvocationJob)
`priority`	`integer`	1	NO	Preference order when multiple entries exist for the same `friendly_name` (lower = preferred). Used for regional failover.
`enabled`	`boolean`	true	NO	Whether this catalog entry is active. Disabled entries are skipped during model resolution.
`notes`	`text`	''	NO	Admin notes (why it's blocked, performance observations, etc.)
`verified_at`	`timestamptz`	now()	NO	When this model was last verified to be responding correctly in its region
`created_at`	`timestamptz`	now()	NO	When this catalog entry was created
`blocked`	`boolean`	false	NO	Hard block — entries with `blocked=true` are documentation of regions/models that must NEVER be used (CA regions). Unlike `enabled=false` (temporarily off), blocked entries represent permanent policy constraints.

Format Families

The translation engine routes models through one of three format families based on their prefix. Each family has its own request/response JSON structure — the engine handles this transparently via DetectFamily(), BuildRequestBody(), and ParseResponseText() in internal/bedrock/formats.go:

Family	Prefix Match	Models
`anthropic`	`anthropic.`, `us.anthropic.`	Claude Haiku 3.5, Claude Sonnet 4.6, Claude Opus 4
`openai`	`qwen.`, `deepseek.`, `mistral.*`	Qwen3 235B, DeepSeek V3, Mistral Large
`nova`	`amazon.nova.*`	(Reserved — Nova Pro removed from active roster due to HTML tag dropping)

Blocked regions (permanent policy): ca-central-1 and ca-west-1 have blocked=true entries in this catalog. The lookupModelID() function enforces this at runtime — any CA region is redirected to us-east-2. This is a hard architectural constraint, not a temporary disable.

6. translation_tasks

Purpose: Task Organizer — parent-child orchestration for translation jobs. A Task is the unit above a Job: 1 customer acceptance = 1 Task = N Jobs. One notification per Task (not per Job) when all Jobs reach terminal state. Tracks task lifecycle, customer email for notification routing, and quote linkage for the payment flow.

15 columns

Column	Data Type	Default	Nullable	Description
`id` PK	`bigint`	auto-increment	NO	Auto-incrementing identifier
`task_id`	`varchar(64)`	—	NO	Public task identifier (ULID, unique)
`task_type`	`varchar(20)`	admin_batch	NO	Task type: `acceptance` (customer-submitted via payment) or `admin_batch` (internal)
`state`	`varchar(20)`	running	NO	Task state: running, succeeded, partial, failed, cancelled
`customer_email`	`text`	—	YES	Customer email for task-level completion notification (acceptance tasks only)
`quote_ids`	`jsonb`	[]	YES	Array of translation_quotes.quote_id bundled into this task
`job_ids`	`jsonb`	[]	NO	Array of child translation_jobs.job_id linked to this task
`total_jobs`	`integer`	0	NO	Total number of child jobs submitted
`completed_jobs`	`integer`	0	NO	Counter: jobs that succeeded
`failed_jobs`	`integer`	0	NO	Counter: jobs that failed
`cancelled_jobs`	`integer`	0	NO	Counter: jobs that were cancelled
`notification_sent`	`boolean`	false	NO	Idempotent flag — prevents double-send of completion email
`submitted_at`	`timestamptz`	now()	NO	When the task was created
`completed_at`	`timestamptz`	—	YES	When all child jobs reached terminal state
`created_at`	`timestamptz`	now()	NO	Record creation timestamp

Indexes

translation_tasks_pkey — UNIQUE on id
translation_tasks_task_id_key — UNIQUE on task_id
idx_translation_tasks_state — on state WHERE state = 'running'

Aurora Functions

create_translation_task(task_id, task_type, customer_email, quote_ids, job_ids, total_jobs) — Creates the task record
get_task_status(task_id) — Returns task + all jobs in one call (JOIN)
get_active_tasks() — Returns all non-terminal tasks (poll loop query)
record_task_job_completed(task_id, job_id, job_state) — Atomic counter update + "all done?" boolean
mark_task_notified(task_id) — Idempotent notification flag set
cancel_task_remaining(task_id, reason) — Cancel all pending jobs, close task

Relationship: translation_jobs.task_id is a nullable FK column (with partial index on non-NULL values) linking child jobs to their parent task. Jobs created outside the Task Organizer (direct POST /admin/translate) have task_id = NULL.

7. language_profiles

Purpose: Per-language translation configuration. Each row defines the operational knobs for a target language — chunk sizing, tag limits, retry behavior, model overrides, and density thresholds. The translation worker loads its profile at container startup via get_language_profile(lang). The density columns are automatically tuned by the worker's auto-tuning system based on chunk telemetry (tag drop rates).

16 columns

Column	Data Type	Default	Nullable	Description
`lang_code` PK	`varchar(5)`	—	NO	ISO 639-1 language code (e.g., ja, ar, hi)
`display_name`	`text`	—	NO	Human-readable language name (e.g., Japanese, Arabic)
`script_family`	`varchar(20)`	latin	NO	Script family: latin, cjk, indic, arabic, cyrillic. Drives preprocessor density thresholds and chunk sizing.
`max_chunk_kb`	`integer`	15	NO	Maximum chunk size in KB for this language (lower for complex scripts)
`max_code_tags`	`integer`	81	NO	Maximum code tags per chunk before forced split (multiple of 3)
`token_expansion_factor`	`numeric(4,2)`	1.00	NO	Expected output/input token ratio (e.g., 1.30 for Japanese — more tokens needed per concept)
`max_retries`	`integer`	1	NO	Maximum retry attempts per chunk on validation failure
`retry_chunk_shrink_pct`	`integer`	30	NO	Percentage to shrink chunk on retry (e.g., 30 = reduce to 70% of original)
`model_override`	`text`	—	YES	Override the default model for this language (e.g., route Urdu to a specific model)
`temperature`	`numeric(3,2)`	0.30	NO	Bedrock temperature for this language (lower = more deterministic)
`max_output_tokens`	`integer`	4096	NO	Maximum output tokens per Bedrock call (prevents truncation for high-expansion languages)
`priority`	`integer`	6	NO	Processing priority (lower = higher priority). Used for queue ordering.
`enabled`	`boolean`	true	NO	Whether this language is available for translation
`notes`	`text`	—	YES	Operational notes (e.g., "CJK models struggle with nested code blocks")
`density_lift_threshold`	`numeric(5,4)`	0.0600	YES	Auto-tuned. Density ratio threshold for block lifting. When protected_elements/prose_chars exceeds this value, the container is lifted whole. Auto-tightened by telemetry analysis when tag drop rate exceeds 3%. Floor: 0.015.
`density_max_prose`	`integer`	300	YES	Auto-tuned. Maximum prose characters for a container to be eligible for density lifting. Containers with more prose than this are left in place regardless of density ratio.

Indexes

language_profiles_pkey — UNIQUE on lang_code

Auto-Tuning: The density columns (density_lift_threshold, density_max_prose) are automatically adjusted by the translation worker's _auto_tune_density_threshold() function. It analyzes the last 7 days of chunk_telemetry data (code_tags_in vs code_tags_out) and tightens the threshold for languages that consistently drop tags. Adjustments: 3-6% drop rate → -0.005, 6-12% → -0.01, >12% → -0.015. Floor value: 0.015 (never lower). Runs once per container lifetime per language.

Aurora Function: get_language_profile(lang_code) — Returns the full row for the specified language. Called by the worker at container startup. Falls back to hardcoded defaults if Aurora is unreachable.

Entity Relationships

erDiagram
    translation_tasks ||--o{ translation_jobs : "1 task = N jobs"
    translation_quotes ||--o| translation_jobs : "acceptance triggers"
    translation_jobs ||--o{ translation_job_events : "audit trail"
    bedrock_model_catalog ||--o{ translation_jobs : "model selection"
    translation_parameters ||--|| translation_jobs : "cost calculation"
    language_profiles ||--o{ translation_jobs : "per-lang config"

Key Relationships

translation_jobs.task_id → translation_tasks.task_id (nullable FK, partial index on non-NULL)
translation_quotes.job_id → translation_jobs.job_id (set on payment acceptance)
translation_job_events.job_id → translation_jobs.job_id (audit log linkage)

Valkey Cache Keys

Purpose: Translation state lives in both Aurora (permanent ledger) and Valkey (fast reads). The status endpoint reads Valkey first for live jobs, falls back to Aurora for completed jobs that have aged out.

Key Pattern	Type	TTL	Content
`tx:job:{id}`	Hash	7 days	Live job state (status, progress, per-pair results)
`tx:active`	Set	Managed	Currently running job IDs
`tx:history`	Sorted Set	30 days	Completed job IDs (score=timestamp)
`tx:idempotency:{key}`	String	24h	Idempotency deduplication
`tx:params`	Hash	5 min	translation_parameters cache (all 69 rows)
`tx:chunk-hash:*`	String	Permanent	Per-chunk content hashes (delta detection)
`autoops:bedrock:spend:daily`	String	24h	Today's Bedrock translation spend ($)
`autoops:bedrock:tokens:input:daily`	String	24h	Today's input token count
`autoops:bedrock:tokens:output:daily`	String	24h	Today's output token count

Admin API Endpoints

Purpose: Admin endpoints for managing translation data. All require X-Admin-Key header.

Endpoint	Method	Purpose
`/admin/translate`	POST	Submit a translation job (docs + langs)
`/admin/translate/status/{job_id}`	GET	Real-time per-pair progress
`/admin/translate/queue`	GET	Active queue with all running/queued jobs
`/admin/translate/health`	GET	System health (spend, active jobs, queue depth)
`/admin/translate/history`	GET	Job history (last 50)
`/admin/translate/cancel/{job_id}`	POST	Cancel a running job
`/admin/translate/retry-failed/{job_id}`	POST	Retry only failed pairs from a partial job
`/admin/translate/params`	GET/POST	List all / upsert translation parameters
`/admin/translate/params/reload`	POST	Force Valkey cache refresh from Aurora
`/admin/translate/models`	GET	List available models from bedrock_model_catalog

Dashboard equivalents: GET/POST/DELETE /dashboard/api/translate/params, POST /dashboard/api/translate/params/reload — same functionality, different auth path (magic link session vs admin key).

Schema Summary

7 tables, 118 columns total, plus 6 Aurora stored functions supporting the Task Organizer.

Table	Columns	Rows (approx)	Purpose
`translation_jobs`	30	~200	Job orchestration and state
`translation_job_events`	7	~5,000	Granular audit trail
`translation_quotes`	32	~50	Customer quotes + Stripe payment
`translation_parameters`	6	69	Operational configuration store
`bedrock_model_catalog`	12	20	Multi-provider model registry
`translation_tasks`	15	~30	Parent-child task orchestration
`language_profiles`	16	11	Per-language config + auto-tuning

Related documents:

TBI Translation Engine — Architecture, sentinel system, validators, worker pattern, cost protection
Translation Service — Customer-facing product: pricing tiers, ordering, delivery modes
Translation Parameters — Full parameter reference (all 69 rows with descriptions)
Aurora Data Dictionary — Complete schema for all other 37 tables

The Trinity Beast — TBTS Data Dictionary

Table of Contents

1. translation_jobs

Indexes

2. translation_job_events

Indexes

3. translation_quotes

Indexes

4. translation_parameters

Categories

5. bedrock_model_catalog

Format Families

6. translation_tasks

Indexes

Aurora Functions

7. language_profiles

Indexes

Entity Relationships

Key Relationships

Valkey Cache Keys

Admin API Endpoints

Schema Summary