The Trinity BeastTBTS Data Dictionary

Complete schema reference for the Trinity Beast Translation Service — 7 tables, 118 columns, dedicated parameter store, model catalog, and task orchestration.

Database: CPMP_Backend_Aurora Engine: Aurora PostgreSQL Serverless v2 Updated: June 15, 2026 Version: v1

Table of Contents

  1. Translation Engine Tables
    1. 1. translation_jobs — Translation job orchestration (30 columns)
    2. 2. translation_job_events — Translation job audit log (7 columns)
    3. 3. translation_quotes — Customer-facing quotes with Stripe payment (32 columns)
    4. 4. translation_parameters — Cost, pricing, and operational parameters (6 columns, 69 rows)
    5. 5. bedrock_model_catalog — Multi-provider Bedrock model registry (12 columns, 20 rows)
    6. 6. translation_tasks — Task Organizer parent-child orchestration (15 columns)
    7. 7. language_profiles — Per-language translation configuration with auto-tuning (16 columns)
  2. Reference
    1. Entity Relationships
    2. Valkey Cache Keys
    3. Admin API Endpoints

1. translation_jobs

Purpose: Translation job orchestration. Tracks every translation batch submitted via the admin API — documents, languages, progress, costs, and Step Function state.

30 columns

ColumnData TypeDefaultNullableDescription
id PKuuidgen_random_uuid()NOInternal UUID (auto-generated)
job_idvarchar(64)NOPublic job identifier (unique, used in API responses)
task_id FK → translation_tasks.task_idvarchar(64)YESParent task identifier (NULL for standalone jobs submitted via POST /admin/translate). Partial index on non-NULL values.
statevarchar(20)queuedNOJob state: queued, running, completed, failed, cancelled
submitted_attimestamptznow()NOWhen the job was submitted
started_attimestamptzYESWhen processing began
completed_attimestamptzYESWhen processing finished
submitted_byvarchar(50)admin-apiYESWho submitted: admin-api, retry, cli
docsjsonbNOArray of document filenames to translate (JSONB)
langsjsonbNOArray of target language codes (JSONB)
optionsjsonb{}YESJob options as JSON (chunk size overrides, etc.)
progressjsonb{}YESPer-pair progress tracking as JSON
errorsjsonb[]YESArray of error details for failed pairs (JSONB)
bedrock_cost_usdnumeric0YESTotal Bedrock API cost for this job in USD
bedrock_invocationsinteger0YESTotal Bedrock API calls made
total_chunksinteger0YESTotal document chunks processed
total_pairsinteger0YESTotal doc-language pairs in this job
succeeded_pairsinteger0YESNumber of pairs that completed successfully
failed_pairsinteger0YESNumber of pairs that failed
elapsed_secondsintegerYESTotal job duration in seconds
step_function_arntextYESARN of the Step Function execution
retry_ofvarchar(64)YESJob ID this is a retry of (nullable)
idempotency_keyvarchar(128)YESClient-provided idempotency key
cloudfront_invalidation_idsjsonb[]YESArray of CloudFront invalidation IDs (JSONB)
search_index_rebuiltbooleanfalseYESWhether the search index was rebuilt after this job
notification_sentbooleanfalseYESWhether completion notification was sent
created_attimestamptznow()NORecord creation timestamp
updated_attimestamptznow()NOLast state change timestamp
reasontextYESCancellation or failure reason
modelvarcharclaude-sonnet-4.6YESBedrock model used for translation (e.g., claude-sonnet-4.6, claude-haiku-3.5, claude-opus-4)

Indexes

  • idx_translation_jobs_state — on state
  • idx_translation_jobs_submitted_at — on submitted_at DESC
  • translation_jobs_job_id_key — UNIQUE on job_id
  • translation_jobs_pkey — UNIQUE on id

2. translation_job_events

Purpose: Translation job audit log. Granular event stream for each job — per-pair start, success, failure, retry, and completion events.

7 columns

ColumnData TypeDefaultNullableDescription
id PKbigintauto-incrementNOAuto-incrementing event identifier
job_idvarchar(64)NOParent job identifier (matches translation_jobs.job_id)
event_typevarchar(30)NOEvent type: pair_start, pair_success, pair_failed, job_start, job_complete, etc.
docvarchar(200)YESDocument filename (for pair-level events)
langvarchar(5)YESTarget language code (for pair-level events)
detailjsonb{}YESEvent details as JSON (error messages, timing, chunk counts)
created_attimestamptznow()NOEvent timestamp

Indexes

  • idx_translation_job_events_created_at — on created_at DESC
  • idx_translation_job_events_job_id — on job_id
  • translation_job_events_pkey — UNIQUE on id

3. translation_quotes

Purpose: Customer-facing translation quotes with Stripe payment integration. When a customer requests a document translation via the public API, a quote is generated with cost breakdown (per-chunk, per-pair, infrastructure, markup). The customer accepts and pays via Stripe; on payment confirmation the quote transitions to accepted and triggers the translation job.

32 columns

ColumnData TypeDefaultNullableDescription
id PKbigintauto-incrementNOAuto-incrementing identifier
quote_idvarchar(64)NOPublic-facing quote identifier (ULID)
api_key_idvarchar(128)NOAPI key that requested the quote
doc_urltextNOURL of the document to translate
doc_namevarchar(255)NODocument filename
doc_size_bytesintegerNODocument size in bytes
estimated_chunksintegerNOEstimated number of translation chunks
difficultyvarchar(21)standardNODocument difficulty: standard, technical, complex
code_blocksinteger0NONumber of code blocks detected in the document
diagramsinteger0NONumber of Mermaid diagrams detected
langsjsonbNOTarget languages as JSON array
lang_countintegerNONumber of target languages
cost_per_chunknumeric(8,4)NOBedrock cost per chunk (USD)
cost_per_pairnumeric(8,4)NOTotal cost per doc-language pair (USD)
infra_per_pairnumeric(8,4)NOInfrastructure cost per pair (S3, CloudFront, compute)
subtotalnumeric(10,4)NOSubtotal before markup (USD)
markup_pctintegerNOMarkup percentage applied
total_pricenumeric(10,4)NOFinal price charged to customer (USD)
modelvarchar(50)claude-sonnet-4.6YESAI model used for translation
statevarchar(21)pendingNOQuote lifecycle: pending, accepted, declined, expired, refunded
expires_attimestamptzNOQuote expiration (24 hours from creation)
accepted_attimestamptzYESWhen the customer accepted and paid
declined_attimestamptzYESWhen the customer declined
stripe_payment_intent_idvarchar(128)YESStripe PaymentIntent ID for this quote
stripe_charge_idvarchar(128)YESStripe Charge ID after successful payment
job_idvarchar(64)YESTranslation job ID created after acceptance
refunded_attimestamptzYESWhen a refund was issued
refund_amountnumeric(10,2)NULLYESRefund amount (USD)
refund_idvarchar(255)NULLYESStripe Refund ID
protected_termsjsonb[]YESCustomer-supplied brand terms to protect during translation
created_attimestamptznow()NOQuote creation timestamp
updated_attimestamptznow()NOLast modification timestamp

Indexes

  • translation_quotes_pkey — UNIQUE on id
  • translation_quotes_quote_id_key — UNIQUE on quote_id
  • idx_quotes_api_key — on api_key_id
  • idx_quotes_expires — on expires_at
  • idx_quotes_state — on state

4. translation_parameters

Purpose: Dedicated parameter store for the translation engine — separated from the general application_parameters table because translation is a standalone product with its own pricing, cost controls, and operational config. Cached in Valkey as tx:params hash (5-minute TTL, rebuilt nightly by the sync job). Every parameter that governs translation behavior — per-token Bedrock rates, cost caps, markup percentages, batch config, and token estimation factors — lives here. Changes take effect within 5 minutes without a code deploy.

6 columns · 69 rows

ColumnData TypeDefaultNullableDescription
key PKvarcharNOParameter key in dot-notation (e.g., bedrock_pricing.claude-sonnet-4.6.realtime.input)
valuetextNOParameter value (numeric stored as text for flexibility)
descriptiontext''NOHuman-readable explanation of what this parameter controls
categoryvarchargeneralNOGrouping category: bedrock_pricing, token_estimation, cost_controls, customer_pricing, batch_config
updated_attimestamptznow()NOLast modification timestamp
updated_byvarcharsystemNOWho last changed this parameter (admin email, sync-job, kiro)

Categories

CategoryRowsWhat It Controls
bedrock_pricing~30Per-token rates for each model × mode (realtime input/output, batch input/output, cache read/write). Drives all cost calculations.
token_estimation~12Tokens-per-byte ratio, output/input multiplier, cache hit rate, speed factors per model. Used to estimate job cost before submission.
cost_controls~9Daily spend cap ($600), max token limit, max active jobs (3), max queued jobs (12), max docs per request (6).
customer_pricing~9Markup percentage (30%), infra cost per pair, tier discount factors, quote validity period (24h).
batch_config~6S3 bucket for batch inference, IAM role ARN, polling interval (33s), JSONL processing config.
Admin access: GET /admin/translate/params (list all), POST /admin/translate/params (upsert), POST /admin/translate/params/reload (force cache refresh). Dashboard: GET/POST/DELETE /dashboard/api/translate/params.

5. bedrock_model_catalog

Purpose: Registry of all Bedrock models available to the translation engine. The agent-agnostic architecture reads this catalog to resolve friendly model names to Bedrock model IDs, determine regional availability, select inference types (realtime vs batch), and enforce blocked regions. The lookupModelID() function consults this table — if a model isn't here, it can't be used for translation. Enables adding new models (or blocking broken ones) without code changes.

12 columns · 20 rows

ColumnData TypeDefaultNullableDescription
id PKintegerauto-incrementNOAuto-incrementing identifier
friendly_namevarcharNOHuman-friendly model name used in API requests (e.g., claude-sonnet-4.6, qwen3-235b)
display_namevarcharNOMarketing display name (e.g., "Claude Sonnet 4.6", "Qwen3 235B")
bedrock_model_idvarcharNOFull Bedrock model ARN/ID (e.g., us.anthropic.claude-sonnet-4-6-20250514-v1:0)
regionvarcharNOAWS region where this model is invoked. HARD RULE: ca-central-1 and ca-west-1 are NEVER used.
inference_typevarcharNOInvocation method: realtime (InvokeModel) or batch (CreateModelInvocationJob)
priorityinteger1NOPreference order when multiple entries exist for the same friendly_name (lower = preferred). Used for regional failover.
enabledbooleantrueNOWhether this catalog entry is active. Disabled entries are skipped during model resolution.
notestext''NOAdmin notes (why it's blocked, performance observations, etc.)
verified_attimestamptznow()NOWhen this model was last verified to be responding correctly in its region
created_attimestamptznow()NOWhen this catalog entry was created
blockedbooleanfalseNOHard block — entries with blocked=true are documentation of regions/models that must NEVER be used (CA regions). Unlike enabled=false (temporarily off), blocked entries represent permanent policy constraints.

Format Families

The translation engine routes models through one of three format families based on their prefix. Each family has its own request/response JSON structure — the engine handles this transparently via DetectFamily(), BuildRequestBody(), and ParseResponseText() in internal/bedrock/formats.go:

FamilyPrefix MatchModels
anthropicanthropic.*, us.anthropic.*Claude Haiku 3.5, Claude Sonnet 4.6, Claude Opus 4
openaiqwen.*, deepseek.*, mistral.*Qwen3 235B, DeepSeek V3, Mistral Large
novaamazon.nova.*(Reserved — Nova Pro removed from active roster due to HTML tag dropping)
Blocked regions (permanent policy): ca-central-1 and ca-west-1 have blocked=true entries in this catalog. The lookupModelID() function enforces this at runtime — any CA region is redirected to us-east-2. This is a hard architectural constraint, not a temporary disable.

6. translation_tasks

Purpose: Task Organizer — parent-child orchestration for translation jobs. A Task is the unit above a Job: 1 customer acceptance = 1 Task = N Jobs. One notification per Task (not per Job) when all Jobs reach terminal state. Tracks task lifecycle, customer email for notification routing, and quote linkage for the payment flow.

15 columns

ColumnData TypeDefaultNullableDescription
id PKbigintauto-incrementNOAuto-incrementing identifier
task_idvarchar(64)NOPublic task identifier (ULID, unique)
task_typevarchar(20)admin_batchNOTask type: acceptance (customer-submitted via payment) or admin_batch (internal)
statevarchar(20)runningNOTask state: running, succeeded, partial, failed, cancelled
customer_emailtextYESCustomer email for task-level completion notification (acceptance tasks only)
quote_idsjsonb[]YESArray of translation_quotes.quote_id bundled into this task
job_idsjsonb[]NOArray of child translation_jobs.job_id linked to this task
total_jobsinteger0NOTotal number of child jobs submitted
completed_jobsinteger0NOCounter: jobs that succeeded
failed_jobsinteger0NOCounter: jobs that failed
cancelled_jobsinteger0NOCounter: jobs that were cancelled
notification_sentbooleanfalseNOIdempotent flag — prevents double-send of completion email
submitted_attimestamptznow()NOWhen the task was created
completed_attimestamptzYESWhen all child jobs reached terminal state
created_attimestamptznow()NORecord creation timestamp

Indexes

  • translation_tasks_pkey — UNIQUE on id
  • translation_tasks_task_id_key — UNIQUE on task_id
  • idx_translation_tasks_state — on state WHERE state = 'running'

Aurora Functions

  • create_translation_task(task_id, task_type, customer_email, quote_ids, job_ids, total_jobs) — Creates the task record
  • get_task_status(task_id) — Returns task + all jobs in one call (JOIN)
  • get_active_tasks() — Returns all non-terminal tasks (poll loop query)
  • record_task_job_completed(task_id, job_id, job_state) — Atomic counter update + "all done?" boolean
  • mark_task_notified(task_id) — Idempotent notification flag set
  • cancel_task_remaining(task_id, reason) — Cancel all pending jobs, close task
Relationship: translation_jobs.task_id is a nullable FK column (with partial index on non-NULL values) linking child jobs to their parent task. Jobs created outside the Task Organizer (direct POST /admin/translate) have task_id = NULL.

7. language_profiles

Purpose: Per-language translation configuration. Each row defines the operational knobs for a target language — chunk sizing, tag limits, retry behavior, model overrides, and density thresholds. The translation worker loads its profile at container startup via get_language_profile(lang). The density columns are automatically tuned by the worker's auto-tuning system based on chunk telemetry (tag drop rates).

16 columns

ColumnData TypeDefaultNullableDescription
lang_code PKvarchar(5)NOISO 639-1 language code (e.g., ja, ar, hi)
display_nametextNOHuman-readable language name (e.g., Japanese, Arabic)
script_familyvarchar(20)latinNOScript family: latin, cjk, indic, arabic, cyrillic. Drives preprocessor density thresholds and chunk sizing.
max_chunk_kbinteger15NOMaximum chunk size in KB for this language (lower for complex scripts)
max_code_tagsinteger81NOMaximum code tags per chunk before forced split (multiple of 3)
token_expansion_factornumeric(4,2)1.00NOExpected output/input token ratio (e.g., 1.30 for Japanese — more tokens needed per concept)
max_retriesinteger1NOMaximum retry attempts per chunk on validation failure
retry_chunk_shrink_pctinteger30NOPercentage to shrink chunk on retry (e.g., 30 = reduce to 70% of original)
model_overridetextYESOverride the default model for this language (e.g., route Urdu to a specific model)
temperaturenumeric(3,2)0.30NOBedrock temperature for this language (lower = more deterministic)
max_output_tokensinteger4096NOMaximum output tokens per Bedrock call (prevents truncation for high-expansion languages)
priorityinteger6NOProcessing priority (lower = higher priority). Used for queue ordering.
enabledbooleantrueNOWhether this language is available for translation
notestextYESOperational notes (e.g., "CJK models struggle with nested code blocks")
density_lift_thresholdnumeric(5,4)0.0600YESAuto-tuned. Density ratio threshold for block lifting. When protected_elements/prose_chars exceeds this value, the container is lifted whole. Auto-tightened by telemetry analysis when tag drop rate exceeds 3%. Floor: 0.015.
density_max_proseinteger300YESAuto-tuned. Maximum prose characters for a container to be eligible for density lifting. Containers with more prose than this are left in place regardless of density ratio.

Indexes

  • language_profiles_pkey — UNIQUE on lang_code
Auto-Tuning: The density columns (density_lift_threshold, density_max_prose) are automatically adjusted by the translation worker's _auto_tune_density_threshold() function. It analyzes the last 7 days of chunk_telemetry data (code_tags_in vs code_tags_out) and tightens the threshold for languages that consistently drop tags. Adjustments: 3-6% drop rate → -0.005, 6-12% → -0.01, >12% → -0.015. Floor value: 0.015 (never lower). Runs once per container lifetime per language.
Aurora Function: get_language_profile(lang_code) — Returns the full row for the specified language. Called by the worker at container startup. Falls back to hardcoded defaults if Aurora is unreachable.

Entity Relationships

erDiagram
    translation_tasks ||--o{ translation_jobs : "1 task = N jobs"
    translation_quotes ||--o| translation_jobs : "acceptance triggers"
    translation_jobs ||--o{ translation_job_events : "audit trail"
    bedrock_model_catalog ||--o{ translation_jobs : "model selection"
    translation_parameters ||--|| translation_jobs : "cost calculation"
    language_profiles ||--o{ translation_jobs : "per-lang config"
        

Key Relationships

  • translation_jobs.task_idtranslation_tasks.task_id (nullable FK, partial index on non-NULL)
  • translation_quotes.job_idtranslation_jobs.job_id (set on payment acceptance)
  • translation_job_events.job_idtranslation_jobs.job_id (audit log linkage)

Valkey Cache Keys

Purpose: Translation state lives in both Aurora (permanent ledger) and Valkey (fast reads). The status endpoint reads Valkey first for live jobs, falls back to Aurora for completed jobs that have aged out.
Key PatternTypeTTLContent
tx:job:{id}Hash7 daysLive job state (status, progress, per-pair results)
tx:activeSetManagedCurrently running job IDs
tx:historySorted Set30 daysCompleted job IDs (score=timestamp)
tx:idempotency:{key}String24hIdempotency deduplication
tx:paramsHash5 mintranslation_parameters cache (all 69 rows)
tx:chunk-hash:*StringPermanentPer-chunk content hashes (delta detection)
autoops:bedrock:spend:dailyString24hToday's Bedrock translation spend ($)
autoops:bedrock:tokens:input:dailyString24hToday's input token count
autoops:bedrock:tokens:output:dailyString24hToday's output token count

Admin API Endpoints

Purpose: Admin endpoints for managing translation data. All require X-Admin-Key header.
EndpointMethodPurpose
/admin/translatePOSTSubmit a translation job (docs + langs)
/admin/translate/status/{job_id}GETReal-time per-pair progress
/admin/translate/queueGETActive queue with all running/queued jobs
/admin/translate/healthGETSystem health (spend, active jobs, queue depth)
/admin/translate/historyGETJob history (last 50)
/admin/translate/cancel/{job_id}POSTCancel a running job
/admin/translate/retry-failed/{job_id}POSTRetry only failed pairs from a partial job
/admin/translate/paramsGET/POSTList all / upsert translation parameters
/admin/translate/params/reloadPOSTForce Valkey cache refresh from Aurora
/admin/translate/modelsGETList available models from bedrock_model_catalog
Dashboard equivalents: GET/POST/DELETE /dashboard/api/translate/params, POST /dashboard/api/translate/params/reload — same functionality, different auth path (magic link session vs admin key).

Schema Summary

7 tables, 118 columns total, plus 6 Aurora stored functions supporting the Task Organizer.

TableColumnsRows (approx)Purpose
translation_jobs30~200Job orchestration and state
translation_job_events7~5,000Granular audit trail
translation_quotes32~50Customer quotes + Stripe payment
translation_parameters669Operational configuration store
bedrock_model_catalog1220Multi-provider model registry
translation_tasks15~30Parent-child task orchestration
language_profiles1611Per-language config + auto-tuning
Related documents: