The Trinity Beast
TBI Translation Engine

Custom Bedrock-powered document translation service — 8-pass sentinel system, document-level preprocessor with lxml validation, auto-scaling ECS worker fleet

Region: us-east-2 (Ohio) | Version: v5.0 | June 13, 2026

1. Why a Custom Translation Engine

The Trinity Beast Infrastructure maintains 40 technical documents translated into 11 languages, 440 translated files in total. The original approach used AWS Translate batch jobs. It worked for simple prose but failed catastrophically on technical documentation.

1.1 Where AWS Translate Fails

AWS Translate is a neural machine translation service optimized for general-purpose text. Technical documentation with embedded code, diagrams, and brand terminology exposes its fundamental limitations:

| Failure Mode | Example | Impact |
| --- | --- | --- |
| Translates code blocks | function getName() → función obtenerNombre() | Code no longer executes |
| Translates variable names | api_key → clave_api | Documentation references break |
| Breaks Mermaid diagrams | Translates node labels inside mermaid blocks | Diagrams fail to render |
| Corrupts HTML structure | Merges adjacent elements, drops attributes | Styling and layout break |
| Transliterates brand names | AutoOps → آٹو آپس (Urdu phonetic) | Brand identity lost, search breaks |
| Localizes numeric units | 32 GB → 32 Go (French) | Technical specs become ambiguous |
| Drops version numbers | PostgreSQL 17.7 → PostgreSQL | Version-specific guidance lost |
| Ignores translate attribute | Translates content inside protected zones | Defeats the HTML5 standard mechanism |

1.2 The Scale Problem

With 40 documents × 11 languages, every documentation update triggers a translation cascade. Before the custom engine:

1.3 The Solution

A custom Bedrock-powered translation engine that understands the boundary between human language and machine language. The engine uses defense-in-depth across the full pipeline:

Result: A single POST /admin/translate call translates any document from any supported source language into up to 11 target languages, deploys to S3, invalidates CloudFront, rebuilds the search index, and emails a summary. Source language is auto-detected when not specified — no pivot through English required.

2. Architecture

2.1 Pipeline Flow

The translation service is an event-driven pipeline that decouples submission from execution. The operator submits a job; the system handles everything else asynchronously.

Diagram 2.1: End-to-End Pipeline Architecture (v3.1 — BeastTranslate)

flowchart TB
    subgraph Operator
        A[POST /admin/translate]
    end
    subgraph "LPO Server (Go)"
        B[Validate & Enqueue]
        C[Valkey State]
        D[Aurora Record]
    end
    subgraph "AWS Queue"
        E[SQS Queue]
    end
    subgraph "BeastTranslate — Persistent ECS Service"
        direction TB
        BT[SQS Long-Poll Loop]
        BT --> SC{Scale Check}
        SC -->|Single lang| BT2[Process In-Place]
        SC -->|Multi-lang| BT3[Scale Service to N]
        BT3 --> BT4[N Containers Poll Same Queue]
        BT4 --> BT5[Each Takes 1 Language Message]
        BT2 --> TI["Translation Intelligence (Python)"]
        BT5 --> TI
        subgraph "Translation Intelligence (Python)"
            direction LR
            H0[Source Validation]
            H1[Complexity Analysis]
            H2[Document Preprocessor]
            H3[Sentinel System — 4 Types]
            H4[Bedrock — 3-Region Failover]
            H5[Validator — Hard + Soft Tiers]
            H6[Integrity Check + Auto-Repair]
        end
    end
    subgraph "Deployment (Go Lambdas)"
        direction LR
        I[S3 Write]
        J[CloudFront Invalidation]
        K[Search Index Rebuild]
        L[SES Notification]
    end

    A --> B
    B --> C
    B --> D
    B --> E
    E --> BT
    H6 --> I
    I --> J
    J --> K
    K --> L

    style A fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style B fill:#1e293b,stroke:#334155,color:#e2e8f0
    style C fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style D fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style E fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
    style BT fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style SC fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style BT2 fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style BT3 fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style BT4 fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style BT5 fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style H0 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H1 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H2 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H3 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H4 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H5 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H6 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style I fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style J fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style K fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style L fill:#064e3b,stroke:#10b981,color:#e2e8f0
        

2.2 Components

| Component | Type | Runtime | Purpose |
| --- | --- | --- | --- |
| POST /admin/translate (+ 8 more) | Admin API | Go | Job submission, monitoring, control |
| trinity-beast-translation-queue | SQS | | Decouple submission from execution |
| tbi-translate-pipe | EventBridge Pipe | | SQS → Step Function trigger (no glue Lambda) |
| tbi-translation-orchestrator | Step Functions | | Fan-out, retry, deploy, finalize orchestration |
| tbi-translate-worker | ECS Fargate Task | Python 3.11 | Bedrock translation + sentinel + validation (no timeout ceiling) |
| tbi-translate-init | Lambda | Go | Records execution ARN, transitions queued → running |
| tbi-translate-deploy | Lambda | Go | CloudFront invalidation per document |
| tbi-translate-finalize | Lambda | Go | Search rebuild + SES notification + state transition |
| translation_jobs | Aurora table | | Permanent job records (28 columns) |
| translation_job_events | Aurora table | | Granular per-doc/lang audit log |

2.3 Why Python (The Only Python in the Fleet)

Every other compute workload in The Trinity Beast Infrastructure is written in Go. The translation worker is the sole exception, and for good reason:

Convention note: All Lambda functions use 1770 MB memory (multiple of 3). The worker runs as an ECS Fargate task (2 vCPU / 6 GB) with no timeout ceiling — large documents translate to completion regardless of processing time. The worker is also a unified batch orchestrator: during idle periods (every 33 seconds), it polls Bedrock for completed batch inference jobs, processes output JSONL inline, deploys translated docs to S3, and triggers finalize. Deploy and finalize Lambdas use 60s and 180s timeouts respectively.

3. Sentinel Preprocessing System

The sentinel system is the core innovation that makes reliable technical document translation possible. It operates on a simple principle: the model cannot corrupt what it never sees.

Before any chunk is sent to Bedrock, protected content is replaced with placeholder tokens. The model translates the prose around the placeholders. After translation, the placeholders are swapped back to the original content. Validation then confirms everything survived intact.

3.0 Pre-Sentinelization Passes (v5.0)

Before the sentinel system runs, three automatic preparation passes ensure maximum coverage. These run deterministically on every document — no manual markup needed. The design principle: they can't break what isn't there.

| Pass | Function | What It Does | Failure Class Eliminated |
| --- | --- | --- | --- |
| Brand Term Auto-Wrap | _auto_wrap_brand_terms() | Scans for all 57 protected terms from translation-config.json and wraps each occurrence in <span translate="no">. They become Type A sentinel candidates automatically. | Brand transliteration (e.g., The Trinity Beast → ट्रिनिटी बीस्ट) |
| Code Tag Protection | _protect_code_tags() | Upgrades every bare <code> tag to <code translate="no">. Pass 1 then lifts them all as Type A sentinels. | Code tag mismatch failures (the single largest failure class before v5.0) |
| Path Protection | _fix_paths_absolute() | Converts relative paths (assets/, images/, css/, js/) to absolute URLs before translation. The model passes them through unchanged. | Broken asset references in translated docs |

These passes run in handler.py before the document enters the chunker or sentinel system. They are deterministic, idempotent, and add zero cost — they only manipulate the source HTML locally. After these passes, the sentinel system has maximum coverage: every code tag, every brand term, and every path is already protected before sentinelization begins.
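The first two passes can be sketched as below. This is a minimal illustration, not the code in handler.py: the function bodies, the two-term list, and the tag-splitting approach are assumptions, and the sketch only tracks protection for text sitting directly inside a translate="no" opener.

```python
import re

# Hypothetical subset of the 57 terms; the real list lives in translation-config.json.
PROTECTED_TERMS = ["The Trinity Beast", "CloudFront"]

def protect_code_tags(html):
    """Upgrade every bare <code> tag to <code translate="no">."""
    return html.replace("<code>", '<code translate="no">')

def auto_wrap_brand_terms(html, terms=PROTECTED_TERMS):
    """Wrap protected terms found in text nodes; skip text already protected."""
    if not terms:
        return html
    # Longest term first so a long term wins over a shorter term it contains.
    pattern = re.compile("|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True)))
    out = []
    protected = False  # True while directly inside a translate="no" opener
    for part in re.split(r"(<[^>]+>)", html):
        if part.startswith("<") and part.endswith(">"):
            protected = 'translate="no"' in part
            out.append(part)
        elif protected:
            out.append(part)
        else:
            out.append(pattern.sub(lambda m: f'<span translate="no">{m.group(0)}</span>', part))
    return "".join(out)
```

Running both functions a second time on their own output changes nothing, which is the idempotence property the section relies on.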

3.1 Sentinel Types and Eight Passes

Type A — Full Element Extraction (__TBP{N}__)

Replaces entire translate="no" elements with a single token. The model sees only the placeholder and places it in the natural position for the target language's word order.

| Before | After Sentinel Pass |
| --- | --- |
| <span translate="no">CloudFront</span> invalidation | __TBP0__ invalidation |
| <code translate="no">api_key</code> parameter | __TBP1__ parameter |

Handles arbitrary nesting depth — processes innermost elements first, then sweeps outward until stable.
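A minimal sketch of innermost-first extraction, assuming a regex-based approach (the engine's actual implementation is not shown in this document). The pattern only matches a protected element whose body contains no further translate="no" opener and no opener of the same tag name, so elements it cannot handle safely are left untouched rather than split.

```python
import re

# Matches a translate="no" element with no nested protected or same-name opener.
PROTECTED = re.compile(
    r'<(\w+)\b[^>]*\btranslate="no"[^>]*>'
    r'(?:(?!<\1\b|<\w+\b[^>]*\btranslate="no").)*?'
    r'</\1>',
    re.DOTALL,
)

def extract_full(html):
    """Replace protected elements with __TBP{N}__ tokens, innermost first."""
    store = []

    def lift(match):
        store.append(match.group(0))
        return f"__TBP{len(store) - 1}__"

    # Sweep until a pass makes no replacement (stable).
    while True:
        html, replaced = PROTECTED.subn(lift, html)
        if replaced == 0:
            return html, store
```

With nested protected elements, the inner element receives the lower index and the outer element's stored text contains the inner token.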

Type B — Paired Open/Close (__TBO{N}__ / __TBC{N}__)

For plain <span> wrappers (no class attribute) containing translatable text. The wrapper tags become sentinels; the text between them is translated normally. Spans with a class attribute are now extracted as Type A FULL sentinels (see Pass 1b below) since the class denotes a structural/decorative CSS hook that must survive intact.

| Before | After Sentinel Pass |
| --- | --- |
| <span style="color:#9ece6a">success message</span> | __TBO0__success message__TBC0__ |

The model translates "success message" while the <span style="..."> wrapper survives intact. Class-bearing spans like <span class="badge">, <span class="tree-label">, and <span class="method-tag"> are handled by Pass 1b as full extractions — the model never sees them.

Type C — Numeric Protection (__TBN{N}__)

Protects bare numbers in prose from the model's tendency to drop, paraphrase, or localize them. Matches integers, decimals, percentages, and number+unit pairs.

| Before | After Sentinel Pass | Problem Prevented |
| --- | --- | --- |
| uses 1770 MB of memory | uses __TBN0__ of memory | French translating "MB" → "Mo" |
| achieves 98.5% uptime | achieves __TBN1__ uptime | Japanese dropping the decimal |
| 62% cache hit rate | __TBN2__ cache hit rate | German paraphrasing to words |
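A sketch of how the numeric pass might be expressed as a single regex. The unit list (KB, MB, GB, TB, ms) and the pattern itself are assumptions for illustration; the sketch expects prose text in which tags have already been lifted out by earlier passes.

```python
import re

# Integer or decimal, optionally followed by % or a unit from an assumed list.
# The lookbehind keeps digits inside identifiers and sentinel tokens untouched.
NUMBER_RE = re.compile(
    r"(?<![\w.])\d+(?:\.\d+)?(?:%|\s?(?:KB|MB|GB|TB|ms)\b|(?!\w))"
)

def protect_numbers(text):
    """Replace each number (with unit or % if present) by a __TBN{N}__ token."""
    store = []

    def lift(match):
        store.append(match.group(0))
        return f"__TBN{len(store) - 1}__"

    return NUMBER_RE.sub(lift, text), store
```

Because the unit is captured together with the number, the model never sees "MB" and cannot localize it.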

Type D — Brand Term Protection (__TBT{N}__)

Protects brand terms, product names, and proper nouns that must never be translated or transliterated. Unlike Type A (which requires translate="no" in the source HTML), Type D operates from a centralized configuration list — no source markup needed.

| Before | After Sentinel Pass | Problem Prevented |
| --- | --- | --- |
| powered by The Trinity Beast | powered by __TBT0__ | Hindi transliterating to ट्रिनिटी बीस्ट |
| deployed on CloudFront | deployed on __TBT1__ | Arabic transliterating to كلاود فرونت |
| Cory Dean Kalani | __TBT2__ | Urdu transliterating person names |

Protected terms are defined in translation-config.json (57 terms). The sentinel pass matches terms using word-boundary regex for short terms (≤5 chars) and substring matching for longer terms. Restoration is exact — the original term text is re-injected at the sentinel position.
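The matching rule above (word boundaries for terms of five characters or fewer, substring matching otherwise) could look like the following sketch. The function names and the three-term list are hypothetical; the real terms come from translation-config.json.

```python
import re

def build_term_regex(terms):
    """One alternation, longest term first; short terms get word boundaries."""
    parts = []
    for term in sorted(terms, key=len, reverse=True):
        escaped = re.escape(term)
        parts.append(rf"\b{escaped}\b" if len(term) <= 5 else escaped)
    return re.compile("|".join(parts))

def protect_terms(text, terms):
    """Replace each protected term with a __TBT{N}__ token in a single pass."""
    if not terms:
        return text, []
    store = []

    def lift(match):
        store.append(match.group(0))
        return f"__TBT{len(store) - 1}__"

    return build_term_regex(terms).sub(lift, text), store
```

The stored text is the exact matched string, so restoration re-injects the term as it appeared in the source.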

Sentinel Recovery Pass (Post-Restoration)

When translating into complex-script languages (Hindi, Urdu, Arabic), the model occasionally drops sentinel tokens entirely from its output: the token simply does not appear in the translated text. This affects both Type A (FULL) and Type D (TERM) sentinels, particularly in token-dense chunks with 20+ sentinels. The recovery pass runs after normal restoration and before validation:

  1. Iterates all TERM entries in the sentinels list
  2. Checks if the term is present in the source but missing from the restored output
  3. Re-injects the original term text at an approximate position (ratio-based paragraph matching)
  4. Falls back to insertion before the last closing tag if position cannot be determined

This eliminates the class of failures where the model acknowledges the sentinel in its "thinking" but omits it from the output — a behavior observed primarily in Indic scripts with token-dense chunks.

3.2 Processing Flow

Diagram 3.1: Sentinel Preprocessing Flow

flowchart TD
    A[Source HTML Chunk] --> B[Pass 1: Extract translate=no elements]
    B --> B1b[Pass 1b: Extract class-bearing spans as FULL sentinels]
    B1b --> B2[Pass 2: Extract all text-only spans as FULL sentinels]
    B2 --> B3[Pass 2b: Wrap text-only links as paired sentinels]
    B3 --> B4[Pass 2c: Extract nested-HTML and event-handler links as FULL]
    B4 --> D[Pass 3: Replace bare numbers with numeric sentinels]
    D --> D2[Pass 4: Replace brand terms with TERM sentinels]
    D2 --> D3[Pass 5: Extract email addresses as FULL sentinels]
    D3 --> D4[Pass 6: Extract bare prose URLs as FULL sentinels]
    D4 --> E[Send to Bedrock with sentinel-aware prompt]
    E --> F[Receive translated chunk with sentinels intact]
    F --> G[Deduplicate any model-doubled paired sentinels]
    G --> H[Restore sentinels high-to-low index order]
    H --> H2[Recovery pass: re-inject any dropped FULL/TERM sentinels]
    H2 --> H3[Repair pass: fix dropped span/strong/em tags]
    H3 --> I[Run validators against source + restored output]
    I -->|PASS| J[Accept chunk]
    I -->|FAIL| K{Retries remaining?}
    K -->|Yes| L[Retry with strict prompt + temperature jitter]
    L --> E
    K -->|No| M[Raise TranslationError]

    style A fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style E fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style J fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style M fill:#450a0a,stroke:#ef4444,color:#e2e8f0
        

The eight passes execute in strict order — later passes operate on the output of earlier ones. Pass 1b extracts all class-bearing spans as FULL sentinels (eliminating the entire class of span-drop failures for structural/decorative CSS hooks). Pass 2 extracts remaining text-only spans as FULL sentinels. Passes 2b/2c handle anchor tags — text-only links become PAIR sentinels (attributes opaque, inner text translated), while links with nested HTML or JavaScript event handlers are extracted as FULL. Type C (numeric) sentinels protect numbers that appear inside Type B (paired) text. Type D (brand term) sentinels protect terms anywhere in translatable content. Passes 5 and 6 protect email addresses and bare prose URLs from translation/mangling. This provides defense-in-depth across eight layers.

3.3 Restoration and Deduplication

After translation, sentinels are restored in reverse index order (high → low) to prevent prefix collisions (__TBP1__ must not match inside __TBP10__).

A deduplication pass runs before restoration to handle a known model behavior: occasionally the model emits a paired sentinel twice consecutively (a bilingual output instinct). The deduplicator collapses __TBO0__text__TBC0__ __TBO0__text__TBC0__ into a single occurrence.
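Both steps can be sketched in a few lines. The regex and function names are assumptions; the stored values in the usage note are illustrative only.

```python
import re

# A paired sentinel immediately repeated one or more times (whitespace allowed between copies).
DOUBLED = re.compile(r"(__TBO(\d+)__.*?__TBC\2__)(?:\s*\1)+", re.DOTALL)

def dedupe_pairs(text):
    """Collapse a model-doubled paired sentinel into a single occurrence."""
    return DOUBLED.sub(r"\1", text)

def restore(text, store, prefix="TBP"):
    """Swap tokens back to their stored content, highest index first."""
    for index in range(len(store) - 1, -1, -1):
        text = text.replace(f"__{prefix}{index}__", store[index])
    return text
```

In this sketch, reverse order has a second effect beyond avoiding collisions: when an outer protected element was stored with an inner token inside it, restoring the outer element first reintroduces the inner token before that token's own turn comes.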

4. Validator System

Every translated chunk is validated against the source before acceptance. Validators enforce structural integrity and content preservation — if a translation passes all validators, it is guaranteed to be functionally correct (code works, links resolve, diagrams render).

4.1 Validation Checks

| Validator | Type | What It Checks | Failure Example |
| --- | --- | --- | --- |
| check_protected_terms | Hard | Every protected term in source appears in output | "CloudFront" missing from Japanese output |
| check_version_numbers | Hard | All version numbers (X.Y.Z) survive translation | "17.7" dropped from PostgreSQL reference |
| check_preserve_patterns | Hard | URLs, emails, IPs, ARNs, resource IDs, cron expressions, memory sizes | ARN truncated or IP address reformatted |
| check_tag_counts | Hard | HTML tag counts match for structural tags | Extra <span> added or <code> dropped |
| check_translate_no_zones | Hard | Content inside translate="no" zones unchanged | Protected code block content altered |

Protected term matching: Short uppercase acronyms (≤4 chars like SQS, ECR, S3) use word-boundary matching to avoid false positives where the acronym appears as a substring (e.g., "ECR" inside "SECRET"). Longer terms use plain substring matching.

Implementation (v2.5): The check_tag_counts and check_translate_no_zones validators use character scanning with exact boundary matching — no regex. We control these tags. We know that a tag starts with < and ends with >. The scanner finds complete opening tags by looking for <tagname followed by a boundary character (>, space, tab, newline, or /), then reads to the closing >. This eliminates false positives from partial regex matches and is immune to edge cases where tag names appear as text content (e.g., documenting translate="no" as literal text inside a code tag).
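A minimal sketch of the scanning idea, assuming a plain string walk with str.find. The function name is hypothetical and this is not the validator's actual code.

```python
# Characters that may legally follow a tag name in an opening tag.
BOUNDARY = {">", " ", "\t", "\n", "/"}

def count_open_tags(html, tag):
    """Count complete opening tags named `tag` by character scanning, no regex."""
    needle = "<" + tag
    count = 0
    pos = html.find(needle)
    while pos != -1:
        end = pos + len(needle)
        if end < len(html) and html[end] in BOUNDARY:
            close = html.find(">", end)
            if close == -1:
                break  # opening tag never closes; stop counting
            count += 1
            pos = html.find(needle, close + 1)
        else:
            # Longer name such as <codex>; keep scanning after this hit.
            pos = html.find(needle, end)
    return count
```

The boundary check is what separates <code> from a longer name that merely starts with the same letters.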

4.2 Retry Strategy

When validation fails, the engine retries with two progressive adjustments:

  1. Strict prompt activation — adds an explicit warning: "PREVIOUS ATTEMPT FAILED VALIDATION. Be more careful: every protected term and every version number from the input MUST appear unchanged in the output."
  2. Temperature jitter — increments temperature by 0.1 per retry (0.0 → 0.1 → 0.2 → 0.3, capped at 0.5). A deterministic temp=0 retry produces the same erroneous output; temperature jitter lets the model take a different sampling path.

Maximum retries: 3 (configurable). If all attempts fail, a TranslationError is raised with the chunk index, validator detail, and a preview of the problematic chunk.
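The retry policy can be sketched as a small loop. The translate and validate callables are placeholders supplied by the caller (they stand in for the Bedrock call and the validator suite), and their signatures are assumptions.

```python
class TranslationError(Exception):
    """Raised when every attempt fails validation."""

def temperature_for(attempt, step=0.1, cap=0.5):
    """Attempt 0 runs at 0.0; each retry adds `step`, capped at `cap`."""
    return min(round(attempt * step, 2), cap)

def translate_with_retry(chunk, translate, validate, max_retries=3, chunk_index=0):
    detail = ""
    for attempt in range(max_retries + 1):  # initial attempt plus retries
        output = translate(
            chunk,
            temperature=temperature_for(attempt),
            strict=attempt > 0,  # strict prompt only after a failed attempt
        )
        ok, detail = validate(chunk, output)
        if ok:
            return output
    raise TranslationError(
        f"chunk {chunk_index} failed validation: {detail}; preview: {chunk[:80]!r}"
    )
```

With the default of three retries, the temperatures used are 0.0, 0.1, 0.2 and 0.3, matching the schedule above; the 0.5 cap only matters when max_retries is raised.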

4.3 Hard vs Soft Failures

Validators are classified into two tiers based on what they protect:

| Tier | Tags | Behavior | Rationale |
| --- | --- | --- | --- |
| Hard (content-critical) | <code>, <pre>, <a> | Retry → reject on failure | Missing code blocks, broken links, or lost pre-formatted content means the translation is functionally broken |
| Soft (decorative/structural) | <span>, <strong>, <em>, <br> | Log warning, pass through | Missing styling wrappers don't break functionality; the post-translation integrity check repairs them |

This tiered approach eliminates the failure mode where a correctly-translated document is rejected because the model dropped a single decorative <span> wrapper during RTL reordering. The content is correct — only the styling wrapper is missing — and the integrity check restores it automatically.
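As an illustration of the two tiers, the sketch below classifies tag-count drift into a retry, a warning, or a pass. The function name, the return values, and the shape of the count dictionaries are assumptions.

```python
HARD_TAGS = ("code", "pre", "a")
SOFT_TAGS = ("span", "strong", "em", "br")

def classify_tag_drift(source_counts, output_counts):
    """Return ("retry" | "warn" | "pass", tags whose counts differ)."""
    def drift(tags):
        return [t for t in tags if source_counts.get(t, 0) != output_counts.get(t, 0)]

    hard = drift(HARD_TAGS)
    if hard:
        return "retry", hard  # content-critical: retry, then reject
    soft = drift(SOFT_TAGS)
    if soft:
        return "warn", soft   # decorative: log and let the integrity check repair
    return "pass", []
```

A dropped <span> alone yields "warn", so a correct translation is not rejected over a styling wrapper.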

The ValidationReport aggregates all results and exposes:

4.4 Post-Translation Integrity Check

After translation completes and chunks are reassembled, a full-document integrity check runs before the S3 write. This is the defense-in-depth layer — it repairs structural drift that the per-chunk validator intentionally allows through (soft failures).

Repair Capabilities

| Issue | Detection | Repair Action |
| --- | --- | --- |
| </br> injection | String scan for invalid closing br tags | Strip all occurrences (never valid HTML) |
| <br> inside Mermaid blocks | Regex scan within <pre class="mermaid"> | Remove (breaks Mermaid syntax) |
| Mermaid content corruption | Byte-for-byte comparison with source | Flag as warning (cannot auto-repair content changes) |
| Missing translate="no" span wrappers | Compare source protected elements to output | Re-wrap bare content with original element tags |
| Missing <strong>/<em> wrappers | Same pattern as span recovery | Re-wrap bare content |

The integrity check only repairs translate="no" elements (where content is byte-for-byte identical between source and output). For translated content that lost its wrapper, the check logs the discrepancy but cannot reliably re-wrap (the content has been translated — matching it to the source wrapper requires semantic understanding).

Design principle: If the translated content is present and correct but the HTML structure is degraded, repair it. Only flag as unrecoverable if content is actually missing or corrupted. The customer sees a clean translation — the repairs happen invisibly.

4.5 Source Document Validation (v2.8)

Before any translation work begins, the source document passes through a validation gate. This catches defects that would cause translation failures or produce broken output — rejecting early saves Bedrock tokens and prevents corrupted translations from reaching S3.

Defect Categories

| Category | What It Catches | Auto-Repairable? |
| --- | --- | --- |
| STRUCTURAL | Unclosed tags, malformed HTML, nesting violations | Yes (up to 5 unclosed tags) |
| MERMAID | Empty diagram blocks, missing type declaration, mismatched brackets | No (reject with location) |
| ENCODING | BOM markers, null bytes, mixed encodings | Yes (strip BOM/nulls) |
| SIZE | Document exceeds 500 KB, excessive nesting depth (>30 levels) | No (reject with size info) |
| CONFLICT | translate="no" on root element (nothing to translate) | No (reject immediately) |

Validation Flow

  1. Size check — reject if > 500 KB (chunking becomes unreliable at this size)
  2. Encoding check — detect and strip BOM markers, null bytes; flag mixed encodings
  3. Structural HTML check — scan for unclosed tags; auto-repair up to 5 by appending closing tags at the correct nesting level
  4. Mermaid syntax check — validate every <pre class="mermaid"> block has a valid diagram type, balanced brackets, and non-empty content
  5. Conflict check — reject if the root <body> or <html> element has translate="no"
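A partial sketch of the gate, covering the size, encoding, empty-Mermaid, and conflict checks; structural tag repair and the other Mermaid checks are omitted. The function name, the tuple-based defect format, and the reading of 500 KB as 500 × 1024 bytes are assumptions.

```python
import codecs
import re

MAX_BYTES = 500 * 1024  # assumption: 1 KB = 1024 bytes

def validate_source(raw):
    """Return (repaired_text or None, defects) for a raw source document."""
    defects = []
    if len(raw) > MAX_BYTES:
        defects.append(("SIZE", f"Document is {len(raw) // 1024} KB (limit: 500 KB)"))
        return None, defects  # reject before doing any further work
    # Silent repairs: strip a UTF-8 BOM and any null bytes.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    raw = raw.replace(b"\x00", b"")
    text = raw.decode("utf-8", errors="replace")
    for body in re.findall(r'<pre class="mermaid">(.*?)</pre>', text, re.DOTALL):
        if not body.strip():
            defects.append(("MERMAID", "Empty Mermaid block"))
    if re.search(r'<(?:html|body)\b[^>]*\btranslate="no"', text, re.IGNORECASE):
        defects.append(("CONFLICT", 'translate="no" on root element'))
    return text, defects
```

Everything here is local string work, which is consistent with the gate making no API calls.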

Rejection vs Repair

The validator follows a strict philosophy: try to fix it silently, reject early if you can't. Repairable issues (unclosed tags, BOM markers) are fixed in-place — the customer never knows. Unrecoverable issues produce an actionable defect report with the exact location, what's wrong, and how to fix it.

ValidationResult:
  valid: false
  rejection_reason: "2 unrecoverable defects found"
  defects:
    - severity: error
      category: MERMAID
      location: "Section 5, line 342"
      description: "Empty Mermaid block — no diagram content"
      suggestion: "Add diagram content or remove the empty <pre class='mermaid'> block"
    - severity: error
      category: SIZE
      location: "Document root"
      description: "Document is 612 KB (limit: 500 KB)"
      suggestion: "Split into multiple documents or remove large embedded assets"

Cost savings: A rejected document costs zero Bedrock tokens. Without source validation, a broken document would fail during translation (after burning tokens on partial chunks), produce a corrupted output, and require manual investigation. Source validation catches these cases in <10ms with zero API calls.

4.6 Diagram Integrity (v2.8)

Mermaid diagrams are code — they must survive translation byte-for-byte. The integrity check (section 4.4) now includes dedicated diagram verification with automatic recovery.

Detection

The integrity check counts Mermaid blocks in the source (<pre class="mermaid">) and compares against the translated output. If any diagrams are missing from the output, the auto-stitch mechanism activates.
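Detection might be sketched as follows, assuming diagrams are compared by exact body text. The regex and the function name are illustrative, not the engine's own.

```python
import re

MERMAID_RE = re.compile(r'<pre class="mermaid">(.*?)</pre>', re.DOTALL)

def missing_diagrams(source_html, output_html):
    """Return source diagram bodies that do not appear verbatim in the output."""
    remaining = [body.strip() for body in MERMAID_RE.findall(output_html)]
    missing = []
    for body in MERMAID_RE.findall(source_html):
        body = body.strip()
        if body in remaining:
            remaining.remove(body)  # each output diagram matches one source diagram
        else:
            missing.append(body)
    return missing
```

An empty result means the diagram counts match and every diagram survived unchanged.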

Auto-Stitch Recovery

When a diagram is missing from the translated output:

  1. Identify which source diagram is absent (by content matching)
  2. Extract the full <div class="diagram-wrap"> block from source (includes label + pre)
  3. Locate the correct insertion point in the output (same section, same relative position)
  4. Inject the source diagram block verbatim — diagrams don't need translation

The stitched diagram is the English version, which is functionally correct — Mermaid syntax is language-independent. The surrounding prose is already translated, so the reader gets translated explanations with a working diagram.

Tag Inventory Integration

The _count_tags function now reports diagram count alongside other structural tags:

Tag Inventory (source → output):
• Trinity-Beast-Performance-Report.html
  IN:  code:75 pre:8 strong:12 em:3 a:6 br:20 diagrams:4
  OUT: code:75 pre:8 strong:12 em:3 a:6 br:20 diagrams:4

If a diagram is lost during translation and auto-stitched back, the final count still matches — the stitch happens before the tag inventory is calculated. A mismatch in the diagrams count after stitching indicates a structural issue that needs manual review.

Result: The Performance Report (75 KB, 4 Mermaid diagrams, 18 sections) translates to French with all 4 diagrams intact — 3 survived translation naturally, 1 was auto-stitched from source. The reader sees no difference.

4.7 Post-Translation Repair Pipeline (v5.0)

Defense-in-depth: even with perfect preprocessing, models occasionally drop or mangle structural elements. The repair pipeline runs after translation and before final validation, catching anything that slipped through.

Code Tag Repair (_repair_code_tags)

Multi-pass repair (up to 6 iterations) that detects dropped <code> wrappers by comparing the translated output against the source document. For each code-tagged value in the source, it searches the output for the bare content and re-wraps it with the original tag. Converges when no more repairs are possible.

| Input (broken) | Output (repaired) |
| --- | --- |
| api_key parameter requires... | <code translate="no">api_key</code> parameter requires... |
| the timestamptz column stores... | the <code translate="no">timestamptz</code> column stores... |

The repair function matches content from the source's code tags against the output using exact string matching. It handles both self-closing patterns and open/close pairs. Each pass may reveal new repair opportunities (nested cases), hence the multi-pass design with a convergence check.
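A simplified sketch of the multi-pass repair, assuming the output is treated as a flat string and bare content is found with exact matching. It does not reproduce _repair_code_tags; in particular, it would misbehave if a code value also occurred inside a tag name or attribute.

```python
import re

CODE_RE = re.compile(r"<code\b[^>]*>(.*?)</code>", re.DOTALL)

def repair_code_tags(source, output, max_passes=6):
    """Re-wrap code values that lost their <code> wrapper during translation."""
    # Map each code value in the source to its full original element.
    elements = {m.group(1): m.group(0) for m in CODE_RE.finditer(source)}
    for _ in range(max_passes):
        changed = False
        for content, element in elements.items():
            missing = source.count(element) - output.count(element)
            if missing <= 0 or not content:
                continue
            # Bare occurrence: not part of a longer word, not already wrapped.
            bare = re.compile(r"(?<!\w)" + re.escape(content) + r"(?!\w)(?!</code>)")
            output, replaced = bare.subn(lambda m, e=element: e, output, count=missing)
            changed = changed or replaced > 0
        if not changed:
            break  # converged: nothing left to repair
    return output
```

Running the function on already-repaired output returns it unchanged, which is the convergence condition.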

lxml Syntax Validation (_tidy_validate_and_repair)

Structural HTML repair using lxml's robust parser. Catches and fixes issues that the model introduces in the HTML structure itself — unclosed tags, nesting violations, mismatched attributes, and malformed output. This is the final safety net before the translated chunk is accepted.

| Issue Detected | Action |
| --- | --- |
| Unclosed <div> or <span> | Auto-close at the correct nesting level |
| Orphaned closing tags | Remove (no matching opener) |
| Invalid nesting (e.g., <p> inside <p>) | Restructure to valid hierarchy |
| Malformed attributes | Normalize quotes and spacing |

Uses lxml.html.fragment_fromstring() for parsing — no external binary needed (lxml is in the container's requirements.txt). Falls back to BeautifulSoup if lxml encounters a parse failure it cannot recover from.

Output Integrity Check (_check_output_integrity)

Final gate before a translated chunk is accepted. Compares structural tag counts between source and output — every <code>, <pre>, <strong>, <em>, <a>, <br>, and Mermaid diagram must have matching counts. If counts diverge beyond tolerance, the _recover_missing_wrappers() function attempts targeted restoration before rejecting the chunk.

Design principle: Preprocessing is the fortress — strip everything non-prose before the model sees it. Post-processing is defense-in-depth — repair anything that slips through. The goal is zero repairs needed because prep was thorough, but the safety net is always active.

5. BeastTranslate — Persistent Worker Architecture

Translation execution is handled by BeastTranslate — a persistent ECS Fargate service (tbi-translate-worker-service) that serves as the unified translation orchestrator. It continuously polls the SQS translation queue for realtime jobs AND polls Bedrock for completed batch inference jobs every 33 seconds during idle. Unlike the previous Step Function → RunTask model (v3.0), BeastTranslate is always-on — no cold starts, no orchestration overhead, instant job pickup, and zero Lambda invocations for the batch completion path.

5.1 BeastTranslate Service Design (v3.1)

Diagram 5.1: BeastTranslate Persistent Worker Architecture

flowchart TD
    SQS[SQS Translation Queue] --> LP[Long-Poll Loop - 20s wait]
    LP -->|Message received| DM[Deserialize Job Message]
    DM --> EM{Execution Mode?}
    EM -->|Express - realtime| RT[Process All Languages Sequentially]
    EM -->|Batch - high volume| BT[Scale Service to N Containers]
    BT --> PAR[N Containers Poll Same Queue]
    PAR --> EACH[Each Takes 1 Language Message]
    EACH --> PROC[Translate All Docs for That Language]
    RT --> PROC
    PROC --> CB[POST /admin/translate/callback]
    CB --> DONE{More Messages?}
    DONE -->|Yes| LP
    DONE -->|No - queue empty| IDLE[IDLE - Resume Polling]
    IDLE --> LP

    subgraph GS[Graceful Shutdown]
        SIG[SIGTERM Received] --> FIN[Finish Current Doc]
        FIN --> CLEANEX[Clean Exit]
    end

    subgraph VH[Visibility Heartbeat]
        HB[Every 5 min] --> EXTVIS[Extend SQS Visibility]
        EXTVIS --> PREV[Prevent Re-Delivery]
    end

    style SQS fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
    style LP fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style DM fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style EM fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style RT fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style BT fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style PAR fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style EACH fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style PROC fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style CB fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style DONE fill:#1e293b,stroke:#334155,color:#e2e8f0
    style IDLE fill:#1e293b,stroke:#334155,color:#e2e8f0
    style SIG fill:#4c1d95,stroke:#c084fc,color:#e2e8f0
    style FIN fill:#4c1d95,stroke:#c084fc,color:#e2e8f0
    style CLEANEX fill:#4c1d95,stroke:#c084fc,color:#e2e8f0
    style HB fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style EXTVIS fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style PREV fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
        

Service Specification

| Property | Value |
| --- | --- |
| Service Name | tbi-translate-worker-service |
| Cluster Node | BeastTranslate |
| Image | 211998422884.dkr.ecr.us-east-2.amazonaws.com/tbi-translate-worker:latest |
| Resources | 2 vCPU / 6 GB (per container) |
| Default Desired Count | 1 |
| Max Scale | 12 (one per supported language) |
| Runtime | Python 3.11 |
| Entry Point | task_runner.py |
| IAM Role | tbi-translate-role |
| Queue | trinity-beast-translation-queue |
| Log Group | /ecs/tbi-translate-worker |
| Timeout | None (runs to completion) |

How It Works: The Polling Loop

BeastTranslate runs a continuous dual-mode polling loop. When no jobs exist, it idles — consuming negligible CPU while maintaining a warm connection to SQS. The moment a translation job is submitted, the container picks it up within seconds (no cold start, no orchestration delay). During idle periods, it also checks for completed Bedrock batch inference jobs every 33 seconds.

  1. SQS Poll: receive_message(WaitTimeSeconds=20) — blocks for up to 20 seconds waiting for a message
  2. Receive: Deserialize job envelope (job_id, docs[], language, options)
  3. Process: For each document in the array, run the full translation pipeline (sentinel → Bedrock → validate → integrity check)
  4. Heartbeat: Every 5 minutes, extend the SQS message visibility timeout to prevent re-delivery during long-running documents
  5. Callback: On completion, POST results back to the LPO server via /admin/translate/callback
  6. Delete: Remove the message from SQS after successful processing
  7. Batch Check (idle only): Every 33 seconds when no SQS messages arrive, query Aurora for in-progress batch jobs → call GetModelInvocationJob → on completion, read output JSONL from S3, deploy translated docs, invoke finalize
  8. Loop: Return to step 1
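Steps 1 through 6 can be sketched as one function over an SQS client. The sqs argument is expected to behave like a boto3 SQS client (receive_message, change_message_visibility, delete_message); the function name, the process callable, and the 600-second visibility extension are assumptions. The callback and the idle batch check are left to the caller.

```python
import json
import threading

def poll_once(sqs, queue_url, process, heartbeat_s=300, visibility_s=600):
    """Receive at most one job, process it with a visibility heartbeat, then delete it."""
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False  # idle: the caller may run the batch check here
    message = messages[0]
    job = json.loads(message["Body"])

    stop = threading.Event()

    def heartbeat():
        # Event.wait returns False on timeout, so this fires every heartbeat_s seconds.
        while not stop.wait(heartbeat_s):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
                VisibilityTimeout=visibility_s,
            )

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        process(job)  # sentinel, Bedrock, validate, integrity check
    finally:
        stop.set()
        worker.join()
    # Reached only if process() did not raise; a failed job stays on the queue.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```

Deleting only after successful processing means a crashed worker leaves the message to be redelivered once its visibility timeout expires.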

Auto-Scaling: Demand-Driven Container Management

BeastTranslate scales itself automatically — no manual intervention required. The system determines container count from the job's language list at submission time, then scales back to 1 when all work completes.

How It Determines Scale

When a translation job is submitted via POST /admin/translate, the LPO server inspects the langs array:

  1. Count languages: desired = len(langs) — one container per language for full parallelism
  2. Cap at 11: Maximum supported target languages (the internal language set minus English)
  3. Scale up: ecs:UpdateService(desiredCount=N) fires immediately after the job is enqueued to SQS
  4. Containers ready: New containers start from ECR image cache in ~30 seconds

How It Determines Scale-Down

When a worker completes a job, the finalize step checks for remaining work before scaling down:

  1. Query active queue: GET /admin/translate/queue — checks active and queued job counts
  2. If queue empty: ecs:UpdateService(desiredCount=1) — scale back to steady-state
  3. If jobs remain: Skip scale-down — let other workers continue processing

This handles back-to-back batch submissions (e.g., 6 docs + 6 docs) correctly — the first batch to finish won't tear down containers while the second is still running.
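The scale-up and scale-down rules above reduce to two small decisions. In this sketch the ecs argument is expected to behave like a boto3 ECS client (update_service with cluster, service, desiredCount); the function names and the shape of queue_status (a dict with "active" and "queued" counts) are hypothetical.

```python
MAX_TARGET_LANGS = 11  # cap from the scale-up rule
STEADY_STATE = 1       # default desired count

def desired_count(langs):
    """One container per language, capped at 11, never below steady state."""
    return max(STEADY_STATE, min(len(langs), MAX_TARGET_LANGS))

def scale_for_job(ecs, cluster, service, langs):
    """Called right after the job is enqueued."""
    count = desired_count(langs)
    ecs.update_service(cluster=cluster, service=service, desiredCount=count)
    return count

def scale_down_if_idle(ecs, cluster, service, queue_status):
    """Called by the finalize step; skips scale-down while work remains."""
    if queue_status.get("active", 0) or queue_status.get("queued", 0):
        return False
    ecs.update_service(cluster=cluster, service=service, desiredCount=STEADY_STATE)
    return True
```

Checking the queue before scaling down is what keeps the first of two back-to-back batches from tearing down containers the second still needs.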

Scenario | Containers | Trigger
Idle (no jobs) | 1 | Default steady-state — near-zero CPU
Small job (1–3 languages) | 1–3 | Auto-scaled at submission: desired = len(langs)
Full library (11 languages) | 11 | Auto-scaled at submission — full parallelism
Post-batch (all jobs done) | 1 | Auto-scaled down by finalize step when queue is empty

Zero operator involvement. Submit the job, walk away. The infrastructure sizes itself to the workload, processes at maximum parallelism, then shrinks back to 1 container (~$0.05/hour at idle). KCC manual commands (translate-scale N) still work as an override but are no longer needed for normal operation.

Graceful Shutdown (SIGTERM)

When ECS sends SIGTERM (scale-down or deployment), the container:

  1. Stops polling for new messages immediately
  2. Finishes translating the current document (never abandons mid-document)
  3. POSTs partial progress back via callback
  4. Exits cleanly with code 0

This ensures no work is lost during scale-down events. A partially-completed job resumes from where it left off when the next container picks up remaining messages.
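The four shutdown steps can be sketched with Python's standard signal module. This is a simplified illustration, not the worker's actual code: the class name, the translate callable, and the report_progress callable are hypothetical stand-ins for the real pipeline and callback.

```python
import signal


class GracefulWorker:
    """Finishes the current document, then stops when SIGTERM arrives."""

    def __init__(self):
        self.stopping = False

    def handle_sigterm(self, signum, frame):
        # Only set a flag; the loop decides when it is safe to stop.
        self.stopping = True

    def run(self, documents, translate, report_progress):
        done = []
        for doc in documents:
            if self.stopping:
                break  # do not start another document
            translate(doc)  # the document in progress always completes
            done.append(doc)
        report_progress(done)  # partial progress goes back via callback
        return 0  # clean exit code


def install_handler(worker):
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGTERM, worker.handle_sigterm)
```

Keeping the handler to a single flag assignment avoids doing real work inside the signal handler; the loop checks the flag only at document boundaries.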

Backward Compatibility

The worker detects its execution mode via environment variables:

This means the same container image works both as a persistent service (normal operation — handling realtime SQS jobs AND batch inference completion) and as a one-shot task (Step Function fallback).

Diagram 5.2: BeastTranslate Auto-Scaling — Multi-Language Job Flow

sequenceDiagram
    participant Admin as Admin / Customer
    participant LPO as LPO Server
    participant ECS as ECS Service
    participant SQS as SQS Queue
    participant BT1 as BeastTranslate 1
    participant BT2 as BeastTranslate 2-11

    Admin->>LPO: POST /admin/translate (3 docs x 11 langs)
    LPO->>SQS: Enqueue job message
    LPO->>ECS: UpdateService(desiredCount=11)
    LPO->>Admin: 200 OK (job_id, state: queued)

    Note over ECS: Spins up 10 new containers (~30s)
    Note over BT1: Already polling (always-on)
    BT1->>SQS: ReceiveMessage
    SQS-->>BT1: Job envelope (all docs, all langs)

    Note over BT1: Processes lang[0] sequentially
    BT2->>SQS: ReceiveMessage (competing consumers)
    Note over BT2: Each takes next available work

    par 11 containers process in parallel
        BT1->>BT1: es - all 3 docs
        BT2->>BT2: fr, de, ru, ja, zh, ar, hi, ur, pt, it
    end

    BT1->>LPO: Finalize (state, CloudFront, search, notify)
    Note over BT1: Check queue - jobs remain? Skip scale-down
    BT2->>LPO: Last worker finalizes
    Note over BT2: Check queue - empty!
    BT2->>ECS: UpdateService(desiredCount=1)
    Note over ECS: 10 containers drain gracefully
    LPO->>Admin: SNS notification email
        

5.2 Error Handling and Recovery

Failure Mode | Handling | Job State
Single language fails after 3 retries | Catch → RecordLangFailure pass state, continue other langs | partial
All languages for a doc fail | Deploy Lambda receives empty succeeded list, skips invalidation | partial
Worker timeout (no response) | ECS task runs to completion — no timeout ceiling. Step Function waits via ecs:runTask.sync | running
Step Function execution exception | Finalize still runs via catch-all; job marked failed | failed
Operator cancels mid-flight | StopExecution API call; job marked cancelled | cancelled
Step Function fails before Finalize | Self-healing sweeper detects orphaned job via execution ARN, marks as failed | failed

Per-lang independence: Failure of one (doc, lang) pair never aborts work on the other 10 languages. This is enforced by the Step Function's Catch on the inner Map iterator — errors are captured as data, not propagated as exceptions.
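The "errors as data" idea can be illustrated outside Step Functions with a plain loop. This is a sketch only: translate_pair is a hypothetical callable, and the state mapping follows the table above, where per-pair failures yield partial and failed is reserved for execution-level errors.

```python
def run_pairs(docs, langs, translate_pair):
    """Translate every (doc, lang) pair; one failure never stops the rest."""
    succeeded, failed = [], []
    for doc in docs:
        for lang in langs:
            try:
                translate_pair(doc, lang)
                succeeded.append((doc, lang))
            except Exception as exc:
                # Capture the error as a record instead of propagating it.
                failed.append((doc, lang, str(exc)))
    state = "succeeded" if not failed else "partial"
    return state, succeeded, failed
```

The failed list carries enough detail to drive a later retry of only the pairs that did not complete.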

5.3 EventBridge Pipe (Disabled — Legacy Path)

⚠️ Status: STOPPED (June 2, 2026). The tbi-translate-pipe EventBridge Pipe is disabled. BeastTranslate now polls the SQS queue directly — there is no orchestration layer between the queue and the worker. The Pipe and Step Function are retained as a legacy fallback but are not active for any translations. As of v4.0 (June 4, 2026), the worker also handles batch inference completion polling (every 33s during idle), eliminating the tbi-translate-batch-poll and tbi-translate-batch-process Lambdas entirely.

The legacy tbi-translate-pipe connected SQS to the Step Function without a glue Lambda:

Why disabled: With BeastTranslate polling the same queue, both consumers would compete for messages — causing double-processing. The persistent worker replaced the Pipe → Step Function → RunTask chain for all Express (real-time) translations. The Step Function path remains available for batch inference (Standard tier) where Bedrock's batch API requires a different execution model.

5.4 Self-Healing Sweeper

The sweeper runs automatically on every GET /admin/translate/health call (piggybacked) and is also available as a dedicated POST /admin/translate/sweep endpoint.

It scans all jobs in tx:active (the Valkey SET of active job IDs). For each job older than 15 minutes in queued or running state:

All sweep actions are logged to translation_job_events for audit trail.

Result: This eliminates the stuck queue problem permanently — no manual cleanup needed. Jobs that silently fail are automatically detected and marked, keeping the active set accurate and the queue healthy.
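The selection rule for the sweep (jobs older than 15 minutes that are still queued or running) can be sketched as a filter. This covers candidate selection only; what the sweeper does with each candidate is not shown. The function name and the job dictionary shape are hypothetical, and age is measured from submitted_at as an assumption.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=15)
SWEEPABLE_STATES = {"queued", "running"}


def find_stale_jobs(jobs, now):
    """Return IDs of active jobs that have sat too long without finishing."""
    return [
        job["job_id"]
        for job in jobs
        if job["state"] in SWEEPABLE_STATES
        and now - job["submitted_at"] > STALE_AFTER
    ]
```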

5.5 Job Phase Transitions

The job state now reflects the exact phase of execution:

Phase | Meaning
queued | Submitted to SQS, waiting for BeastTranslate to pick up the message (typically <20 seconds)
running | BeastTranslate received the message, worker translating documents
deploying | All translations complete, deploy Lambda creating CloudFront invalidations
finalizing | Deploy complete, finalize Lambda rebuilding search index and writing final state
succeeded / partial / failed | Terminal states — all sub-tasks complete, email notification sent

This gives real-time visibility into exactly where a job is in the pipeline.

6. Admin API (9 Endpoints)

All endpoints require the X-Admin-Key header. They are served by the LPO server (Go) alongside the existing admin routes.

6.1 Submit Translation Job

POST /admin/translate

Submits a new translation job. Validates inputs, checks cost limits, creates job state in Valkey (synchronous) and Aurora (async goroutine), enqueues to SQS.

// Request
POST /admin/translate
X-Admin-Key: tbcc-admin-...
X-Idempotency-Key: my-unique-key (optional)
Content-Type: application/json

{
  "docs": ["Trinity-Beast-API-Reference.html", "Trinity-Beast-Architecture-Guide.html"],
  "langs": "all",
  "options": {
    "force": false,
    "delta": false,
    "skip_search_rebuild": false,
    "skip_validation": false
  }
}

// Response 200
{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T16:42:00Z",
  "data": {
    "job_id": "1747407720-a3f8b2c1d4e5",
    "state": "queued",
    "submitted_at": "2026-05-16T16:42:00Z"
  },
  "error": ""
}

Validation rules:

6.2 Monitoring Endpoints

GET /admin/translate/status/{job_id}

Returns the full job state. Aurora is the primary source — state, timestamps, docs, langs, cost, and Step Function ARN are read from translation_jobs. Real-time per-doc/lang progress is overlaid from Valkey (written per-pair by the worker, too frequent for Aurora writes). If Aurora doesn't have the job yet (async insert still pending), falls back to Valkey.
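The read order described above (Aurora first, live progress overlaid from Valkey, Valkey as fallback) can be sketched with plain dictionaries standing in for the two stores. The function name, the flat progress map, and the added source field are illustrative assumptions, not the handler's real shape.

```python
def get_job_status(job_id, aurora_jobs, valkey_jobs, valkey_progress):
    """Build a status view: Aurora row first, Valkey for live progress."""
    row = aurora_jobs.get(job_id)
    source = "aurora"
    if row is None:
        # The async Aurora insert may still be pending; fall back to Valkey.
        row = valkey_jobs.get(job_id)
        source = "valkey"
        if row is None:
            return None
    status = dict(row)
    live = valkey_progress.get(job_id)
    if live:
        # Per-pair progress in Valkey is fresher than the Aurora copy.
        status["progress"] = {**status.get("progress", {}), **live}
    status["source"] = source
    return status
```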

GET /admin/translate/queue

Lists all pending and active jobs (state in queued or running).

GET /admin/translate/history

Returns the last 50 completed jobs from translation_jobs in Aurora, ordered by submission date descending. Includes state, docs, succeeded/failed pair counts, cost, and reason. Falls back to the Valkey tx:history list if Aurora is unavailable.

GET /admin/translate/health

System health overview:

{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/health] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate/health",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T17:30:00Z",
  "data": {
    "queue_depth": 0,
    "active_jobs": 1,
    "last_completed_at": "2026-05-16T17:30:00Z",
    "last_state": "succeeded",
    "daily_spend_usd": "12.40",
    "daily_spend_limit_usd": "600.00",
    "daily_input_tokens": 284150,
    "daily_output_tokens": 312480,
    "daily_token_limit": 50000000,
    "swept_jobs": 0
  },
  "error": ""
}

6.3 Control Endpoints

POST /admin/translate/cancel/{job_id}

Stops the Step Function execution via StopExecution API. Marks job as cancelled. Returns 409 if already in a terminal state.

POST /admin/translate/retry-failed/{job_id}

Creates a new job from the failed (doc, lang) pairs of a job that finished in the partial state. Returns 409 if the original is still running.

POST /admin/translate/sweep

Manually triggers the self-healing sweeper. Idempotent — safe to call repeatedly.

// Response 200
{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/sweep] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate/sweep",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T18:00:00Z",
  "data": {
    "swept": 2,
    "checked": 5,
    "results": [
      {
        "job_id": "1747407720-a3f8b2c1d4e5",
        "prior_state": "running",
        "submitted_at": "2026-05-16T16:42:00Z",
        "sfn_status": "FAILED",
        "action": "marked_failed"
      }
    ]
  },
  "error": ""
}

6.4 Worker Callback Endpoints

These endpoints are called by the worker task and finalize Lambdas to update Aurora without needing direct database access (worker and Lambdas are outside the VPC).

POST /admin/translate/update/{job_id}

Updates job state, progress, cost, and timing fields. Called by the worker task after each (doc, lang) translation and by the finalize Lambda on completion.

POST /admin/translate/event/{job_id}

Records a granular event in the translation_job_events table. Used for audit trail — each doc/lang start, success, failure, retry is logged as a separate event.

Fire-and-forget pattern: Both callback endpoints always return 200 regardless of Aurora write outcome. The translation pipeline must never fail because observability data couldn't be written. Errors are logged but never propagated.

7. Aurora Observability — Source of Truth

Aurora is the authoritative record for all translation job state. Valkey serves one specific role: real-time per-pair progress updates during active execution (written too frequently for Aurora). For everything else — job state, history, cost, audit trail — Aurora is read first.

Design principle: Valkey holds the price cache, search indexes, and real-time counters. It is not a job ledger. Aurora is the ledger. When you need to know what was translated, when, at what cost, and with what result, query Aurora.

7.1 translation_jobs Table

One row per job submission. 28 columns covering the full lifecycle. This table is the ground truth for gap analysis, cost reporting, and audit:

Column Group | Fields | Purpose
Identity | id, job_id, idempotency_key | Unique identification and deduplication
State | state, submitted_at, started_at, completed_at | Lifecycle tracking — authoritative terminal state
Input | docs (JSONB), langs (JSONB), options (JSONB) | What was requested
Progress | total_pairs, succeeded_pairs, failed_pairs, progress (JSONB) | Per-doc/lang status map
Cost | bedrock_cost_usd, bedrock_invocations | Spend tracking per job
Execution | step_function_arn, errors (JSONB), elapsed_seconds | Traceability and debugging
Deployment | cloudfront_invalidation_ids, search_index_rebuilt, notification_sent | Post-translation actions
Lineage | retry_of, reason | Retry chain and submission reason
Metadata | submitted_by, created_at, updated_at | Audit trail

Gap analysis query: To find which documents have never been translated, query SELECT DISTINCT jsonb_array_elements_text(docs) FROM translation_jobs ORDER BY 1 and compare against the S3 document list. Aurora is the only reliable source for this — Valkey keys expire and don't persist across cache flushes.

7.2 translation_job_events Table

Granular audit log — one row per significant event in a job's lifecycle. Used by the retry-failed handler as the authoritative source of which (doc, lang) pairs failed:

Column | Type | Example Values
job_id | VARCHAR | 1747407720-a3f8b2c1d4e5
event_type | VARCHAR | lang_started, lang_succeeded, lang_failed, deploy_started, finalize_complete
doc | VARCHAR | Trinity-Beast-API-Reference.html
lang | VARCHAR | ja, ar, es
detail | JSONB | Cost, chunk count, error message, validator report
created_at | TIMESTAMP | Event timestamp

7.3 Read/Write Strategy

The translation system uses a deliberate split between Aurora and Valkey based on access pattern:

Data | Primary Store | Reason
Job state (queued/running/succeeded/failed) | Aurora | Authoritative terminal state — never expires, queryable, auditable
Job history (last 50 completed) | Aurora | Permanent record — survives cache flushes, supports gap analysis
Per-pair progress (es: succeeded, ja: running…) | Valkey | Written per-pair during execution — too frequent for Aurora writes, only needed during active polling
Daily spend counter | Valkey | Needs atomic INCRBYFLOAT and 24h TTL auto-reset — Aurora is the wrong tool for this
Active job set | Valkey | Fast set membership check on every submit — Aurora query would add latency to the hot path

Write path

Read path

Do not rely on Valkey for job state. Job hashes in Valkey carry no TTL, yet they can still be flushed, evicted under memory pressure, or left stale if the finalize Lambda's update call is lost. Aurora is the record of what happened. Valkey is the window into what is happening right now.

8. Cost Protection

The translation engine calls Bedrock for every chunk of every document in every language, using the customer's chosen agent. Without guardrails, a single typo in a batch submission could trigger hundreds of expensive API calls.

8.1 Three Protection Layers

Layer | Where | Limit | Behavior on Breach
Per-request limits | Admin API (submit handler) | Max 6 docs, max 12 langs, max 3 active jobs | 400 Bad Request (docs/langs) or queue in SQS (active jobs)
Daily dollar cap | Admin API (submit handler) | $600/day (autoops:bedrock:spend:daily) | 429 Too Many Requests until counter expires
Daily token cap | Admin API (submit handler) | 50M combined tokens/day (autoops:bedrock:tokens:input:daily + autoops:bedrock:tokens:output:daily) | 429 Too Many Requests until counters expire
Per-invocation tracking | Worker task | Increments after every Bedrock call | Source of truth for daily counters

8.2 Spend Tracking

Two parallel counters track daily usage — a dollar cap and a token cap. Both live in Valkey with 24-hour TTL auto-reset and are checked on every job submission.

Dollar Cap (autoops:bedrock:spend:daily)

Why $600? A full batch translation of the entire 40-document library × 11 languages costs approximately $726 in raw Bedrock spend at ~$1.65 per doc-language pair (Sonnet 4.6) — but in practice the library is never re-translated all at once. Typical batches are 3 or 6 documents (per the Trinity Beast multiples-of-3 convention) and run well under $200. The $600 cap is a daily safety guardrail with comfortable headroom for several batches plus normal AutoOps overhead (threat analysis, digests, support) in the same 24-hour window.

Token Cap (autoops:bedrock:tokens:input:daily + autoops:bedrock:tokens:output:daily)

Kill switch: Setting autoops:bedrock:kill = "1" in Valkey causes both the submit endpoint and the worker task to refuse all operations. Use this for emergency cost containment.
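The submit-time checks can be sketched as one guard function, with a dictionary standing in for the Valkey counters. This is an illustration, not the Go handler. The function name is hypothetical, the 503 returned for the kill switch is an assumed status code (the source does not state one), and treating a counter equal to its cap as a breach is also an assumption.

```python
MAX_DOCS = 6
MAX_LANGS = 12
DAILY_SPEND_LIMIT_USD = 600.00
DAILY_TOKEN_LIMIT = 50_000_000


def check_submission(docs, langs, counters):
    """Return (status_code, reason) for a proposed translation job."""
    if counters.get("autoops:bedrock:kill") == "1":
        return 503, "kill switch engaged"  # assumed status code
    lang_count = 0 if langs == "all" else len(langs)
    if len(docs) > MAX_DOCS or lang_count > MAX_LANGS:
        return 400, "too many docs or langs"
    spend = float(counters.get("autoops:bedrock:spend:daily", 0))
    if spend >= DAILY_SPEND_LIMIT_USD:
        return 429, "daily dollar cap reached"
    tokens = int(counters.get("autoops:bedrock:tokens:input:daily", 0)) + int(
        counters.get("autoops:bedrock:tokens:output:daily", 0)
    )
    if tokens >= DAILY_TOKEN_LIMIT:
        return 429, "daily token cap reached"
    return 200, "ok"
```

The active-job limit is left out on purpose: per the table above, exceeding it queues the job in SQS rather than rejecting it.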

Pricing formula:

The engine is completely agent-agnostic. Any model accessible through Amazon Bedrock's invoke_model API can serve as a translation agent — whether it uses the Anthropic Messages format or the OpenAI-compatible format. The engine auto-detects the provider and constructs the appropriate request/response envelope. We selected these six agents based on three criteria: (1) quality on multilingual document translation, (2) availability across our failover regions (us-east-2, us-east-1, us-west-2), and (3) distinct cost/quality tradeoffs that let customers choose the right agent for their workload.

Token rates (stored in Aurora translation_parameters, cached in Valkey):

Agent | Tier | Input/1M | Output/1M | Speed | Regions
Qwen3 235B | 💰 Value | $0.22 | $0.88 | 0.6× | east-2, west-2
Mistral Large 3 | 💰 Value | $0.50 | $1.50 | 0.6× | east-2, east-1, west-2
DeepSeek V3 | 💰 Value | $0.58 | $1.68 | 0.7× | east-2, west-2
Claude Haiku 3.5 | ⚡ Standard | $0.80 | $4.00 | 0.5× | All (cross-region)
Claude Sonnet 4.6 | 🏆 Premium | $3.00 | $15.00 | 1.0× | All (cross-region)
Claude Opus 4 | 👑 Elite | $15.00 | $75.00 | 1.5× | All (cross-region)

Speed factor is relative to Sonnet 4.6 (1.0×). Lower means faster — Haiku at 0.5× processes in half the time. The speed factor directly affects the duration-based infrastructure cost: faster agents cost less in compute time per pair.
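As a worked illustration of the token rates, raw Bedrock cost for one invocation is the input and output token counts, each scaled by its per-million rate. The sketch below covers only that raw cost; infrastructure cost and customer pricing are separate. The function name is hypothetical, and the rates are copied from the table above.

```python
# USD per 1M tokens: (input, output)
TOKEN_RATES = {
    "Qwen3 235B": (0.22, 0.88),
    "Mistral Large 3": (0.50, 1.50),
    "DeepSeek V3": (0.58, 1.68),
    "Claude Haiku 3.5": (0.80, 4.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
}


def bedrock_cost_usd(agent, input_tokens, output_tokens):
    """Raw Bedrock spend for one call at the agent's token rates."""
    rate_in, rate_out = TOKEN_RATES[agent]
    return input_tokens / 1_000_000 * rate_in + output_tokens / 1_000_000 * rate_out
```

Output tokens dominate for every agent in the table, so a translation that expands the text (as with the complex-script languages) costs more than the input size alone suggests.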

Infrastructure cost formula (duration-based):

Regional failover: If the primary region (us-east-2) returns a timeout or 503, the engine makes up to 2 attempts in that region (the initial call plus one retry), then fails over to the next available region. Anthropic models use cross-region inference profiles (us. prefix) and route automatically. Qwen and DeepSeek fail over from east-2 to west-2. Mistral is available in all 3 regions.

Typical costs (per 50 KB document × 1 language, Express):

Agent | Bedrock | Infra | Customer Price
Qwen3 235B | $0.02 | $0.01 | $0.04
Mistral Large 3 | $0.04 | $0.01 | $0.06
DeepSeek V3 | $0.04 | $0.01 | $0.07
Claude Haiku 3.5 | $0.09 | $0.01 | $0.13
Claude Sonnet 4.6 | $0.32 | $0.01 | $0.44
Claude Opus 4 | $1.61 | $0.01 | $2.11

$3.00 minimum per translated document: Every translated document has a floor price of $3.00, regardless of size or agent. The table above shows the raw calculated cost — but any value below $3.00 is quoted at $3.00 to the customer. This floor ensures the service is priced at the value of the result (a professionally translated document), not the cost of the compute. Larger documents (typically 100 KB+) naturally exceed this floor on all agents. The floor primarily affects small documents on Value-tier agents.
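The floor rule amounts to taking the larger of the calculated price and $3.00. A minimal sketch, assuming the floor applies to each translated document individually and that prices are rounded to cents; the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

MINIMUM_PRICE_USD = Decimal("3.00")


def quoted_price(calculated_usd):
    """Apply the per-document floor to a calculated customer price."""
    price = Decimal(str(calculated_usd)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return max(price, MINIMUM_PRICE_USD)
```

With the figures from the table above, a 50 KB document on Claude Sonnet 4.6 calculates to $0.44 and is therefore quoted at the $3.00 floor.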

8.3 Infrastructure Integration

Translation engine metrics are exposed through two public interfaces:

Email notification timing: The email notification is the absolute LAST step in the pipeline. It fires only after: translation, deployment, search index rebuild, state update, and history push are ALL complete. The email is a comprehensive report including: job summary, translation results, CloudFront invalidation IDs, search index status, and any Bedrock error details. If Bedrock reports validation failures, the specific error messages and validator details are included in the email.

9. CLI Compatibility

The existing CLI tool (scripts/kcc_helpers/translate_doc.py) continues to work unchanged. A --remote flag routes through the new service instead of running Bedrock locally:

Flag | Behavior | Use Case
--local (current default) | Runs translator engine in-process, calls Bedrock directly from laptop | Development, debugging, single-doc quick fixes
--remote | POSTs to /admin/translate, polls /admin/translate/status/{id} every 5s, streams progress to stdout | Production translations, batch operations

The --remote flag produces identical terminal output to local mode — same progress bars, same chunk counters, same completion summary. The operator's workflow doesn't change; only the execution path does.
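The polling half of --remote can be sketched as a loop that stops on a terminal state. This is not the code in translate_doc.py: wait_for_job and fetch_status are hypothetical names, fetch_status stands in for the HTTP GET against the status endpoint, and the max_polls guard is an added assumption.

```python
import time

TERMINAL_STATES = {"succeeded", "partial", "failed", "cancelled"}


def wait_for_job(job_id, fetch_status, interval_seconds=5,
                 sleep=time.sleep, max_polls=None):
    """Poll the status endpoint until the job reaches a terminal state."""
    polls = 0
    while True:
        status = fetch_status(job_id)
        state = status["data"]["state"]
        print(f"[{job_id}] {state}")  # stream progress to stdout
        if state in TERMINAL_STATES:
            return status
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job {job_id} still {state} after {polls} polls")
        sleep(interval_seconds)
```

Injecting sleep and fetch_status keeps the loop testable without a network connection or real delays.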

Default flip plan: Start with --local as default to avoid surprising anyone. After 30 days of clean production runs through the service, flip the default to --remote and add --local as the explicit fallback.

10. Configuration Reference — Protected Terms

All translation behavior is driven by a single config file: scripts/translation-config.json. This is the shared source of truth consumed by both the Python engine and the Go admin API.

10.1 Protected Terms (57 entries)

Brand names, product names, AWS services, exchange names, and acronyms that must never be translated or transliterated:

Cross Power Ministries of Pakistan, The Trinity Beast Infrastructure,
The Trinity Beast, Trinity Beast Command Center, Kiro Command Center,
Cory Dean Kalani, Shafiq Bhatti, BeastWebhook, BeastMirror, BeastMain,
BeastLRS, Claude Sonnet 4.6, Bedrock, ElastiCache, EventBridge,
CloudFront, GuardDuty, CloudWatch, CloudTrail, Step Functions,
Crypto.com, Coinbase, Gate.io, Gemini, Kraken, Aurora, Valkey,
Stripe, Kiro, Fargate, PostgreSQL, Lambda, Route 53, AutoOps,
TBCC, CPMP, TBI, KCC, OKX, ECR, ECS, ALB, NLB, WAF, SNS, SQS,
SES, VPC, IAM, S3 ...

Per-Request Protected Terms

In addition to the global protected terms list, you can submit document-specific terms via the protected_terms array in the translation request. This is useful for:

POST /admin/translate
{
  "docs": ["Trinity-Beast-API-Reference.html"],
  "langs": "all",
  "protected_terms": ["MyCustomService", "SpecialEndpoint", "ProjectAlpha"]
}

Per-request terms are merged with the global list for that job only. They do not persist across jobs.
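A minimal sketch of the per-job merge, assuming the engine wants duplicates removed and longer terms matched first (so that "The Trinity Beast Infrastructure" wins over "The Trinity Beast"). The function name and the longest-first ordering are assumptions, not confirmed engine behavior.

```python
def merge_protected_terms(global_terms, request_terms):
    """Combine global and per-request terms for a single job."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    merged = list(dict.fromkeys([*global_terms, *request_terms]))
    # Longest first, so a longer term is never split by a shorter one.
    return sorted(merged, key=len, reverse=True)
```

The function returns a new list and leaves both inputs unchanged, which matches the rule that per-request terms do not persist across jobs.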

10b. Configuration Reference — Preserve Patterns

10.2 Preserve Patterns

Regex patterns for technical tokens that must survive translation unchanged:

Pattern Name | Matches | Example
url | HTTP/HTTPS URLs | https://api.cpmp-site.org/admin/translate
email | Email addresses | CoryDeanKalani@CPMP-Site.org
memory_size | Number + memory unit | 1770 MB, 32 GB
percentage | Number + % | 98.5%, 62%
cron_expr | Cron expressions | cron(0 11 * * ? *)
ip_address | IPv4 with optional CIDR | 10.0.1.0/24
aws_arn | AWS ARN format | arn:aws:sns:us-east-2:211998422884:tbi-ops-notifications
aws_resource_id | AWS resource identifiers | vpc-03deaddb7083cd59c, sg-050b617f93b2388f6
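To make the pattern names concrete, the sketch below shows plausible regular expressions for six of them. These are illustrative guesses written for this example; the authoritative patterns live in scripts/translation-config.json and may differ. cron_expr and aws_resource_id are omitted.

```python
import re

# Hypothetical patterns, not copied from translation-config.json.
PRESERVE_PATTERNS = {
    "url": re.compile(r"https?://[^\s<>\"']+"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "memory_size": re.compile(r"\b\d+(?:\.\d+)?\s?(?:KB|MB|GB|TB)\b"),
    "percentage": re.compile(r"\b\d+(?:\.\d+)?%"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b"),
    "aws_arn": re.compile(r"arn:aws:[a-z0-9-]+:[a-z0-9-]*:\d{0,12}:[A-Za-z0-9:/_.-]+"),
}


def find_preserved(text):
    """Return every preserve-pattern match found in the text, by name."""
    return {
        name: pattern.findall(text)
        for name, pattern in PRESERVE_PATTERNS.items()
        if pattern.search(text)
    }
```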

10c. Configuration Reference — Limits

10.3 Limits

Parameter | Value | Purpose
max_chunk_chars | 6000 | Default maximum characters per chunk (Latin scripts: es, pt, fr, de, it)
max_chunk_chars_by_lang | See below | Per-language overrides for complex scripts
max_retries | 3 | Retry attempts per chunk on validation failure
request_timeout_seconds | 300 | Per-Bedrock-call timeout (5 minutes — large RTL chunks need headroom)
max_output_tokens | 8192 | Maximum tokens in Bedrock response

Per-language chunk size overrides:

Languages | Chunk Size | Rationale
hi, ur, ar | 3000 chars | Devanagari and Arabic scripts expand significantly during translation. Smaller chunks prevent Bedrock timeouts.
ja, zh, ru | 4500 chars | CJK and Cyrillic have moderate expansion. Mid-range chunks balance throughput and reliability.
es, pt, fr, de, it | 6000 chars (default) | Latin scripts translate quickly with minimal expansion.
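The override lookup can be sketched as a dictionary with a default. The values come from the table above; the function names are hypothetical, and min_chunk_count is only a lower bound, since a real chunker would break at element boundaries rather than exact character offsets.

```python
DEFAULT_MAX_CHUNK_CHARS = 6000
MAX_CHUNK_CHARS_BY_LANG = {
    "hi": 3000, "ur": 3000, "ar": 3000,
    "ja": 4500, "zh": 4500, "ru": 4500,
}


def max_chunk_chars(lang):
    """Chunk size limit for a target language, falling back to the default."""
    return MAX_CHUNK_CHARS_BY_LANG.get(lang, DEFAULT_MAX_CHUNK_CHARS)


def min_chunk_count(doc_chars, lang):
    """Fewest chunks a document of this size can need (ceiling division)."""
    return -(-doc_chars // max_chunk_chars(lang))
```

For an 81,107-character document, this gives at least 14 chunks for Spanish and at least 28 for Urdu, which is why complex-script languages produce roughly twice as many Bedrock calls.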

11. Operations Guide

11.1 Submitting a Translation Job

Single document, all languages:

curl -s -X POST \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all"}' \
  https://api.cpmp-site.org/admin/translate | jq .

Multiple documents, specific languages:

curl -s -X POST \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"docs":["Trinity-Beast-API-Reference.html","Trinity-Beast-Architecture-Guide.html"],"langs":["es","pt","fr"]}' \
  https://api.cpmp-site.org/admin/translate | jq .

With idempotency key (safe to retry):

curl -s -X POST \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "X-Idempotency-Key: api-ref-2026-05-16" \
  -H "Content-Type: application/json" \
  -d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all"}' \
  https://api.cpmp-site.org/admin/translate | jq .

11.2 Monitoring Progress

# Check job status
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/status/{job_id} | jq .

# View queue
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/queue | jq .

# System health
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/health | jq .

# Recent history
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/history | jq .

11.3 Troubleshooting

Symptom | Cause | Resolution
Job stuck in queued | EventBridge Pipe not consuming | Check Pipe status in console; verify IAM role
429 on submit | Daily spend cap hit ($600) | Wait for 24h TTL expiry, or reset manually: SET autoops:bedrock:spend:daily 0
Partial completion | Some languages failed validation | POST /admin/translate/retry-failed/{id}
Worker timeout | Document too large (many chunks) | Check Step Function execution history for the failing chunk index
Cancel returns 404 | Job only in Aurora, not Valkey | Cancel handler falls back to Aurora — ensure latest code is deployed
No email notification | Finalize Lambda error | Check CloudWatch logs for tbi-translate-finalize
Search not updated | Search rebuild timed out | Run bash scripts/kcc.sh build-search manually

Cancel a running job:

curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/cancel/{job_id}

This stops the Step Function execution immediately. Documents already translated and deployed remain live. The search index is rebuilt for whatever landed successfully.

12. Regional Failover

The translation engine implements automatic regional failover to maintain availability during Bedrock service disruptions. This was added after a us-east-2 outage during development exposed the single-region weakness — better to discover this before customers were affected.

12.1 Failover Chain

When a Bedrock call fails with a service-level error, the engine automatically retries in the next region:

Priority | Region | Location | Role
1 | us-east-2 | Ohio | Primary — all normal traffic
2 | us-east-1 | N. Virginia | First fallback
3 | us-west-2 | Oregon | Second fallback

The failover is transparent to the caller — the translation completes successfully as long as at least one region is available. A log message records when a fallback region was used.

12.2 Trigger Conditions

Failover is triggered for service-level errors and timeouts that indicate the region is unavailable or overloaded:

Error Type | Meaning | Action
ServiceUnavailableException | Bedrock service is down (503) | Retry same region once, then failover
ThrottlingException | Rate limit or capacity exceeded | Retry same region once, then failover
ModelStreamErrorException | Model streaming failure | Retry same region once, then failover
ReadTimeoutError | Response took longer than 300s | Retry same region once, then failover
ConnectTimeoutError | Could not establish connection within 10s | Retry same region once, then failover

Other errors (validation failures, authentication errors, malformed requests) are not retried — they would fail identically everywhere.

Per-Region Retry with Backoff

Each region gets 2 attempts before the engine moves to the next region. A 5-second backoff between attempts allows transient pressure to clear:

us-east-2 (attempt 1) → timeout → wait 5s →
us-east-2 (attempt 2) → timeout →
us-east-1 (attempt 1) → timeout → wait 5s →
us-east-1 (attempt 2) → timeout →
us-west-2 (attempt 1) → timeout → wait 5s →
us-west-2 (attempt 2) → timeout → FAIL (raise exception)

Total: 6 attempts across 3 regions. In practice, transient spikes clear within 5-10 seconds, so the retry within the same region usually succeeds without needing failover.
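The attempt sequence above can be sketched as two nested loops. This is an illustration under assumptions: call stands in for the Bedrock invocation in a given region, and RetryableError is a hypothetical wrapper for the service-level errors listed in 12.2. Any other exception propagates immediately, matching the rule that non-service errors are not retried.

```python
import time

REGIONS = ["us-east-2", "us-east-1", "us-west-2"]
ATTEMPTS_PER_REGION = 2
BACKOFF_SECONDS = 5


class RetryableError(Exception):
    """Stand-in for throttling, 503, stream, and timeout errors."""


def invoke_with_failover(call, regions=REGIONS, sleep=time.sleep):
    """Try each region twice, waiting between attempts in the same region."""
    last_error = None
    for region in regions:
        for attempt in range(1, ATTEMPTS_PER_REGION + 1):
            try:
                return call(region)
            except RetryableError as exc:
                last_error = exc
                if attempt < ATTEMPTS_PER_REGION:
                    sleep(BACKOFF_SECONDS)  # backoff only before the retry
    if last_error is None:
        raise ValueError("no regions configured")
    raise last_error
```

As in the sequence above, there is no wait when moving to the next region; the backoff applies only between the two attempts within one region.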

12.3 Cost Impact

Regional failover has negligible cost impact:

Resilience benefit: A complete regional outage no longer blocks translations. The May 2026 us-east-2 outage would have caused a 4-hour translation blackout without this feature. With failover, translations continued uninterrupted via us-east-1.

13. Document Preparation Guide

Proper document preparation ensures clean translations with minimal post-processing. This section covers the conventions that help the translation engine produce accurate results.

13.1 Code Tag Usage

The <code translate="no"> tag tells the translation engine to preserve content exactly as written. Use it correctly to avoid formatting artifacts in translated documents.

When to Use Code Tags

Use <code translate="no"> for technical identifiers that would break if translated:

When NOT to Use Code Tags

Do not wrap pure data values in code tags — they should appear as plain text:

Why this matters: The translation engine's sentinel system protects code-tagged content from translation. If you wrap "32 GB" in code tags, it survives translation — but so does the monospace formatting, which looks wrong in prose. The engine has a post-processor that strips spurious code wrappers from pure numeric values, but it's better to author correctly from the start.

Quick Test

Ask yourself: "If I changed this value, would the system break?" If yes, use code tags. If no (it's just a number or measurement), leave it as plain text.

Content | Would changing it break something? | Use code tags?
tbi-ops-notify | Yes — Lambda name | ✅ Yes
1770 MB | No — just a memory size | ❌ No
/admin/translate | Yes — API endpoint | ✅ Yes
$600 | No — just a dollar amount | ❌ No
max_retries | Yes — config key | ✅ Yes
3 retries | No — just a count | ❌ No

13.2 Protected Terms Submission

For documents with domain-specific terminology not in the global protected terms list, submit additional terms with the translation request:

POST /admin/translate
{
  "docs": ["Customer-Integration-Guide.html"],
  "langs": "all",
  "protected_terms": [
    "CustomerCorp",
    "ProjectPhoenix",
    "DataSync API",
    "IntegrationHub"
  ]
}

These terms are added to the global list for this job only. The engine will:

  1. Wrap each term in <span translate="no"> during preprocessing
  2. Replace with sentinel tokens before sending to Bedrock
  3. Restore the original terms after translation
  4. Validate that all terms survived intact
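Steps 2 through 4 can be sketched as a round trip: swap terms for opaque tokens, translate, swap them back, and report any token that went missing. The token format (a number between mathematical brackets) and both function names are assumptions made for this example; the engine's real sentinel format is not shown here, and step 1 (span wrapping) is skipped.

```python
import re


def protect_terms(text, terms):
    """Replace each protected term occurrence with a unique sentinel token."""
    ordered = sorted(set(terms), key=len, reverse=True)  # longest match first
    if not ordered:
        return text, {}
    pattern = re.compile("|".join(re.escape(term) for term in ordered))
    mapping = {}

    def swap(match):
        token = f"\u27e6{len(mapping)}\u27e7"  # e.g. ⟦0⟧, hypothetical format
        mapping[token] = match.group(0)
        return token

    return pattern.sub(swap, text), mapping


def restore_terms(translated, mapping):
    """Put the original terms back and list any whose token was lost."""
    missing = [term for token, term in mapping.items() if token not in translated]
    for token, term in mapping.items():
        translated = translated.replace(token, term)
    return translated, missing
```

A single regex pass is used so that a short term can never match inside a token created for an earlier term. A non-empty missing list corresponds to the validation failure in step 4.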

Best Practices for Protected Terms

13.3 Clarification Workflow

When the translation engine encounters ambiguous content, it may flag it for human review. This happens in the validation phase when:

Flagged content appears in the job status response under the warnings array:

{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/status/1747407720-a3f8b2c1d4e5] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate/status/1747407720-a3f8b2c1d4e5",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T17:45:00Z",
  "data": {
    "job_id": "1747407720-a3f8b2c1d4e5",
    "state": "succeeded",
    "warnings": [
      "chunk 14 (ja): soft failure — protected term 'DataSync' may have been altered",
      "chunk 22 (ar): soft failure — version number format changed from X.Y.Z to X.Y"
    ]
  },
  "error": ""
}

Soft failures don't block the translation — the output is still deployed. Review the warnings and manually verify the flagged sections if needed.

Feedback loop: If you consistently see the same term flagged, add it to the global protected terms list in scripts/translation-config.json. This prevents future warnings and improves translation quality across all documents.

14. Pre-Scan Complexity Analysis

Before translation begins, the engine analyzes each document for complexity factors that may cause validation failures. This pre-scan identifies code-heavy sections and recommends whether to proceed, exercise caution, or split the document.

14.1 Complexity Metrics

The pre-scan calculates a complexity score for each section based on:

Factor | Weight | Why It Matters
Code tags | 1.0 per tag | Each code tag must survive translation intact — more tags = more validation points
Code tags in tables | 1.5 per tag | Tables with code examples are harder — model tends to merge or drop tags when reordering
Tables | 2.0 per table | Tables with technical content require careful structure preservation
Pre blocks | 0.5 per block | Usually have translate="no" — lower risk but still tracked
Protected spans | 0.3 per span | Handled by sentinel system — low risk

Section Thresholds

14.2 Recommendations

Based on the analysis, the pre-scan returns one of three recommendations:

Recommendation | Criteria | Action
PROCEED | Score < 20, no high-density sections | Translate normally — low failure risk
CAUTION | Score < 50, ≤ 2 high-density sections | Proceed but monitor — may need retries
SPLIT | Score ≥ 50 OR > 3 high-density sections | Consider splitting document before translation
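The weights and criteria can be sketched as two small functions. Several points are assumptions: the names are hypothetical, code tags inside tables are counted at 1.5 instead of 1.0 rather than in addition to it, and a document with exactly 3 high-density sections and a score under 50 falls through to SPLIT, because the criteria above do not cover that case and SPLIT is the more conservative choice.

```python
WEIGHTS = {
    "code_tags": 1.0,            # code tags outside tables
    "code_tags_in_tables": 1.5,
    "tables": 2.0,
    "pre_blocks": 0.5,
    "protected_spans": 0.3,
}


def complexity_score(counts):
    """Weighted sum of the complexity factors for one section or document."""
    return sum(weight * counts.get(name, 0) for name, weight in WEIGHTS.items())


def recommend(score, high_density_sections):
    """Map a score and high-density section count to a recommendation."""
    if score < 20 and high_density_sections == 0:
        return "PROCEED"
    if score < 50 and high_density_sections <= 2:
        return "CAUTION"
    return "SPLIT"
```

Applied to the example output below (score 415.4, 9 high-density sections), recommend returns SPLIT, which agrees with the report.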

Pre-Scan Output Example

DOCUMENT TRANSLATION COMPLEXITY ANALYSIS
========================================
Total characters: 81,107
Total sections: 13
Total code tags: 287
Overall complexity score: 415.4
Recommendation: SPLIT

WARNINGS:
  ⚠️  Document has 287 code tags — high validation failure risk
  ⚠️  Section 'step-function' has 51 code tags — consider simplifying
  ⚠️  Section 'observability' has 48 code tags — consider simplifying

HIGH-DENSITY SECTIONS (9):
  • architecture: 11 code tags, score 22.7
  • sentinel-system: 22 code tags, score 34.5
  • step-function: 51 code tags, score 71.1
  ...

SUGGESTED SPLIT: 4 parts
  → Split after 'validators' (After 3 high-density sections)
  → Split after 'observability' (After 3 high-density sections)
  → Split after 'doc-prep' (After 3 high-density sections)

14.3 Document Splitting

When the pre-scan recommends splitting, it suggests natural break points at section boundaries. Options for handling complex documents:

Option 1: Split into Multiple Documents

Create separate HTML files for each part (e.g., Doc-Part1.html, Doc-Part2.html). Each part translates independently with lower failure risk. Link them together with navigation.

Option 2: Simplify High-Density Sections

Reduce code tag density in problematic sections:

Option 3: Translate in Batches

Submit fewer languages per job (e.g., 3 instead of 11). This reduces concurrent load and allows the model more capacity per translation. Retry failed languages individually.

Per-Language Split Thresholds (v2.5)

The model struggles with high tag density in complex scripts (Urdu, Arabic, Hindi) even when it handles the same chunk without trouble in Latin-script languages. The prescan now applies per-language code tag limits, with tighter thresholds for scripts where the model is more likely to drop markup:

Language | Script | Max Code Tags per Part
Default (Latin, CJK, Cyrillic) | Latin / Kanji / Cyrillic | 30
Urdu (ur) | Nastaliq | 18
Arabic (ar) | Arabic | 18
Hindi (hi) | Devanagari | 20

Configuration key: max_code_tags_per_part_by_lang in translation-config.json. When the prescan runs for a specific language, it uses that language's threshold to determine split points. A document that translates as one part for Spanish may automatically split into 2-3 parts for Urdu.
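
A sketch of the per-language lookup, assuming the config key holds a simple language-to-limit map with a "default" entry. Only the key name max_code_tags_per_part_by_lang comes from the text; the JSON shape and the helper name are hypothetical.

```python
import json

# Hypothetical fragment of translation-config.json; the real layout may differ.
CONFIG_JSON = """
{
  "max_code_tags_per_part_by_lang": {"default": 30, "ur": 18, "ar": 18, "hi": 20}
}
"""


def max_code_tags(lang: str, config: dict) -> int:
    """Return the code tag limit for a language, falling back to the default."""
    limits = config["max_code_tags_per_part_by_lang"]
    return limits.get(lang, limits["default"])


config = json.loads(CONFIG_JSON)
for lang in ("es", "ur", "hi"):
    print(lang, max_code_tags(lang, config))
```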

Result: The Translation Service document (22 code tags in the Architecture section) previously failed for Urdu on every attempt. With the per-language threshold of 18, the prescan splits Architecture and Observability into separate parts. All 11 languages now translate successfully.

This document is an edge case: The Translation Engine documentation itself has 287 code tags and a complexity score of 415 — it's documentation about a translation engine, so it's packed with code examples. Most documents score under 50.

14.4 Splitting Safety Valve (v2.8)

Even when code tag density is low, a single part that exceeds the model's effective output window will be silently truncated — sections at the end of the part simply disappear from the output. The safety valve enforces a hard character limit per part regardless of prescan recommendations.

The Problem

The Performance Report (75 KB) has 18 sections with moderate code density. The prescan recommended splitting into 3 parts based on code tag thresholds. But Part 1 was 36 KB of prose-heavy content — well under the code tag limit but far beyond the model's output token budget. The model translated the first ~24 KB faithfully, then its output simply stopped. Sections 7-8 (partner-sustained, udp-engine) vanished without any error signal.

The Fix

# Safety valve: max chars per part (prevents model output truncation)
MAX_CHARS_PER_PART = 24000  # ~6000 tokens, well within max_output_tokens

The splitter now enforces a 24 KB ceiling on every part. If a part exceeds this limit after the prescan-based split, it is further subdivided at the nearest section boundary. This is conservative — Latin scripts could handle ~30 KB, but 24 KB is safe for all languages including RTL and CJK where token efficiency is lower.
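
A simplified sketch of the ceiling, assuming sections are already available as strings. It packs consecutive sections greedily rather than subdividing an existing prescan split, and the function name is hypothetical; a single section larger than the limit is left in a part of its own because there is no section boundary inside it.

```python
MAX_CHARS_PER_PART = 24000  # value from the safety valve above


def enforce_part_limit(sections: list[str], max_chars: int = MAX_CHARS_PER_PART) -> list[list[str]]:
    """Pack consecutive sections into parts that stay at or under max_chars."""
    parts: list[list[str]] = []
    current: list[str] = []
    size = 0
    for section in sections:
        # Close the current part when adding this section would exceed the ceiling.
        if current and size + len(section) > max_chars:
            parts.append(current)
            current, size = [], 0
        current.append(section)
        size += len(section)
    if current:
        parts.append(current)
    return parts


# Made-up section sizes for illustration.
sections = ["a" * 10000, "b" * 12000, "c" * 14000, "d" * 9000]
print([sum(len(s) for s in part) for part in enforce_part_limit(sections)])  # [22000, 23000]
```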

Impact

Document | Before (v2.6) | After (v2.8)
Performance Report (75 KB) | 3 parts (Part 1: 36 KB, truncated) | 4 parts (largest: 22 KB, clean)
API Reference (180 KB) | 8 parts (all under 24 KB already) | 8 parts (no change, already safe)
Translation Engine (116 KB) | 11 parts (code-density driven) | 11 parts (no change, code splits dominate)

The safety valve only activates when the prescan's code-tag-based splitting produces oversized parts. For most documents, the code density split already keeps parts well under 24 KB.

Result: Performance Report went from dropping 3 entire sections (silent truncation) to a perfect 18/18 sections, 4/4 diagrams, 20/20 <br/> tags across all 11 languages.

15. Document-Level Preprocessor

The document-level preprocessor is a critical layer that runs before chunking. It extracts complex HTML elements from the entire document, replacing them with simple Unicode placeholders. After translation, the postprocessor restores the original elements. This largely eliminates the "model drops tags" failure mode: validation failures on complex documents went from frequent to rare.

15.1 The Problem

The per-chunk sentinel system (Section 3) works well for most documents, but complex documents with many <code>, <strong>, and <em> tags exposed a fundamental limitation:

Example failure: A chunk with 27 <code translate="no"> tags consistently failed validation with tag count mismatch (27→23) — the model dropped 4 placeholders despite explicit instructions to preserve them.

15.2 The Solution

Extract ALL problematic elements from the entire document before chunking. The model never sees these elements — only simple Unicode placeholders that it cannot confuse with HTML structure.

Key insight: The model cannot corrupt what it never sees. By extracting elements at the document level, each chunk has zero complex tags to worry about. The model translates clean prose with obvious markers.

Before vs After

Pipeline Stage | Before (v2.2) | After (v2.3)
Document received | 290 code tags | 290 code tags
After preprocessing | n/a | 0 code tags (290 placeholders)
Per-chunk sentinels | 20+ placeholders per chunk | 0-2 placeholders per chunk
Model cognitive load | High (complex structure) | Low (clean prose)
Validation failures | Frequent on complex docs | Rare

15.3 Processing Flow

The preprocessor uses a two-phase extraction model that integrates into the translation pipeline as the first step:

Document → PHASE 1 (Density Lift) → PHASE 2 (Individual Extract) → Chunk → Translate → Reassemble → POSTPROCESS → Output
              ↓                           ↓                                                              ↓
     Lift entire dense containers  Extract remaining code/pre/       Single-pass flat restore
     as ⟦BLOCK_NNN⟧ placeholders   strong/em/numeric individually    (no nesting, no iteration)
     Original HTML preserved        Build manifest mapping            from manifest

Phase 1 runs FIRST on the raw DOM. It inspects container elements (tr, li, dd, dt, p, div) and lifts any container whose ratio of protected elements to prose characters exceeds a per-language density threshold. The lifted BLOCK entries store the original HTML byte-for-byte — no nested placeholders inside them.

Phase 2 then runs on whatever Phase 1 didn't lift, extracting individual elements (pre, code, spans, strong/em, a-tags, numeric patterns) the same way as before.

Integration in engine.py

def translate(text, target_lang, mode="html", ...):
    # Build preprocessor config from the target language's profile (loading elided)
    preprocess_config = {"script_family": profile.get("script_family", "latin")}

    # Step 1: PREPROCESS. Phase 1 lifts dense blocks, Phase 2 extracts individuals
    simplified_html, manifest = preprocess_for_translation(text, preprocess_config)

    # Step 2: CHUNK. Split simplified document
    head, chunks, tail = chunker.split_document(simplified_html, lang=target_lang)

    # Step 3: TRANSLATE. Each chunk through Bedrock (per-chunk sentinels still run)
    translated_chunks = []
    for chunk in chunks:
        translated_chunks.append(_translate_chunk(chunk, ...))

    # Step 4: REASSEMBLE
    reassembled = chunker.reassemble(head, translated_chunks, tail)

    # Step 5: POSTPROCESS. Single-pass flat restore (no nesting)
    output = postprocess_translation(reassembled, manifest)
    return output

15.4 Element Extraction

The preprocessor extracts elements in a two-phase order. Phase 1 (density lift) runs first on the raw DOM. Phase 2 (individual extraction) runs on whatever Phase 1 didn't lift, processing elements in order of specificity (most specific first) to handle nesting correctly:

Phase | Pass | Elements Extracted | Placeholder Format
1 | Density lift | Entire container elements (tr, li, dd, dt, p, div) exceeding density threshold | ⟦BLOCK_001⟧
2 | 1 | <pre translate="no"> blocks | ⟦PRE_001⟧
2 | 2 | <code translate="no"> tags | ⟦CODE_001⟧
2 | 3 | Other translate="no" elements | ⟦SPAN_001⟧
2 | 3b | <span class="..."> (structural CSS) | ⟦SPAN_001⟧
2 | 4 | <strong>, <em>, <b>, <i> tags (short content ≤40 chars) | ⟦STRONG_001⟧, ⟦EM_001⟧
2 | 5 | <a> tags with event handlers | ⟦LINK_001⟧
2 | 6 | Numeric patterns (memory sizes, percentages, versions) | ⟦MEM_001⟧, ⟦PCT_001⟧, ⟦VER_001⟧

Placeholder Format

Placeholders use Unicode brackets (⟦ and ⟧) that will never appear in real HTML content:

Phase 2 Nested Element Handling

Within Phase 2, the preprocessor handles arbitrary nesting depth by processing innermost elements first. Note: this nesting only occurs for individually-extracted elements — BLOCK entries from Phase 1 are always flat (they contain original HTML, never placeholders).

Source:
<span translate="no"><code translate="no">tbi-ops-notify</code> Lambda</span>

Pass 1: Extract inner code tag
<span translate="no">⟦CODE_001⟧ Lambda</span>

Pass 2: Extract outer span
⟦SPAN_002⟧

Model sees: ⟦SPAN_002⟧ (one token, no nesting)

Sibling Placeholder Awareness

When the preprocessor extracts elements from a container (e.g., a table cell), earlier passes leave placeholder text in the parent. Later passes must not be confused by these sibling placeholders — a <code translate="no"> tag in the same table cell as an already-extracted element is still a valid extraction target.

Bug fixed (v2.4): The original _is_inside_placeholder check walked up the DOM tree looking for the ⟦ character in any parent's text. This caused false positives: if a sibling element had been extracted (leaving ⟦CODE_042⟧ in the parent's text), the check incorrectly skipped remaining <code translate="no"> tags in the same container. Those unextracted tags then overwhelmed the model during complex-script translation (Hindi, Urdu). Fix: the check now always returns false, because an element that still exists in the DOM tree was not extracted and is a valid target.

15.4a Density-Based Block Lifting

Phase 1 of the preprocessor inspects the raw DOM before any individual extraction. It identifies container elements where the ratio of protected elements to prose characters is too high — meaning there's too little translatable text to justify sending the element through the model.

Density Formula

density = protected_elements / max(prose_chars, 1)

Where:

The check uses pure DOM element counting — no regex. This makes it deterministic and immune to content patterns that could fool regex-based approaches.

Thresholds by Script Family

Thresholds are per-language via the script_family column in the translation_language_profiles Aurora table:

Script Family | Languages | Density Threshold
latin | es, fr, de, pt, it | 0.06
cyrillic | ru | 0.05
cjk | ja, zh | 0.03
indic | hi, ur | 0.03
arabic | ar | 0.03

Guard Rails

Aurora Configuration Columns

The translation_language_profiles table stores per-language density configuration:

Column | Type | Purpose
script_family | varchar | Selects density threshold (latin, cyrillic, cjk, indic, arabic)
density_lift_threshold | numeric | Override threshold for this language (NULL = use script_family default)
density_max_prose | integer | Max prose chars for lift eligibility (default 300)

Example: Endpoint Table Row

<tr>
  <td><span translate="no">GET</span></td>
  <td><code translate="no">/health</code></td>
  <td>LPO server health check</td>
</tr>

Protected elements: 2 (span + code)
Prose chars: 23 ("GET" + "/health" excluded, "LPO server health check" counted)
Density: 2 / 23 ≈ 0.087

→ CJK (threshold 0.03): LIFTED as ⟦BLOCK_NNN⟧ (density 0.087 > 0.03)
→ Latin (threshold 0.06): LIFTED as ⟦BLOCK_NNN⟧ (density 0.087 > 0.06)
→ If prose were 40+ chars: density drops to 0.05 or lower, and Latin would KEEP it
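
A sketch of the density check applied to the row above, under stated assumptions: the standard library's ElementTree stands in for the engine's lxml DOM, a "protected element" is taken to mean any element with translate="no", and prose is counted as whitespace-stripped text outside protected elements. The function names are hypothetical and the engine's own counter may differ in detail.

```python
import xml.etree.ElementTree as ET

# Thresholds copied from the script-family table above.
THRESHOLDS = {"latin": 0.06, "cyrillic": 0.05, "cjk": 0.03, "indic": 0.03, "arabic": 0.03}

ROW = """<tr>
  <td><span translate="no">GET</span></td>
  <td><code translate="no">/health</code></td>
  <td>LPO server health check</td>
</tr>"""


def measure(container: ET.Element) -> tuple[int, int]:
    """Return (protected_elements, prose_chars) for one container element."""
    protected, prose_chars = 0, 0
    stack = [container]
    while stack:
        element = stack.pop()
        prose_chars += len((element.text or "").strip())
        for child in element:
            # Text that follows a child still belongs to the surrounding prose.
            prose_chars += len((child.tail or "").strip())
            if child.get("translate") == "no":
                protected += 1  # count it once and do not descend into it
            else:
                stack.append(child)
    return protected, prose_chars


def should_lift(container: ET.Element, script_family: str, max_prose: int = 300) -> bool:
    """Apply density = protected / max(prose_chars, 1) against the family threshold."""
    protected, prose_chars = measure(container)
    density = protected / max(prose_chars, 1)
    return prose_chars <= max_prose and density > THRESHOLDS[script_family]


row = ET.fromstring(ROW)
print(should_lift(row, "latin"), should_lift(row, "cjk"))
```

Because the check walks elements rather than matching text, a code sample that happens to contain tag-like strings cannot change the count.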

Results

ElastiCache doc (ja): 45 blocks lifted, 156/156 code tags survive translation intact. Zero retries.

KCC doc (ja): 82 blocks lifted, 327/327 code tags survive translation intact. Zero retries.

Block lifting eliminates the failure mode where code-heavy table rows overwhelm the model's attention. The model never sees these rows — it translates a single ⟦BLOCK_NNN⟧ token (which it passes through unchanged) instead of a complex structure with multiple inline placeholders.

15.5 Restoration

After translation, the postprocessor restores placeholders in a single flat pass in reverse index order (high → low) to prevent prefix collisions. Because Phase 1 (density lift) runs BEFORE Phase 2 (individual extraction), BLOCK entries contain original HTML — never nested placeholders.

Why Single-Pass Works

Translated output contains: ⟦BLOCK_003⟧ ... ⟦SPAN_002⟧ ... ⟦CODE_001⟧

Single pass (reverse order):
  Restore ⟦BLOCK_003⟧ → original HTML (no further substitution needed inside)
  Restore ⟦SPAN_002⟧ → <span translate="no">⟦CODE_001⟧ Lambda</span>
  Restore ⟦CODE_001⟧ → <code translate="no">tbi-ops-notify</code>

Final:
  <tr><td>...original row...</td></tr>
  <span translate="no"><code translate="no">tbi-ops-notify</code> Lambda</span>
  <code translate="no">tbi-ops-notify</code>

Perfect reconstruction — no nesting loops, no iteration.

Manifest Structure

The manifest maps each placeholder to its original HTML, enabling exact restoration:

{
  "⟦BLOCK_003⟧": {
    "type": "BLOCK",
    "html": "<tr><td><span translate=\"no\">GET</span></td><td><code translate=\"no\">/health</code></td><td>LPO server health check</td></tr>",
    "index": 3
  },
  "⟦SPAN_002⟧": {
    "type": "SPAN",
    "html": "<span translate=\"no\">⟦CODE_001⟧ Lambda</span>",
    "index": 2
  },
  "⟦CODE_001⟧": {
    "type": "CODE",
    "html": "<code translate=\"no\">tbi-ops-notify</code>",
    "index": 1
  }
}

Note: BLOCK entries always store pristine HTML (no ⟦...⟧ tokens inside). Phase 2 entries like SPAN may contain lower-index placeholders from inner elements extracted in an earlier pass; the reverse-order restore handles this naturally.
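
A minimal sketch of the reverse-order restore over a manifest shaped like the one above. The function name is hypothetical and the sample sentence is made up; the manifest entries are the SPAN and CODE entries from the example.

```python
def restore_placeholders(text: str, manifest: dict[str, dict]) -> str:
    """Substitute placeholders back in a single sweep, highest index first."""
    ordered = sorted(manifest.items(), key=lambda item: item[1]["index"], reverse=True)
    for placeholder, entry in ordered:
        text = text.replace(placeholder, entry["html"])
    return text


manifest = {
    "⟦SPAN_002⟧": {"type": "SPAN", "html": '<span translate="no">⟦CODE_001⟧ Lambda</span>', "index": 2},
    "⟦CODE_001⟧": {"type": "CODE", "html": '<code translate="no">tbi-ops-notify</code>', "index": 1},
}
print(restore_placeholders("Deployed by ⟦SPAN_002⟧.", manifest))
```

Restoring the outer SPAN first reintroduces ⟦CODE_001⟧ into the text, and the same sweep resolves it when it reaches index 1, so no second pass is needed.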

Result: The Translation Engine document (290 code tags, complexity 423) now translates with 0 retries across all 11 parts. Previously it failed consistently on Part 8 (config section with 27 code tags). Restoration is now a single deterministic pass with zero edge cases.

15.6 Numeric Pattern Extraction

The final Phase 2 pass extracts numeric patterns from the text after HTML element extraction. This protects bare numbers in prose that weren't already inside code or span tags. The model cannot convert, localize, or drop what it never sees.

Why Numeric Extraction Matters

When translating to complex scripts (Arabic, Hindi, Urdu), the model occasionally:

These transformations break technical accuracy. The numeric extraction pass prevents all of them.

Patterns Extracted

Pattern Type | Regex | Examples | Placeholder
Memory sizes | \d+(?:\.\d+)?\s?(?:GB|MB|KB|TB) | 32 GB, 1770 MB, 256 KB | ⟦MEM_001⟧
Percentages | \d+(?:\.\d+)?% | 98.5%, 62%, 100% | ⟦PCT_001⟧
Version numbers | \d+\.\d+(?:\.\d+)? | 4.6, 17.7, 2.3.1 | ⟦VER_001⟧

Processing Order

Numeric extraction runs after all HTML element extraction passes. This means:

Example: Hindi Translation

Source:
"The Lambda uses 1770 MB of memory and achieves 98.5% uptime."

After numeric extraction:
"The Lambda uses ⟦MEM_042⟧ of memory and achieves ⟦PCT_043⟧ uptime."

Model translates prose, placeholders survive intact.

After restoration:
"लैम्ब्डा 1770 MB मेमोरी का उपयोग करता है और 98.5% अपटाइम प्राप्त करता है।"

Technical values preserved exactly — no localization, no conversion.

Result: Translation failures caused by numeric value loss (preserve_memory_size: missing: GB, MB) are now resolved across all 11 languages. Numeric values survive intact regardless of target script.

Placeholder Collision Prevention

The numeric extraction pass includes safeguards to prevent extracting numbers that are part of existing placeholder names (e.g., the "001" in ⟦CODE_001⟧):

Without these guards, the numeric regex would corrupt placeholder names by extracting their index numbers, producing nested placeholders like ⟦CODE___TBN10__⟧ that the model cannot handle.
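
One way to get this guard, sketched under assumptions: a single combined regex matches existing ⟦...⟧ placeholders first and passes them through untouched, so their index digits are never candidates for extraction. The three numeric patterns are the ones from the table above; the function name, the shared counter, and the manifest shape are illustrative rather than the engine's actual implementation.

```python
import re

NUMERIC = re.compile(
    r"⟦[^⟧]*⟧"                                      # existing placeholder: match it so it is skipped
    r"|(?P<MEM>\d+(?:\.\d+)?\s?(?:GB|MB|KB|TB))"    # memory sizes
    r"|(?P<PCT>\d+(?:\.\d+)?%)"                     # percentages
    r"|(?P<VER>\d+\.\d+(?:\.\d+)?)"                 # version numbers
)


def extract_numeric(text: str, manifest: dict, start_index: int = 1) -> str:
    """Replace numeric patterns with placeholders and record them in the manifest."""
    counter = start_index

    def swap(match: re.Match) -> str:
        nonlocal counter
        kind = match.lastgroup
        if kind is None:  # matched an existing placeholder, leave it as-is
            return match.group(0)
        placeholder = f"⟦{kind}_{counter:03d}⟧"
        manifest[placeholder] = {"type": kind, "html": match.group(0), "index": counter}
        counter += 1
        return placeholder

    return NUMERIC.sub(swap, text)


manifest: dict = {}
print(extract_numeric("See ⟦CODE_001⟧: the Lambda uses 1770 MB and achieves 98.5% uptime.", manifest))
```

Because re.sub scans the original string once, placeholders created during the pass are never re-scanned, which avoids the nested-placeholder corruption described above.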

16. Notification System

The translation engine sends email notifications via the AutoOps notification pipeline (tbi-ops-notify Lambda → SES). Notifications are consolidated across batch jobs and include detailed per-document breakdowns.

16.1 Email Format

Each notification email includes:

Example Notification

Subject: [INFO] Translation Complete: 2 docs × 11 langs — 22/22 pairs SUCCEEDED

Batch Summary:
• Jobs: 2
• Documents: 2
• Languages: 11
• Total Pairs: 22
• Succeeded: 22
• Failed: 0
• Final State: SUCCEEDED
• Total Time: 7m 12s

Documents Translated:
• Trinity-Beast-TBI-Translation-Engine.html
  ✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it
• Trinity-Beast-Infrastructure-Overview.html
  ✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it

Deployment:
• CloudFront Invalidations: 2
• All translated files deployed to S3

Search Index:
• Rebuilt successfully (all 11 languages)

Partial Success Example

Subject: [WARNING] Translation Complete: 1 doc × 11 langs — 10/11 pairs PARTIAL

Documents Translated:
• Complex-Technical-Guide.html
  ✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, it
  ✗ Failed: ur

Error Details:
• Complex-Technical-Guide.html → ur: chunk 14 failed validation after 3 retries
  check_tag_counts: expected 27 code tags, found 23

16.2 Batch Consolidation

When multiple translation jobs are submitted together (e.g., translating 5 documents), the notification system consolidates them into a single email:

This prevents notification spam when translating multiple documents — you get one comprehensive email covering the entire batch, not 5 separate emails.

16.3 Document Resolver (v2.5)

When the same document appears in multiple jobs within a batch (e.g., initial run fails Urdu, retry succeeds), the notification resolves duplicate entries into a single final-state view:

Without the resolver, a retry job would show the same document twice — once with the failure and once with the fix — making the notification confusing and the counts misleading.
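
A sketch of the resolving step, assuming each job reports a per-document map of language to state and that the most recent job's result for a document-language pair wins. Both the data shape and the "last result wins" rule are assumptions made for illustration.

```python
def resolve(jobs: list[dict[str, dict[str, str]]]) -> dict[str, dict[str, str]]:
    """Merge per-job results, in submission order, into one final state per doc and language."""
    final: dict[str, dict[str, str]] = {}
    for job in jobs:
        for doc, langs in job.items():
            # Later jobs overwrite earlier states for the same language.
            final.setdefault(doc, {}).update(langs)
    return final


jobs = [
    {"Complex-Technical-Guide.html": {"es": "succeeded", "ur": "failed"}},  # initial run
    {"Complex-Technical-Guide.html": {"ur": "succeeded"}},                  # retry
]
print(resolve(jobs))
```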

16.4 Tag Inventory (v2.8)

Every notification includes a Tag Inventory section showing source vs output tag counts per document. This lets you detect at a glance if the model is adding or dropping tags. As of v2.8, the inventory also reports Mermaid diagram counts:

Tag Inventory (source → output):
• Trinity-Beast-Translation-Service.html
  IN:  code:22 pre:5 strong:8 em:2 a:4 br:3 diagrams:1
  OUT: code:22 pre:5 strong:8 em:2 a:4 br:3 diagrams:1

If the model has a bad day and adds a <span> that wasn't in the source, or drops code tags, you'll see the mismatch immediately:

  IN:  code:23 pre:5 strong:8 diagrams:2
  OUT: code:20 pre:4 strong:8 diagrams:1    ← 3 code dropped, 1 diagram lost

Tag counts are logged per-language in Aurora (translation_job_events) with tags_in and tags_out fields. The notification shows the first successful language's counts (source tags are identical across all languages since it's the same source document).
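
A sketch of how such an inventory could be computed with the standard library's HTML parser. The class and function names are hypothetical, and Mermaid diagram counting is left out because the diagram markup is not specified here.

```python
from collections import Counter
from html.parser import HTMLParser

TRACKED = ("code", "pre", "strong", "em", "a", "br")


class TagCounter(HTMLParser):
    """Count opening tags for the tracked element names."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> also arrive here via handle_startendtag.
        if tag in TRACKED:
            self.counts[tag] += 1


def inventory(html: str) -> Counter:
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def mismatches(source: str, output: str) -> dict[str, tuple[int, int]]:
    """Return {tag: (in_count, out_count)} for every tag whose counts differ."""
    src, out = inventory(source), inventory(output)
    return {tag: (src[tag], out[tag]) for tag in TRACKED if src[tag] != out[tag]}


source = "<p><code>a</code> and <code>b</code><br/></p>"
output = "<p><code>a</code> and b<br/></p>"
print(mismatches(source, output))  # {'code': (2, 1)}
```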

Recipient: All translation notifications go to CoryDeanKalani@CPMP-Site.org via the unified AutoOps notification pipeline. The sender is CPMP Mission <No-Reply@CPMP-Site.org>.

17. Delta Translation (Incremental Updates)

Documents change frequently — a new endpoint, a revised architecture, an updated pricing table. Without delta translation, every edit requires re-translating the entire document across all 11 languages. Delta translation solves this by identifying exactly which sections changed and translating only those, reusing cached translations for everything else.

17.1 Concept

The delta translation system leverages two key properties of the document library:

By comparing the current English document against the version that was last translated, the system identifies which sections changed (by content hash) and only sends those to Bedrock. Unchanged sections are pulled directly from the existing translated document. Typical savings: 70–90% on incremental updates.

17.2 S3 Versioning as Diff Source

The website bucket (trinity-beast-website-east2) has versioning enabled. Every aws s3 cp or s3api put-object creates a new version with a unique VersionId. The delta system uses this to:

No separate manifest storage is required — S3 already has the full history. A lightweight metadata file (docs/delta/{doc}.{lang}.json) tracks which VersionId was last translated for each document-language pair.

17.3 Comment Preservation (Sentinel Pass 0)

For delta translation to work, <!-- TBI-CHUNK --> markers must survive the translation round-trip. Previously, Bedrock silently dropped HTML comments during translation. The sentinel system now includes a Pass 0 that protects all HTML comments:

# Pass 0: Before Bedrock sees the chunk
<!-- TBI-CHUNK -->  →  __TBP0__    (sentinel token)
<!-- Section 5 -->  →  __TBP1__    (sentinel token)

# After translation: sentinels restored
__TBP0__  →  <!-- TBI-CHUNK -->
__TBP1__  →  <!-- Section 5 -->

This is implemented as the first pass in _apply_sentinels() in engine.py, before the existing translate="no" element extraction (Pass 1), paired span sentinels (Pass 2), and numeric protection (Pass 3). Comments are treated as Type A (FULL) sentinels — extracted completely and restored verbatim.
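
A minimal sketch of the comment round-trip, using the __TBPn__ token format shown above. The function names and the regex are assumptions and are not taken from _apply_sentinels().

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def protect_comments(chunk: str) -> tuple[str, list[str]]:
    """Swap each HTML comment for a __TBPn__ sentinel and keep the originals."""
    saved: list[str] = []

    def swap(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"__TBP{len(saved) - 1}__"

    return COMMENT.sub(swap, chunk), saved


def restore_comments(translated: str, saved: list[str]) -> str:
    """Put the original comments back verbatim."""
    for index, comment in enumerate(saved):
        translated = translated.replace(f"__TBP{index}__", comment)
    return translated


chunk = "<!-- TBI-CHUNK --><p>Hello</p><!-- Section 5 -->"
protected, saved = protect_comments(chunk)
print(protected)  # __TBP0__<p>Hello</p>__TBP1__
print(restore_comments(protected.replace("Hello", "Hola"), saved))
```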

17.4 Hash-Based Section Matching

The algorithm is position-independent — sections are matched by content hash, not by index. This means markers can be added, removed, or repositioned between versions without breaking the delta logic.

Diagram 17.1: Delta Translation Flow

flowchart TD
    A[Fetch Current English from S3] --> B[Split by TBI-CHUNK markers]
    B --> C[Hash each section SHA-256]
    D[Fetch Previous English version] --> E[Split by TBI-CHUNK markers]
    E --> F[Hash each section]
    C --> G{Compare hashes}
    F --> G
    G -->|Match found| H[Pull from existing translation]
    G -->|No match| I[Send to Bedrock]
    H --> J[Reassemble with TBI-CHUNK markers]
    I --> J
    J --> K[Deploy to S3 + Save metadata]

    style A fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0
    style D fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0
    style H fill:#064e3b,stroke:#10b981,color:#e0e0e0
    style I fill:#7c2d12,stroke:#f97316,color:#e0e0e0
    style K fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0
        

Marker repositioning example:

17.5 CLI Commands

Four KCC commands support delta translation and chunk management:

Delta Diff (Analysis Only)

# List available S3 versions
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --list-versions

# Compare current vs previous version (auto-detects)
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html

# Compare against a specific version
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --version-id ksYxUBZIUB8Roi2KQYje6ig9R7JesL9z

# Show delta for a specific language
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --lang ja

Delta Translate (Incremental Translation — Local CLI)

# Dry run — show what would change without calling Bedrock
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html es --dry-run

# Translate only changed sections for one language
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html es

# Translate changed sections for all languages
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html all

# Force full translation (creates fresh baseline)
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html all --force

Delta via Remote API (options.delta)

The delta option is also available on POST /admin/translate — the worker skips any language pair where the translated file on S3 is already newer than the source document. No local CLI needed.

# Submit a delta job via the remote API — skips up-to-date pairs automatically
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
  -d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all","options":{"delta":true}}' \
  https://api.cpmp-site.org/admin/translate | jq .

Delta Validate (Marker Preservation Check)

# Validate TBI-CHUNK markers survived translation for all delta-enabled docs
bash scripts/kcc.sh delta-validate all all

# Validate a specific doc across all languages
bash scripts/kcc.sh delta-validate Trinity-Beast-API-Reference.html all

# Validate a specific doc + language pair
bash scripts/kcc.sh delta-validate Trinity-Beast-API-Reference.html es

Reports pass/fail per doc×lang pair. Exit code 0 if all pass, 1 if any markers were lost. Run after any translation job to confirm Sentinel Pass 0 is working correctly.

Chunk Sizer (Auto-Placement Suggestions)

# Analyze a doc from S3 and suggest TBI-CHUNK marker placement
bash scripts/kcc.sh chunk-size Trinity-Beast-API-Reference.html

# Analyze a local file
bash scripts/kcc.sh chunk-size /path/to/local/doc.html

Scans the document for <section>, <h2>, <h3>, and .category-section boundaries. Reports current chunk sizes (if markers exist), identifies policy violations, and suggests where to insert markers to stay within the 15KB/18KB/12KB policy. Dense sections (high translate="no" density) automatically target the tighter 12KB limit.

17.6 Bootstrap Path

Existing translated documents do not contain <!-- TBI-CHUNK --> markers (they were stripped before the sentinel fix). The bootstrap sequence is:

  1. First run (full cost): Use --force to translate the entire document. The sentinel fix preserves markers in the output. Delta metadata is saved to S3.
  2. Subsequent runs (delta savings): The tool detects the existing translation has markers, loads metadata to identify the previous English version, and only translates changed sections.

After the bootstrap run, typical savings on incremental updates:

Change Type | Typical Savings | Example
Single section edit | 85–95% | Fix a typo, update one endpoint
New section added | 70–85% | Add a new feature section
Marker repositioned | 60–75% | Split a large section in two
Major rewrite | 20–40% | Restructure half the document

Cost model: At approximately $1.50 per section-language pair, a 9-section document across 11 languages costs ~$148.50 for a full translation. With delta (2 sections changed), the same update costs ~$33 — a 78% reduction.
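
The arithmetic behind the cost model, as a short check. The $1.50 rate, 9 sections, 11 languages, and 2 changed sections come from the paragraph above; the function name is hypothetical.

```python
COST_PER_SECTION_LANG = 1.50  # approximate rate quoted above


def translation_cost(sections: int, languages: int, rate: float = COST_PER_SECTION_LANG) -> float:
    """Cost of translating the given number of sections into each language."""
    return sections * languages * rate


full = translation_cost(9, 11)    # every section, every language
delta = translation_cost(2, 11)   # only the 2 changed sections
saving = 1 - delta / full
print(f"full=${full:.2f} delta=${delta:.2f} saving={saving:.0%}")
# full=$148.50 delta=$33.00 saving=78%
```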

Quick Reference

Item | Value
Model | qwen.qwen3-235b-a22b-2507-v1:0 (Qwen3-235B, all languages)
Failover Regions | us-east-2, us-east-1, us-west-2
Target Languages | 11 internal: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it · 21 supported (TBTS customers)
Worker Runtime | Python 3.11 (ECS Fargate persistent service, auto-scaling 1→11)
Deploy/Finalize Runtime | Go (provided.al2023)
Worker Resources | 2 vCPU / 6 GB per container (Fargate, no timeout ceiling)
Memory (Lambdas) | 1770 MB
Worker Timeout | None (runs to completion)
Finalize Timeout | 180s
Deploy Timeout | 60s
Max Docs per Request | 6
Max Active Jobs | 3
Daily Dollar Cap | $600 (24h TTL auto-reset)
Daily Token Cap | 50M combined tokens (24h TTL auto-reset)
Chunk Size (Latin scripts) | 6000 chars
Chunk Size (CJK + Russian) | 4500 chars (ja, zh, ru)
Chunk Size (Indic + Arabic) | 3000 chars (hi, ur, ar)
Retries per Chunk | 3
Max Part Size | 24 KB (safety valve; prevents model output truncation)
MaxConcurrency (per-language) | 0 (unlimited; all language containers launch simultaneously)
ECR Repository | tbi-translate-worker
SQS Queue | trinity-beast-translation-queue
Step Function | tbi-translation-orchestrator
IAM Role (Worker + Lambdas) | tbi-translate-role
IAM Role (Pipe) | tbi-translate-pipe-role (DELETED; pipe removed 2026-06-02)
Auto-Scaling | Demand-driven: 1 (idle) → N (job submitted, N = language count) → 1 (queue empty)
ECS Service | tbi-translate-worker-service (persistent, always-on)
IAM Role (Step Function) | tbi-translate-orchestrator-role
Valkey Keys | tx:job:{id}, tx:active, tx:history, tx:idempotency:{key}, autoops:bedrock:spend:daily, autoops:bedrock:tokens:input:daily, autoops:bedrock:tokens:output:daily
Aurora Tables | translation_jobs, translation_job_events
Delta Metadata | docs/delta/{doc}.{lang}.json (S3)
Delta CLI | bash scripts/kcc.sh delta-diff, bash scripts/kcc.sh delta-translate, bash scripts/kcc.sh delta-validate, bash scripts/kcc.sh chunk-size
CloudWatch Namespace | TBI/Translation