The Trinity Beast – TBI Translation Engine

1. Why a Custom Translation Engine

The Trinity Beast Infrastructure maintains 40 technical documents translated into 11 languages, 440 translated files in total. The original approach used AWS Translate batch jobs. It worked for simple prose but failed catastrophically on technical documentation.

1.1 Where AWS Translate Fails

AWS Translate is a neural machine translation service optimized for general-purpose text. Technical documentation with embedded code, diagrams, and brand terminology exposes its fundamental limitations:

Failure Mode	Example	Impact
Translates code blocks	function getName() → función obtenerNombre()	Code no longer executes
Translates variable names	api_key → clave_api	Documentation references break
Breaks Mermaid diagrams	Translates node labels inside mermaid blocks	Diagrams fail to render
Corrupts HTML structure	Merges adjacent elements, drops attributes	Styling and layout break
Transliterates brand names	AutoOps → آٹو آپس (Urdu phonetic)	Brand identity lost, search breaks
Localizes numeric units	32 GB → 32 Go (French)	Technical specs become ambiguous
Drops version numbers	PostgreSQL 17.7 → PostgreSQL	Version-specific guidance lost
Ignores translate attribute	Translates content inside protected zones	Defeats the HTML5 standard mechanism

1.2 The Scale Problem

With 40 documents × 11 languages, every documentation update triggers a translation cascade. Before the custom engine:

Each document required manual post-processing to fix code blocks, diagrams, and brand names
A single document update meant re-translating and re-fixing 11 language versions
No audit trail — local log files only, no provenance tracking
No retry mechanism — a failed translation required starting over from scratch
No cost visibility — Bedrock spend was invisible until the monthly bill
Translation was becoming a full-time job that blocked documentation improvements

1.3 The Solution

A custom Bedrock-powered translation engine that understands the boundary between human language and machine language. The engine uses defense-in-depth across the full pipeline:

Source validation — catches structural defects, encoding issues, and Mermaid syntax errors before burning any Bedrock tokens. Auto-repairs what it can, rejects early with actionable reports when it can't.
Language detection — auto-detects source language via Unicode script analysis and word frequency heuristics (21 languages, no API calls, <10ms). No pivot through English required.
Sentinel preprocessing — replaces protected content with placeholder tokens before the model sees it, then restores them after translation. The model cannot corrupt what it never sees.
Smart splitting — 24 KB max part size prevents model output truncation. Per-language code tag thresholds handle complex-script sensitivity.
Multi-layer validation — every translated chunk is validated against the source for structural integrity, protected term preservation, version number survival, and HTML tag count matching. Failures trigger automatic retries with temperature jitter.
Integrity check with diagram auto-stitch — full-document post-translation repair. Counts Mermaid diagrams, stitches missing ones back from source, repairs broken span/strong/em wrappers.
Event-driven orchestration — a managed Step Functions pipeline handles fan-out (MaxConcurrency 6), retries, deployment, search index rebuilding, and notification. Fire-and-forget from the operator's perspective.

Result: A single POST /admin/translate call translates any document from any supported source language into up to 11 target languages, deploys to S3, invalidates CloudFront, rebuilds the search index, and emails a summary. Source language is auto-detected when not specified — no pivot through English required.
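
As an illustration of the Unicode script analysis mentioned for language detection, the following is a minimal sketch and not the engine's actual detector. The function name dominant_script is hypothetical, and the word-frequency heuristics that would be needed to separate languages sharing a script (for example Arabic and Urdu) are omitted.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most frequent Unicode script prefix among the letters in text."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode character names begin with the script,
            # e.g. "DEVANAGARI LETTER KA" or "LATIN SMALL LETTER A".
            scripts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"
```

Because this only inspects character names locally, it needs no API call; a real detector would layer per-language word lists on top of the script result.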

2. Architecture

2.1 Pipeline Flow

The translation service is an event-driven pipeline that decouples submission from execution. The operator submits a job; the system handles everything else asynchronously.

Diagram 2.1: End-to-End Pipeline Architecture (v3.1 — BeastTranslate)

flowchart TB
    subgraph Operator
        A[POST /admin/translate]
    end
    subgraph "LPO Server (Go)"
        B[Validate & Enqueue]
        C[Valkey State]
        D[Aurora Record]
    end
    subgraph "AWS Queue"
        E[SQS Queue]
    end
    subgraph "BeastTranslate — Persistent ECS Service"
        direction TB
        BT[SQS Long-Poll Loop]
        BT --> SC{Scale Check}
        SC -->|Single lang| BT2[Process In-Place]
        SC -->|Multi-lang| BT3[Scale Service to N]
        BT3 --> BT4[N Containers Poll Same Queue]
        BT4 --> BT5[Each Takes 1 Language Message]
        BT2 --> TI["Translation Intelligence (Python)"]
        BT5 --> TI
        subgraph "Translation Intelligence (Python)"
            direction LR
            H0[Source Validation]
            H1[Complexity Analysis]
            H2[Document Preprocessor]
            H3[Sentinel System — 4 Types]
            H4[Bedrock — 3-Region Failover]
            H5[Validator — Hard + Soft Tiers]
            H6[Integrity Check + Auto-Repair]
        end
    end
    subgraph "Deployment (Go Lambdas)"
        direction LR
        I[S3 Write]
        J[CloudFront Invalidation]
        K[Search Index Rebuild]
        L[SES Notification]
    end

    A --> B
    B --> C
    B --> D
    B --> E
    E --> BT
    H6 --> I
    I --> J
    J --> K
    K --> L

    style A fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style B fill:#1e293b,stroke:#334155,color:#e2e8f0
    style C fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style D fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style E fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
    style BT fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style SC fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style BT2 fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style BT3 fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style BT4 fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style BT5 fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style H0 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H1 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H2 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H3 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H4 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H5 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style H6 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style I fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style J fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style K fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style L fill:#064e3b,stroke:#10b981,color:#e2e8f0

2.2 Components

Component	Type	Runtime	Purpose
`POST /admin/translate` (+ 8 more)	Admin API	Go	Job submission, monitoring, control
`trinity-beast-translation-queue`	SQS	—	Decouple submission from execution
`tbi-translate-pipe`	EventBridge Pipe	—	SQS → Step Function trigger (no glue Lambda)
`tbi-translation-orchestrator`	Step Functions	—	Fan-out, retry, deploy, finalize orchestration
`tbi-translate-worker`	ECS Fargate Task	Python 3.11	Bedrock translation + sentinel + validation (no timeout ceiling)
`tbi-translate-init`	Lambda	Go	Records execution ARN, transitions queued → running
`tbi-translate-deploy`	Lambda	Go	CloudFront invalidation per document
`tbi-translate-finalize`	Lambda	Go	Search rebuild + SES notification + state transition
`translation_jobs`	Aurora table	—	Permanent job records (28 columns)
`translation_job_events`	Aurora table	—	Granular per-doc/lang audit log

2.3 Why Python (The Only Python in the Fleet)

Every other compute workload in The Trinity Beast Infrastructure is written in Go. The translation worker is the sole exception, and for good reason:

lxml for HTML parsing — Go's HTML parsers are adequate for simple tasks but lack the XPath and tree-manipulation capabilities needed for sentinel preprocessing on complex nested documents.
Battle-tested engine — the translator package was developed and debugged over weeks of production use. Rewriting in Go would re-introduce every bug already fixed (56+ smoke test cases).
Rapid iteration — prompt engineering and validator tuning require fast feedback loops. Python's interpreted nature allows testing changes without compile cycles.
Separation of concerns — the translation engine is a self-contained package with its own config, prompts, validators, and chunker. Keeping it in Python isolates it from the Go service layer.
Container image deployment — ships as a Docker image to ECR, runs as an ECS Fargate task with no timeout ceiling. The same image also supports Lambda invocation for smaller documents.

Convention note: All Lambda functions use 1770 MB memory (multiple of 3). The worker runs as an ECS Fargate task (2 vCPU / 6 GB) with no timeout ceiling — large documents translate to completion regardless of processing time. The worker is also a unified batch orchestrator: during idle periods (every 33 seconds), it polls Bedrock for completed batch inference jobs, processes output JSONL inline, deploys translated docs to S3, and triggers finalize. Deploy and finalize Lambdas use 60s and 180s timeouts respectively.

3. Sentinel Preprocessing System

The sentinel system is the core innovation that makes reliable technical document translation possible. It operates on a simple principle: the model cannot corrupt what it never sees.

Before any chunk is sent to Bedrock, protected content is replaced with placeholder tokens. The model translates the prose around the placeholders. After translation, the placeholders are swapped back to the original content. Validation then confirms everything survived intact.

3.0 Pre-Sentinelization Passes (v5.0)

Before the sentinel system runs, three automatic preparation passes ensure maximum coverage. These run deterministically on every document, with no manual markup needed. The design principle: the model can't break what isn't there.

Pass	Function	What It Does	Failure Class Eliminated
Brand Term Auto-Wrap	`_auto_wrap_brand_terms()`	Scans for all 57 protected terms from `translation-config.json` and wraps each occurrence in `<span translate="no">`. They become Type A sentinel candidates automatically.	Brand transliteration (e.g., The Trinity Beast → ट्रिनिटी बीस्ट)
Code Tag Protection	`_protect_code_tags()`	Upgrades every bare `<code>` tag to `<code translate="no">`. Pass 1 then lifts them all as Type A sentinels.	Code tag mismatch failures (the single largest failure class before v5.0)
Path Protection	`_fix_paths_absolute()`	Converts relative paths (`assets/`, `images/`, `css/`, `js/`) to absolute URLs before translation. The model passes them through unchanged.	Broken asset references in translated docs

These passes run in handler.py before the document enters the chunker or sentinel system. They are deterministic, idempotent, and add zero cost — they only manipulate the source HTML locally. After these passes, the sentinel system has maximum coverage: every code tag, every brand term, and every path is already protected before sentinelization begins.
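
To make the idempotence claim concrete, here is a minimal sketch of a code tag protection pass. It is an assumption about how such a pass could work, not the source of `_protect_code_tags()`; the name protect_code_tags and the regex are illustrative only.

```python
import re

# An opening <code> tag, with or without attributes. The name must be followed
# by whitespace or ">", so a tag such as <codec> is not matched.
_CODE_OPEN = re.compile(r"<code(\s[^>]*)?>", re.IGNORECASE)


def protect_code_tags(html: str) -> str:
    """Add translate="no" to every <code> tag that does not already carry it."""

    def upgrade(match: re.Match) -> str:
        attrs = match.group(1) or ""
        if "translate=" in attrs.lower():
            return match.group(0)  # already protected, leave untouched
        return f'<code translate="no"{attrs}>'

    return _CODE_OPEN.sub(upgrade, html)
```

Running the function twice yields the same result as running it once, which is what allows the pass to be applied to every document without tracking whether it has already run.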

3.1 Sentinel Types and Eight Passes

Type A — Full Element Extraction (`TBP{N}`)

Replaces entire translate="no" elements with a single token. The model sees only the placeholder and places it in the natural position for the target language's word order.

Before	After Sentinel Pass
`<span translate="no">CloudFront</span> invalidation`	`__TBP0__ invalidation`
`<code translate="no">api_key</code> parameter`	`__TBP1__ parameter`

Handles arbitrary nesting depth — processes innermost elements first, then sweeps outward until stable.
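
The innermost-first sweep can be sketched as follows. This is a simplified regex-based illustration, not the engine's parser-based implementation: extract_full_sentinels is a hypothetical name, and the sketch assumes that any same-name element nested inside a protected element is itself marked translate="no".

```python
import re

# A translate="no" element whose body contains no nested opening tag of the
# same name, i.e. the innermost candidate at this point in the sweep.
_PROTECTED = re.compile(
    r'<(\w+)\b[^>]*\btranslate="no"[^>]*>(?:(?!<\1\b).)*?</\1>',
    re.DOTALL,
)


def extract_full_sentinels(html: str) -> tuple[str, list[str]]:
    """Replace protected elements with __TBP{N}__ tokens until the text is stable."""
    originals: list[str] = []

    def lift(match: re.Match) -> str:
        originals.append(match.group(0))
        return f"__TBP{len(originals) - 1}__"

    while True:
        swept = _PROTECTED.sub(lift, html)
        if swept == html:  # stable: nothing left to extract
            return html, originals
        html = swept
```

With same-name nesting, the inner element receives the lower index and the outer element is stored with the inner token embedded in it, so restoring from the highest index down unwinds the nesting correctly.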

Type B — Paired Open/Close (`TBO{N}` / `TBC{N}`)

For plain <span> wrappers (no class attribute) containing translatable text. The wrapper tags become sentinels; the text between them is translated normally. Spans with a class attribute are now extracted as Type A FULL sentinels (see Pass 1b below) since the class denotes a structural/decorative CSS hook that must survive intact.

Before	After Sentinel Pass
`<span style="color:#9ece6a">success message</span>`	`__TBO0__success message__TBC0__`

The model translates "success message" while the <span style="..."> wrapper survives intact. Class-bearing spans like <span class="badge">, <span class="tree-label">, and <span class="method-tag"> are handled by Pass 1b as full extractions — the model never sees them.

Type C — Numeric Protection (`TBN{N}`)

Protects bare numbers in prose from the model's tendency to drop, paraphrase, or localize them. Matches integers, decimals, percentages, and number+unit pairs.

Before	After Sentinel Pass	Problem Prevented
`uses 1770 MB of memory`	`uses __TBN0__ of memory`	French translating "MB" → "Mo"
`achieves 98.5% uptime`	`achieves __TBN1__ uptime`	Japanese dropping the decimal
`62% cache hit rate`	`__TBN2__ cache hit rate`	German paraphrasing to words
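
A numeric pass along these lines could be sketched as below. The pattern and the unit list (KB, MB, GB, TB, ms) are assumptions chosen for illustration; the engine's actual pattern is not shown in this document. The sketch also assumes it runs on prose after tags have been lifted, so it does not try to skip HTML attributes.

```python
import re

_NUMBER = re.compile(
    r"(?<![\w.])"                        # not inside a word, token, or decimal
    r"\d+(?:\.\d+)?"                     # integer or decimal
    r"(?:%|\s?(?:KB|MB|GB|TB|ms)\b)?"    # optional percent sign or unit
    r"(?!\w)"                            # must not run into a word character
)


def protect_numbers(text: str) -> tuple[str, list[str]]:
    """Replace bare numbers (with optional unit) by __TBN{N}__ tokens."""
    originals: list[str] = []

    def lift(match: re.Match) -> str:
        originals.append(match.group(0))
        return f"__TBN{len(originals) - 1}__"

    return _NUMBER.sub(lift, text), originals
```

The lookbehind keeps the pass from touching digits inside earlier sentinel tokens such as __TBP0__ or inside identifiers like S3.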

Type D — Brand Term Protection (`TBT{N}`)

Protects brand terms, product names, and proper nouns that must never be translated or transliterated. Unlike Type A (which requires translate="no" in the source HTML), Type D operates from a centralized configuration list — no source markup needed.

Before	After Sentinel Pass	Problem Prevented
`powered by The Trinity Beast`	`powered by __TBT0__`	Hindi transliterating to ट्रिनिटी बीस्ट
`deployed on CloudFront`	`deployed on __TBT1__`	Arabic transliterating to كلاود فرونت
`Cory Dean Kalani`	`__TBT2__`	Urdu transliterating person names

Protected terms are defined in translation-config.json (57 terms). The sentinel pass matches terms using word-boundary regex for short terms (≤5 chars) and substring matching for longer terms. Restoration is exact — the original term text is re-injected at the sentinel position.
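
The two matching rules (word-boundary for short terms, substring for longer ones) can be illustrated with the sketch below. The function name protect_brand_terms is hypothetical, the term list is passed in rather than read from translation-config.json, and processing the longest terms first is an added assumption to avoid partial overlaps.

```python
import re


def protect_brand_terms(text: str, terms: list[str]) -> tuple[str, list[str]]:
    """Replace configured brand terms with __TBT{N}__ tokens."""
    originals: list[str] = []
    # Assumption: longest terms first, so a long term wins over a shorter overlap.
    for term in sorted(terms, key=len, reverse=True):
        escaped = re.escape(term)
        # Short terms (<= 5 chars) need word boundaries; longer ones match as substrings.
        pattern = rf"\b{escaped}\b" if len(term) <= 5 else escaped

        def lift(match: re.Match) -> str:
            originals.append(match.group(0))
            return f"__TBT{len(originals) - 1}__"

        text = re.sub(pattern, lift, text)
    return text, originals
```

Restoration is then a lookup by index: the token __TBT{N}__ is replaced with originals[N], which re-injects the term exactly as it appeared in the source.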

Sentinel Recovery Pass (Post-Restoration)

When translating into complex-script languages (Hindi, Urdu, Arabic), the model occasionally drops sentinel tokens entirely from its output: the token simply doesn't appear in the translated text. This affects both Type A (FULL) and Type D (TERM) sentinels, particularly in token-dense chunks with 20+ sentinels. The recovery pass runs after normal restoration and before validation:

Iterates all TERM entries in the sentinels list
Checks if the term is present in the source but missing from the restored output
Re-injects the original term text at an approximate position (ratio-based paragraph matching)
Falls back to insertion before the last closing tag if position cannot be determined

This eliminates the class of failures where the model acknowledges the sentinel in its "thinking" but omits it from the output — a behavior observed primarily in Indic scripts with token-dense chunks.

3.2 Processing Flow

Diagram 3.1: Sentinel Preprocessing Flow

flowchart TD
    A[Source HTML Chunk] --> B[Pass 1: Extract translate=no elements]
    B --> B1b[Pass 1b: Extract class-bearing spans as FULL sentinels]
    B1b --> B2[Pass 2: Extract all text-only spans as FULL sentinels]
    B2 --> B3[Pass 2b: Wrap text-only links as paired sentinels]
    B3 --> B4[Pass 2c: Extract nested-HTML and event-handler links as FULL]
    B4 --> D[Pass 3: Replace bare numbers with numeric sentinels]
    D --> D2[Pass 4: Replace brand terms with TERM sentinels]
    D2 --> D3[Pass 5: Extract email addresses as FULL sentinels]
    D3 --> D4[Pass 6: Extract bare prose URLs as FULL sentinels]
    D4 --> E[Send to Bedrock with sentinel-aware prompt]
    E --> F[Receive translated chunk with sentinels intact]
    F --> G[Deduplicate any model-doubled paired sentinels]
    G --> H[Restore sentinels high-to-low index order]
    H --> H2[Recovery pass: re-inject any dropped FULL/TERM sentinels]
    H2 --> H3[Repair pass: fix dropped span/strong/em tags]
    H3 --> I[Run validators against source + restored output]
    I -->|PASS| J[Accept chunk]
    I -->|FAIL| K{Retries remaining?}
    K -->|Yes| L[Retry with strict prompt + temperature jitter]
    L --> E
    K -->|No| M[Raise TranslationError]

    style A fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style E fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style J fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style M fill:#450a0a,stroke:#ef4444,color:#e2e8f0

The eight passes execute in strict order — later passes operate on the output of earlier ones. Pass 1b extracts all class-bearing spans as FULL sentinels (eliminating the entire class of span-drop failures for structural/decorative CSS hooks). Pass 2 extracts remaining text-only spans as FULL sentinels. Passes 2b/2c handle anchor tags — text-only links become PAIR sentinels (attributes opaque, inner text translated), while links with nested HTML or JavaScript event handlers are extracted as FULL. Type C (numeric) sentinels protect numbers that appear inside Type B (paired) text. Type D (brand term) sentinels protect terms anywhere in translatable content. Passes 5 and 6 protect email addresses and bare prose URLs from translation/mangling. This provides defense-in-depth across eight layers.

3.3 Restoration and Deduplication

After translation, sentinels are restored in reverse index order (high → low) to prevent prefix collisions (__TBP1__ must not match inside __TBP10__).

A deduplication pass runs before restoration to handle a known model behavior: occasionally the model emits a paired sentinel twice consecutively (a bilingual output instinct). The deduplicator collapses __TBO0__text__TBC0__ __TBO0__text__TBC0__ into a single occurrence.
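
Deduplication followed by high-to-low restoration might look like the sketch below. Both function names are hypothetical, and the deduplicator here only collapses duplicates whose text is identical, which is the case described above.

```python
import re

# The same paired sentinel emitted twice in a row, separated only by whitespace.
_DOUBLED_PAIR = re.compile(r"(__TBO(\d+)__.*?__TBC\2__)\s*\1", re.DOTALL)


def dedupe_pairs(text: str) -> str:
    """Collapse consecutive identical paired sentinels into one occurrence."""
    return _DOUBLED_PAIR.sub(r"\1", text)


def restore_full(text: str, originals: list[str]) -> str:
    """Swap __TBP{N}__ tokens back to their originals, highest index first."""
    for index in range(len(originals) - 1, -1, -1):
        text = text.replace(f"__TBP{index}__", originals[index])
    return text
```

In this sketch the high-to-low order has a second benefit: an outer element that was stored with an inner token embedded in it is expanded first, and the inner token is then resolved on a later iteration.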

4. Validator System

Every translated chunk is validated against the source before acceptance. Validators enforce structural integrity and content preservation: a translation that passes every validator has kept the elements that functional correctness depends on (code is unchanged, links resolve, diagrams render).

4.1 Validation Checks

Validator	Type	What It Checks	Failure Example
`check_protected_terms`	Hard	Every protected term in source appears in output	"CloudFront" missing from Japanese output
`check_version_numbers`	Hard	All version numbers (X.Y or X.Y.Z) survive translation	"17.7" dropped from PostgreSQL reference
`check_preserve_patterns`	Hard	URLs, emails, IPs, ARNs, resource IDs, cron expressions, memory sizes	ARN truncated or IP address reformatted
`check_tag_counts`	Hard	HTML tag counts match for structural tags	Extra `<span>` added or `<code>` dropped
`check_translate_no_zones`	Hard	Content inside `translate="no"` zones unchanged	Protected code block content altered

Protected term matching: Short uppercase acronyms (≤4 chars like SQS, ECR, S3) use word-boundary matching to avoid false positives where the acronym appears as a substring (e.g., "ECR" inside "SECRET"). Longer terms use plain substring matching.

Implementation (v2.5): The check_tag_counts and check_translate_no_zones validators use character scanning with exact boundary matching rather than regex. Because the pipeline controls these tags, every tag is known to start with < and end with >. The scanner finds complete opening tags by looking for <tagname followed by a boundary character (>, space, tab, newline, or /), then reads to the closing >. This eliminates false positives from partial regex matches and is immune to edge cases where tag names appear as text content (e.g., documenting translate="no" as literal text inside a code tag).
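
The boundary-matching scan described above can be sketched without regex as follows. This is an illustrative reconstruction from the description, not the validator's code; count_open_tags is a hypothetical name.

```python
# Characters that may legally follow a tag name in an opening tag.
_BOUNDARY = frozenset(">/ \t\n")


def count_open_tags(html: str, tag: str) -> int:
    """Count complete opening <tag ...> occurrences using plain character scanning."""
    needle = "<" + tag.lower()
    lowered = html.lower()
    count = 0
    pos = lowered.find(needle)
    while pos != -1:
        after = pos + len(needle)
        # The tag name must end at a boundary, so "<pre" never matches "<prefix>".
        if after < len(lowered) and lowered[after] in _BOUNDARY:
            close = lowered.find(">", after)
            if close == -1:
                break  # unterminated tag: no complete tag remains
            count += 1
            pos = lowered.find(needle, close + 1)
        else:
            pos = lowered.find(needle, after)
    return count
```

Comparing count_open_tags(source, "code") with count_open_tags(output, "code") then gives the kind of per-tag count check that the table in 4.1 attributes to check_tag_counts.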

4.2 Retry Strategy

When validation fails, the engine retries with two progressive adjustments:

Strict prompt activation — adds an explicit warning: "PREVIOUS ATTEMPT FAILED VALIDATION. Be more careful: every protected term and every version number from the input MUST appear unchanged in the output."
Temperature jitter — increments temperature by 0.1 per retry (0.0 → 0.1 → 0.2 → 0.3, capped at 0.5). A deterministic temp=0 retry produces the same erroneous output; temperature jitter lets the model take a different sampling path.

Maximum retries: 3 (configurable). If all attempts fail, a TranslationError is raised with the chunk index, validator detail, and a preview of the problematic chunk.
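
The retry schedule can be sketched as below. The translate and validate callables, their parameter names, and the function names are assumptions made for illustration; only the 0.1 step, the 0.5 cap, the default of 3 retries, and the TranslationError name come from the description above.

```python
class TranslationError(Exception):
    """Raised when a chunk fails validation on every attempt."""


def jittered_temperature(attempt: int, step: float = 0.1, cap: float = 0.5) -> float:
    """Temperature for a given attempt: 0.0 first, then +step per retry up to cap."""
    return min(round(attempt * step, 2), cap)


def translate_with_retries(chunk, translate, validate, max_retries: int = 3):
    """Try once, then retry with the strict prompt and a higher temperature."""
    for attempt in range(max_retries + 1):
        output = translate(
            chunk,
            temperature=jittered_temperature(attempt),
            strict=attempt > 0,  # strict prompt only after a failed attempt
        )
        if validate(chunk, output):
            return output
    raise TranslationError(f"chunk failed validation after {max_retries} retries")
```

With the default settings the temperatures used are 0.0, 0.1, 0.2, and 0.3; the 0.5 cap only comes into play if the retry limit is configured higher.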

4.3 Hard vs Soft Failures

Validators are classified into two tiers based on what they protect:

Tier	Tags	Behavior	Rationale
Hard (content-critical)	`<code>`, `<pre>`, `<a>`	Retry → reject on failure	Missing code blocks, broken links, or lost pre-formatted content means the translation is functionally broken
Soft (decorative/structural)	`<span>`, `<strong>`, `<em>`, `<br>`	Log warning, pass through	Missing styling wrappers don't break functionality — the post-translation integrity check repairs them

This tiered approach eliminates the failure mode where a correctly-translated document is rejected because the model dropped a single decorative <span> wrapper during RTL reordering. The content is correct — only the styling wrapper is missing — and the integrity check restores it automatically.

The ValidationReport aggregates all results and exposes:

.passed — True if zero hard failures
.hard_failures — list of blocking issues (content-critical tags)
.soft_failures — list of warnings (decorative tags — repaired post-translation)
.summary() — human-readable status string
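
A report object with this interface could be sketched as a small dataclass. The attribute and method names passed, hard_failures, soft_failures, and summary() come from the list above; the record_tag_mismatch helper, the message format, and the HARD_TAGS constant are hypothetical additions that mirror the tier table.

```python
from dataclasses import dataclass, field

# Content-critical tags from the tier table; everything else is treated as soft.
HARD_TAGS = {"code", "pre", "a"}


@dataclass
class ValidationReport:
    hard_failures: list[str] = field(default_factory=list)
    soft_failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Soft failures never block acceptance.
        return not self.hard_failures

    def record_tag_mismatch(self, tag: str, expected: int, actual: int) -> None:
        message = f"<{tag}> count {expected} -> {actual}"
        target = self.hard_failures if tag in HARD_TAGS else self.soft_failures
        target.append(message)

    def summary(self) -> str:
        status = "PASS" if self.passed else "FAIL"
        return f"{status}: {len(self.hard_failures)} hard, {len(self.soft_failures)} soft"
```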

4.4 Post-Translation Integrity Check

After translation completes and chunks are reassembled, a full-document integrity check runs before the S3 write. This is the defense-in-depth layer — it repairs structural drift that the per-chunk validator intentionally allows through (soft failures).

Repair Capabilities

Issue	Detection	Repair Action
`</br>` injection	String scan for invalid closing br tags	Strip all occurrences (never valid HTML)
`<br>` inside Mermaid blocks	Regex scan within `<pre class="mermaid">`	Remove (breaks Mermaid syntax)
Mermaid content corruption	Byte-for-byte comparison with source	Flag as warning (cannot auto-repair content changes)
Missing `translate="no"` span wrappers	Compare source protected elements to output	Re-wrap bare content with original element tags
Missing `<strong>`/`<em>` wrappers	Same pattern as span recovery	Re-wrap bare content

The integrity check only repairs translate="no" elements (where content is byte-for-byte identical between source and output). For translated content that lost its wrapper, the check logs the discrepancy but cannot reliably re-wrap (the content has been translated — matching it to the source wrapper requires semantic understanding).

Design principle: If the translated content is present and correct but the HTML structure is degraded, repair it. Only flag as unrecoverable if content is actually missing or corrupted. The customer sees a clean translation — the repairs happen invisibly.

4.5 Source Document Validation (v2.8)

Before any translation work begins, the source document passes through a validation gate. This catches defects that would cause translation failures or produce broken output — rejecting early saves Bedrock tokens and prevents corrupted translations from reaching S3.

Defect Categories

Category	What It Catches	Auto-Repairable?
STRUCTURAL	Unclosed tags, malformed HTML, nesting violations	Yes (up to 5 unclosed tags)
MERMAID	Empty diagram blocks, missing type declaration, mismatched brackets	No — reject with location
ENCODING	BOM markers, null bytes, mixed encodings	Yes (strip BOM/nulls)
SIZE	Document exceeds 500 KB, excessive nesting depth (>30 levels)	No — reject with size info
CONFLICT	`translate="no"` on root element (nothing to translate)	No — reject immediately

Validation Flow

Size check — reject if > 500 KB (chunking becomes unreliable at this size)
Encoding check — detect and strip BOM markers, null bytes; flag mixed encodings
Structural HTML check — scan for unclosed tags; auto-repair up to 5 by appending closing tags at the correct nesting level
Mermaid syntax check — validate every <pre class="mermaid"> block has a valid diagram type, balanced brackets, and non-empty content
Conflict check — reject if the root <body> or <html> element has translate="no"
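
A reduced version of this gate is sketched below. It covers only the size, encoding, and conflict checks; the structural and Mermaid checks are omitted. The names SourceCheck and validate_source, the defect message strings, and the choice to decode with replacement characters are assumptions, not the engine's implementation.

```python
import re
from dataclasses import dataclass, field

MAX_BYTES = 500 * 1024  # 500 KB limit from the size check

_ROOT_NO_TRANSLATE = re.compile(r'<(?:html|body)\b[^>]*\btranslate="no"', re.IGNORECASE)


@dataclass
class SourceCheck:
    valid: bool
    html: str
    defects: list[str] = field(default_factory=list)


def validate_source(raw: bytes) -> SourceCheck:
    """Reject oversized or fully protected documents; repair BOM and null bytes."""
    if len(raw) > MAX_BYTES:
        size_kb = len(raw) // 1024
        return SourceCheck(False, "", [f"SIZE: document is {size_kb} KB (limit: 500 KB)"])
    # "utf-8-sig" drops a leading BOM; null bytes are stripped as a silent repair.
    html = raw.decode("utf-8-sig", errors="replace").replace("\x00", "")
    if _ROOT_NO_TRANSLATE.search(html):
        return SourceCheck(False, html, ['CONFLICT: translate="no" on root element'])
    return SourceCheck(True, html)
```

Ordering the cheap size check first means an oversized document is rejected before any decoding or scanning work is done.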

Rejection vs Repair

The validator follows a strict philosophy: try to fix it silently, reject early if you can't. Repairable issues (unclosed tags, BOM markers) are fixed in-place — the customer never knows. Unrecoverable issues produce an actionable defect report with the exact location, what's wrong, and how to fix it.

ValidationResult:
  valid: false
  rejection_reason: "2 unrecoverable defects found"
  defects:
    - severity: error
      category: MERMAID
      location: "Section 5, line 342"
      description: "Empty Mermaid block — no diagram content"
      suggestion: "Add diagram content or remove the empty <pre class='mermaid'> block"
    - severity: error
      category: SIZE
      location: "Document root"
      description: "Document is 612 KB (limit: 500 KB)"
      suggestion: "Split into multiple documents or remove large embedded assets"

Cost savings: A rejected document costs zero Bedrock tokens. Without source validation, a broken document would fail during translation (after burning tokens on partial chunks), produce a corrupted output, and require manual investigation. Source validation catches these cases in <10ms with zero API calls.

4.6 Diagram Integrity (v2.8)

Mermaid diagrams are code — they must survive translation byte-for-byte. The integrity check (section 4.4) now includes dedicated diagram verification with automatic recovery.

Detection

The integrity check counts Mermaid blocks in the source (<pre class="mermaid">) and compares against the translated output. If any diagrams are missing from the output, the auto-stitch mechanism activates.

Auto-Stitch Recovery

When a diagram is missing from the translated output:

Identify which source diagram is absent (by content matching)
Extract the full <div class="diagram-wrap"> block from source (includes label + pre)
Locate the correct insertion point in the output (same section, same relative position)
Inject the source diagram block verbatim — diagrams don't need translation

The stitched diagram is the untranslated source version, which is functionally correct because Mermaid syntax is language-independent. The surrounding prose is already translated, so the reader gets translated explanations with a working diagram.
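
The detection half of auto-stitch (finding which source diagrams are absent by content matching) can be sketched as below. Locating the insertion point and injecting the block are not shown. The function name missing_diagrams is hypothetical, and the sketch assumes diagrams are compared verbatim as <pre class="mermaid"> blocks.

```python
import re

_MERMAID = re.compile(r'<pre class="mermaid">.*?</pre>', re.DOTALL)


def missing_diagrams(source_html: str, output_html: str) -> list[str]:
    """Return source Mermaid blocks that do not appear verbatim in the output."""
    present = set(_MERMAID.findall(output_html))
    return [block for block in _MERMAID.findall(source_html) if block not in present]
```

An empty result means every source diagram survived translation byte-for-byte, so no stitching is needed.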

Tag Inventory Integration

The _count_tags function now reports diagram count alongside other structural tags:

Tag Inventory (source → output):
• Trinity-Beast-Performance-Report.html
  IN:  code:75 pre:8 strong:12 em:3 a:6 br:20 diagrams:4
  OUT: code:75 pre:8 strong:12 em:3 a:6 br:20 diagrams:4

If a diagram is lost during translation and auto-stitched back, the final count still matches — the stitch happens before the tag inventory is calculated. A mismatch in the diagrams count after stitching indicates a structural issue that needs manual review.

Result: The Performance Report (75 KB, 4 Mermaid diagrams, 18 sections) translates to French with all 4 diagrams intact — 3 survived translation naturally, 1 was auto-stitched from source. The reader sees no difference.

4.7 Post-Translation Repair Pipeline (v5.0)

Defense-in-depth: even with perfect preprocessing, models occasionally drop or mangle structural elements. The repair pipeline runs after translation and before final validation, catching anything that slipped through.

Code Tag Repair (`_repair_code_tags`)

Multi-pass repair (up to 6 iterations) that detects dropped <code> wrappers by comparing the translated output against the source document. For each code-tagged value in the source, it searches the output for the bare content and re-wraps it with the original tag. Converges when no more repairs are possible.

Input (broken)	Output (repaired)
`api_key parameter requires...`	`<code translate="no">api_key</code> parameter requires...`
`the timestamptz column stores...`	`the <code translate="no">timestamptz</code> column stores...`

The repair function matches content from the source's code tags against the output using exact string matching. It handles both self-closing patterns and open/close pairs. Each pass may reveal new repair opportunities (nested cases), hence the multi-pass design with a convergence check.
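
The multi-pass re-wrapping with a convergence check might be sketched like this. It is a simplified illustration, not the source of `_repair_code_tags`: repair_code_tags is a hypothetical name, only open/close pairs are handled, and bare content is matched by exact string comparison outside existing <code> elements.

```python
import re

_CODE_ELEMENT = re.compile(r"(<code\b[^>]*>.*?</code>)", re.DOTALL)


def repair_code_tags(source_html: str, output_html: str, max_passes: int = 6) -> str:
    """Re-wrap code values whose <code> wrapper was dropped during translation."""
    elements = list(dict.fromkeys(_CODE_ELEMENT.findall(source_html)))
    for _ in range(max_passes):
        before = output_html
        for element in elements:
            missing = source_html.count(element) - output_html.count(element)
            if missing <= 0:
                continue
            bare = re.sub(r"</?code\b[^>]*>", "", element)  # inner text only
            # split() alternates prose (even indices) and existing <code> elements (odd).
            parts = _CODE_ELEMENT.split(output_html)
            for i in range(0, len(parts), 2):
                hits = min(missing, parts[i].count(bare))
                parts[i] = parts[i].replace(bare, element, hits)
                missing -= hits
            output_html = "".join(parts)
        if output_html == before:  # converged: nothing left to repair
            break
    return output_html
```

Counting occurrences on both sides keeps the sketch from wrapping more instances than the source actually had, so prose mentions of the same word that were never code in the source are left alone once the count matches.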

lxml Syntax Validation (`_tidy_validate_and_repair`)

Structural HTML repair using lxml's robust parser. Catches and fixes issues that the model introduces in the HTML structure itself — unclosed tags, nesting violations, mismatched attributes, and malformed output. This is the final safety net before the translated chunk is accepted.

Issue Detected	Action
Unclosed `<div>` or `<span>`	Auto-close at the correct nesting level
Orphaned closing tags	Remove (no matching opener)
Invalid nesting (e.g., `<p>` inside `<p>`)	Restructure to valid hierarchy
Malformed attributes	Normalize quotes and spacing

Uses lxml.html.fragment_fromstring() for parsing — no external binary needed (lxml is in the container's requirements.txt). Falls back to BeautifulSoup if lxml encounters a parse failure it cannot recover from.

Output Integrity Check (`_check_output_integrity`)

Final gate before a translated chunk is accepted. Compares structural tag counts between source and output — every <code>, <pre>, <strong>, <em>, <a>, <br>, and Mermaid diagram must have matching counts. If counts diverge beyond tolerance, the _recover_missing_wrappers() function attempts targeted restoration before rejecting the chunk.

Design principle: Preprocessing is the fortress — strip everything non-prose before the model sees it. Post-processing is defense-in-depth — repair anything that slips through. The goal is zero repairs needed because prep was thorough, but the safety net is always active.

5. BeastTranslate — Persistent Worker Architecture

Translation execution is handled by BeastTranslate — a persistent ECS Fargate service (tbi-translate-worker-service) that serves as the unified translation orchestrator. It continuously polls the SQS translation queue for realtime jobs AND polls Bedrock for completed batch inference jobs every 33 seconds during idle. Unlike the previous Step Function → RunTask model (v3.0), BeastTranslate is always-on — no cold starts, no orchestration overhead, instant job pickup, and zero Lambda invocations for the batch completion path.

5.1 BeastTranslate Service Design (v3.1)

Diagram 5.1: BeastTranslate Persistent Worker Architecture

flowchart TD
    SQS[SQS Translation Queue] --> LP[Long-Poll Loop - 20s wait]
    LP -->|Message received| DM[Deserialize Job Message]
    DM --> EM{Execution Mode?}
    EM -->|Express - realtime| RT[Process All Languages Sequentially]
    EM -->|Batch - high volume| BT[Scale Service to N Containers]
    BT --> PAR[N Containers Poll Same Queue]
    PAR --> EACH[Each Takes 1 Language Message]
    EACH --> PROC[Translate All Docs for That Language]
    RT --> PROC
    PROC --> CB[POST /admin/translate/callback]
    CB --> DONE{More Messages?}
    DONE -->|Yes| LP
    DONE -->|No - queue empty| IDLE[IDLE - Resume Polling]
    IDLE --> LP

    subgraph GS[Graceful Shutdown]
        SIG[SIGTERM Received] --> FIN[Finish Current Doc]
        FIN --> CLEANEX[Clean Exit]
    end

    subgraph VH[Visibility Heartbeat]
        HB[Every 5 min] --> EXTVIS[Extend SQS Visibility]
        EXTVIS --> PREV[Prevent Re-Delivery]
    end

    style SQS fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
    style LP fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style DM fill:#7c2d12,stroke:#fb923c,color:#e2e8f0
    style EM fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style RT fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style BT fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style PAR fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style EACH fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style PROC fill:#92400e,stroke:#fbbf24,color:#e2e8f0
    style CB fill:#064e3b,stroke:#10b981,color:#e2e8f0
    style DONE fill:#1e293b,stroke:#334155,color:#e2e8f0
    style IDLE fill:#1e293b,stroke:#334155,color:#e2e8f0
    style SIG fill:#4c1d95,stroke:#c084fc,color:#e2e8f0
    style FIN fill:#4c1d95,stroke:#c084fc,color:#e2e8f0
    style CLEANEX fill:#4c1d95,stroke:#c084fc,color:#e2e8f0
    style HB fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style EXTVIS fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    style PREV fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0

Service Specification

Property	Value
Service Name	`tbi-translate-worker-service`
Cluster Node	`BeastTranslate`
Image	`211998422884.dkr.ecr.us-east-2.amazonaws.com/tbi-translate-worker:latest`
Resources	2 vCPU / 6 GB (per container)
Default Desired Count	1
Max Scale	11 (one per target language)
Runtime	Python 3.11
Entry Point	`task_runner.py`
IAM Role	`tbi-translate-role`
Queue	`trinity-beast-translation-queue`
Log Group	`/ecs/tbi-translate-worker`
Timeout	None — runs to completion

How It Works: The Polling Loop

BeastTranslate runs a continuous dual-mode polling loop. When no jobs exist, it idles — consuming negligible CPU while maintaining a warm connection to SQS. The moment a translation job is submitted, the container picks it up within seconds (no cold start, no orchestration delay). During idle periods, it also checks for completed Bedrock batch inference jobs every 33 seconds.

SQS Poll: receive_message(WaitTimeSeconds=20) — blocks for up to 20 seconds waiting for a message
Receive: Deserialize job envelope (job_id, docs[], language, options)
Process: For each document in the array, run the full translation pipeline (sentinel → Bedrock → validate → integrity check)
Heartbeat: Every 5 minutes, extend the SQS message visibility timeout to prevent re-delivery during long-running documents
Callback: On completion, POST results back to the LPO server via /admin/translate/callback
Delete: Remove the message from SQS after successful processing
Batch Check (idle only): Every 33 seconds when no SQS messages arrive, query Aurora for in-progress batch jobs → call GetModelInvocationJob → on completion, read output JSONL from S3, deploy translated docs, invoke finalize
Loop: Return to the SQS Poll step
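One iteration of the loop above can be sketched as follows. This is a minimal illustration, not the contents of task_runner.py: the `sqs` argument is assumed to behave like a boto3 SQS client, `translate_doc`, `post_callback`, and `clock` are hypothetical injected callables, and `VISIBILITY_EXTENSION` is an assumed value. For brevity the heartbeat is checked between documents rather than on a separate timer.

```python
import json

WAIT_SECONDS = 20            # long-poll wait from the SQS Poll step
HEARTBEAT_SECONDS = 300      # extend visibility every 5 minutes
VISIBILITY_EXTENSION = 600   # assumed value; the real timeout is not documented here


def poll_once(sqs, queue_url, translate_doc, post_callback, clock):
    """Run one loop iteration; return True if a message was handled."""
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=WAIT_SECONDS
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False  # idle: the caller can run the batch check here

    msg = messages[0]
    job = json.loads(msg["Body"])  # envelope: job_id, docs[], language, options
    last_beat = clock()
    results = []
    for doc in job["docs"]:
        results.append(translate_doc(doc, job["language"], job.get("options", {})))
        if clock() - last_beat >= HEARTBEAT_SECONDS:
            # Heartbeat: keep the message invisible to other consumers.
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=msg["ReceiptHandle"],
                VisibilityTimeout=VISIBILITY_EXTENSION,
            )
            last_beat = clock()

    post_callback(job["job_id"], results)
    # Delete only after the callback, so a crash before this point re-delivers.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return True
```

Deleting the message last is what makes the heartbeat necessary: until the delete, SQS will re-deliver the message as soon as its visibility timeout lapses.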

Auto-Scaling: Demand-Driven Container Management

BeastTranslate scales itself automatically — no manual intervention required. The system determines container count from the job's language list at submission time, then scales back to 1 when all work completes.

How It Determines Scale

When a translation job is submitted via POST /admin/translate, the LPO server inspects the langs array:

Count languages: desired = len(langs) — one container per language for full parallelism
Cap at 11: Maximum supported target languages (the internal language set minus English)
Scale up: ecs:UpdateService(desiredCount=N) fires immediately after the job is enqueued to SQS
Containers ready: New containers start from ECR image cache in ~30 seconds

How It Determines Scale-Down

When a worker completes a job, the finalize step checks for remaining work before scaling down:

Query active queue: GET /admin/translate/queue — checks active and queued job counts
If queue empty: ecs:UpdateService(desiredCount=1) — scale back to steady-state
If jobs remain: Skip scale-down — let other workers continue processing

This handles back-to-back batch submissions (e.g., 6 docs + 6 docs) correctly — the first batch to finish won't tear down containers while the second is still running.
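The two scaling decisions reduce to a pair of small functions. The sketch below is illustrative only: the function names are hypothetical, and the actual ecs:UpdateService call is left to the caller.

```python
MAX_CONTAINERS = 11   # cap: one container per target language
STEADY_STATE = 1      # always-on container count


def desired_on_submit(langs):
    """Container count requested when a job is enqueued."""
    count = MAX_CONTAINERS if langs == "all" else len(langs)
    return max(STEADY_STATE, min(count, MAX_CONTAINERS))


def desired_on_finalize(active_jobs, queued_jobs, current):
    """Container count after a worker finishes a job."""
    if active_jobs == 0 and queued_jobs == 0:
        return STEADY_STATE   # queue empty: scale back down
    return current            # other jobs remain: skip scale-down
```

For example, `desired_on_submit(["es", "pt", "fr"])` yields 3, while `desired_on_finalize(1, 0, 11)` keeps all 11 containers because another job is still active.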

Scenario	Containers	Trigger
Idle (no jobs)	1	Default steady-state — near-zero CPU
Small job (1–3 languages)	1–3	Auto-scaled at submission: `desired = len(langs)`
Full library (11 languages)	11	Auto-scaled at submission — full parallelism
Post-batch (all jobs done)	1	Auto-scaled down by finalize step when queue is empty

Zero operator involvement. Submit the job, walk away. The infrastructure sizes itself to the workload, processes at maximum parallelism, then shrinks back to 1 container (~$0.05/hour at idle). KCC manual commands (translate-scale N) still work as an override but are no longer needed for normal operation.

Graceful Shutdown (SIGTERM)

When ECS sends SIGTERM (scale-down or deployment), the container:

Stops polling for new messages immediately
Finishes translating the current document (never abandons mid-document)
POSTs partial progress back via callback
Exits cleanly with code 0

This ensures no work is lost during scale-down events. A partially-completed job resumes from where it left off when the next container picks up remaining messages.
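A common way to get this behavior in Python is a flag set by the signal handler and checked between documents. The sketch below assumes that pattern; `ShutdownFlag`, `run_job`, and the injected `translate_doc` and `post_callback` callables are hypothetical names, not the worker's actual code.

```python
import signal


class ShutdownFlag:
    """Records that SIGTERM arrived without interrupting the current document."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        self.requested = True

    def install(self):
        # Must be called from the main thread of the worker process.
        signal.signal(signal.SIGTERM, self.handle)


def run_job(docs, translate_doc, post_callback, flag):
    """Translate docs in order, stopping before the next doc once SIGTERM was seen."""
    done = []
    for doc in docs:
        if flag.requested:
            break                        # never start a new document after SIGTERM
        done.append(translate_doc(doc))  # the current document always finishes
    post_callback(done, partial=len(done) < len(docs))
    return 0                             # clean exit code
```

Because the flag is only read at document boundaries, a signal that arrives mid-document changes nothing until that document is complete.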

Backward Compatibility

The worker detects its execution mode via environment variables:

If TRANSLATE_JOB_ID is set → single-run mode (Step Function invocation, legacy path)
If SQS_QUEUE_URL is set and TRANSLATE_JOB_ID is NOT set → persistent polling mode (BeastTranslate unified orchestrator)

This means the same container image works both as a persistent service (normal operation — handling realtime SQS jobs AND batch inference completion) and as a one-shot task (Step Function fallback).
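The mode check can be sketched as below. The mode labels and the function name are illustrative, and treating an empty variable as unset is an assumption.

```python
import os


def detect_mode(env=None):
    """Pick the execution mode from environment variables."""
    env = os.environ if env is None else env
    if env.get("TRANSLATE_JOB_ID"):
        return "single-run"           # Step Function invocation, legacy path
    if env.get("SQS_QUEUE_URL"):
        return "persistent-polling"   # BeastTranslate service
    raise RuntimeError("neither TRANSLATE_JOB_ID nor SQS_QUEUE_URL is set")
```

TRANSLATE_JOB_ID is tested first, so a one-shot task still runs in single-run mode even if the queue URL is also present in its environment.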

Diagram 5.2: BeastTranslate Auto-Scaling — Multi-Language Job Flow

sequenceDiagram
    participant Admin as Admin / Customer
    participant LPO as LPO Server
    participant ECS as ECS Service
    participant SQS as SQS Queue
    participant BT1 as BeastTranslate 1
    participant BT2 as BeastTranslate 2-11

    Admin->>LPO: POST /admin/translate (3 docs x 11 langs)
    LPO->>SQS: Enqueue job message
    LPO->>ECS: UpdateService(desiredCount=11)
    LPO->>Admin: 200 OK (job_id, state: queued)

    Note over ECS: Spins up 10 new containers (~30s)
    Note over BT1: Already polling (always-on)
    BT1->>SQS: ReceiveMessage
    SQS-->>BT1: Job envelope (all docs, all langs)

    Note over BT1: Processes lang[0] sequentially
    BT2->>SQS: ReceiveMessage (competing consumers)
    Note over BT2: Each takes next available work

    par 11 containers process in parallel
        BT1->>BT1: es - all 3 docs
        BT2->>BT2: fr, de, ru, ja, zh, ar, hi, ur, pt, it
    end

    BT1->>LPO: Finalize (state, CloudFront, search, notify)
    Note over BT1: Check queue - jobs remain? Skip scale-down
    BT2->>LPO: Last worker finalizes
    Note over BT2: Check queue - empty!
    BT2->>ECS: UpdateService(desiredCount=1)
    Note over ECS: 10 containers drain gracefully
    LPO->>Admin: SNS notification email

5.2 Error Handling and Recovery

Failure Mode	Handling	Job State
Single language fails after 3 retries	Catch → RecordLangFailure pass state, continue other langs	`partial`
All languages for a doc fail	Deploy Lambda receives empty succeeded list, skips invalidation	`partial`
Worker timeout (no response)	ECS task runs to completion — no timeout ceiling. Step Function waits via `ecs:runTask.sync`	`running`
Step Function execution exception	Finalize still runs via catch-all; job marked `failed`	`failed`
Operator cancels mid-flight	`StopExecution` API call; job marked `cancelled`	`cancelled`
Step Function fails before Finalize	Self-healing sweeper detects orphaned job via execution ARN, marks as `failed`	`failed`

Per-lang independence: Failure of one (doc, lang) pair never aborts work on the other 10 languages. This is enforced by the Step Function's Catch on the inner Map iterator — errors are captured as data, not propagated as exceptions.

5.3 EventBridge Pipe (Disabled — Legacy Path)

⚠️ Status: STOPPED (June 2, 2026). The tbi-translate-pipe EventBridge Pipe is disabled. BeastTranslate now polls the SQS queue directly — there is no orchestration layer between the queue and the worker. The Pipe and Step Function are retained as a legacy fallback but are not active for any translations. As of v4.0 (June 4, 2026), the worker also handles batch inference completion polling (every 33s during idle), eliminating the tbi-translate-batch-poll and tbi-translate-batch-process Lambdas entirely.

The legacy tbi-translate-pipe connected SQS to the Step Function without a glue Lambda:

Source: trinity-beast-translation-queue
Filter: None (all messages trigger)
Target: tbi-translation-orchestrator
Input transformation: InputTemplate extracts body fields using <$.body.field> syntax (implicit JSON parsing of SQS body)
IAM: tbi-translate-pipe-role with sqs:ReceiveMessage, sqs:DeleteMessage, states:StartExecution

Why disabled: With BeastTranslate polling the same queue, both consumers would compete for messages — causing double-processing. The persistent worker replaced the Pipe → Step Function → RunTask chain for all Express (real-time) translations. The Step Function path remains available for batch inference (Standard tier) where Bedrock's batch API requires a different execution model.

5.4 Self-Healing Sweeper

The sweeper runs automatically on every GET /admin/translate/health call (piggybacked) and is also available as a dedicated POST /admin/translate/sweep endpoint.

It scans all jobs in tx:active (the Valkey SET of active job IDs). For each job older than 15 minutes in queued or running state:

Checks the Step Function execution status via the recorded ARN
If FAILED, TIMED_OUT, or ABORTED → marks job as failed, removes from tx:active, updates Aurora with reason
If no execution ARN recorded (pipe never triggered) → marks as failed
If Step Function is still RUNNING → leaves it alone

All sweep actions are logged to translation_job_events for audit trail.

Result: This eliminates the stuck queue problem permanently — no manual cleanup needed. Jobs that silently fail are automatically detected and marked, keeping the active set accurate and the queue healthy.
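The per-job decision the sweeper makes can be expressed as a pure function. This sketch assumes job age is measured from `submitted_at` and uses hypothetical dictionary keys (`state`, `submitted_at`, `execution_arn`); only the `marked_failed` action string is taken from the sweep response format.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=15)
DEAD_STATUSES = {"FAILED", "TIMED_OUT", "ABORTED"}


def sweep_action(job, sfn_status, now):
    """Return 'skip', 'leave', or 'marked_failed' for one job in tx:active."""
    if job["state"] not in ("queued", "running"):
        return "skip"
    if now - job["submitted_at"] < STALE_AFTER:
        return "skip"               # too young to be considered stuck
    if not job.get("execution_arn"):
        return "marked_failed"      # no execution was ever recorded
    if sfn_status in DEAD_STATUSES:
        return "marked_failed"
    return "leave"                  # still RUNNING (or another live status)
```

Keeping the decision free of I/O makes it cheap to run on every health call and easy to test against recorded job states.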

5.5 Job Phase Transitions

The job state now reflects the exact phase of execution:

Phase	Meaning
`queued`	Submitted to SQS, waiting for BeastTranslate to pick up the message (typically <20 seconds)
`running`	BeastTranslate received the message, worker translating documents
`deploying`	All translations complete, deploy Lambda creating CloudFront invalidations
`finalizing`	Deploy complete, finalize Lambda rebuilding search index and writing final state
`succeeded` / `partial` / `failed`	Terminal states — all sub-tasks complete, email notification sent

This gives real-time visibility into exactly where a job is in the pipeline.

6. Admin API (10 Endpoints)

All endpoints require the X-Admin-Key header. They are served by the LPO server (Go) alongside the existing admin routes.

6.1 Submit Translation Job

`POST /admin/translate`

Submits a new translation job. Validates inputs, checks cost limits, creates job state in Valkey (synchronous) and Aurora (async goroutine), enqueues to SQS.

// Request
POST /admin/translate
X-Admin-Key: tbcc-admin-...
X-Idempotency-Key: my-unique-key (optional)
Content-Type: application/json

{
  "docs": ["Trinity-Beast-API-Reference.html", "Trinity-Beast-Architecture-Guide.html"],
  "langs": "all",
  "options": {
    "force": false,
    "delta": false,
    "skip_search_rebuild": false,
    "skip_validation": false
  }
}

// Response 200
{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T16:42:00Z",
  "data": {
    "job_id": "1747407720-a3f8b2c1d4e5",
    "state": "queued",
    "submitted_at": "2026-05-16T16:42:00Z"
  },
  "error": ""
}

Validation rules:

docs — required, 1-6 entries, each must be a valid filename in S3
langs — "all" (expands to all 11) or an array of 1-11 valid language codes
options.force — bypass known-failure guard and difficulty rejection
options.delta — skip pairs where the translated file is already newer than the source (saves up to 90% on re-translation)
Daily dollar spend must be under $600 (autoops:bedrock:spend:daily)
Daily token usage must be under 50M combined tokens (autoops:bedrock:tokens:input:daily + autoops:bedrock:tokens:output:daily)
Active jobs must be under 3 (additional jobs queue in SQS)
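The input rules for docs and langs can be sketched as a small validator. The real handler lives in the Go LPO server; this Python version is only an illustration, `validate_submit` and `known_docs` are hypothetical names, and the language codes are the eleven shown in Diagram 5.2.

```python
LANGS = ["es", "fr", "de", "ru", "ja", "zh", "ar", "hi", "ur", "pt", "it"]
MAX_DOCS = 6


def validate_submit(payload, known_docs):
    """Return (expanded_langs, errors) for a submit payload."""
    errors = []

    docs = payload.get("docs")
    if not isinstance(docs, list) or not 1 <= len(docs) <= MAX_DOCS:
        errors.append("docs must be a list of 1-6 filenames")
    else:
        missing = [d for d in docs if d not in known_docs]
        if missing:
            errors.append("unknown documents: " + ", ".join(missing))

    langs = payload.get("langs")
    if langs == "all":
        langs = list(LANGS)          # "all" expands to all 11 target languages
    elif (
        isinstance(langs, list)
        and 1 <= len(langs) <= len(LANGS)
        and all(code in LANGS for code in langs)
    ):
        pass
    else:
        errors.append('langs must be "all" or a list of 1-11 valid language codes')
        langs = []

    return langs, errors
```

The spend, token, and active-job checks are separate from this step because they depend on live counters rather than on the request body.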

6.2 Monitoring Endpoints

`GET /admin/translate/status/{job_id}`

Returns the full job state. Aurora is the primary source — state, timestamps, docs, langs, cost, and Step Function ARN are read from translation_jobs. Real-time per-doc/lang progress is overlaid from Valkey (written per-pair by the worker, too frequent for Aurora writes). If Aurora doesn't have the job yet (async insert still pending), falls back to Valkey.

`GET /admin/translate/queue`

Lists all pending and active jobs (state in queued or running).

`GET /admin/translate/history`

Returns the last 50 completed jobs from translation_jobs in Aurora, ordered by submission date descending. Includes state, docs, succeeded/failed pair counts, cost, and reason. Falls back to the Valkey tx:history list if Aurora is unavailable.

`GET /admin/translate/health`

System health overview:

{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/health] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate/health",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T17:30:00Z",
  "data": {
    "queue_depth": 0,
    "active_jobs": 1,
    "last_completed_at": "2026-05-16T17:30:00Z",
    "last_state": "succeeded",
    "daily_spend_usd": "12.40",
    "daily_spend_limit_usd": "600.00",
    "daily_input_tokens": 284150,
    "daily_output_tokens": 312480,
    "daily_token_limit": 50000000,
    "swept_jobs": 0
  },
  "error": ""
}

6.3 Control Endpoints

`POST /admin/translate/cancel/{job_id}`

Stops the Step Function execution via StopExecution API. Marks job as cancelled. Returns 409 if already in a terminal state.

`POST /admin/translate/retry-failed/{job_id}`

Creates a new job from the failed (doc, lang) pairs of a completed-with-partial job. Returns 409 if the original is still running.

`POST /admin/translate/sweep`

Manually triggers the self-healing sweeper. Idempotent — safe to call repeatedly.

// Response 200
{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/sweep] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate/sweep",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T18:00:00Z",
  "data": {
    "swept": 2,
    "checked": 5,
    "results": [
      {
        "job_id": "1747407720-a3f8b2c1d4e5",
        "prior_state": "running",
        "submitted_at": "2026-05-16T16:42:00Z",
        "sfn_status": "FAILED",
        "action": "marked_failed"
      }
    ]
  },
  "error": ""
}

6.4 Worker Callback Endpoints

These endpoints are called by the worker task and finalize Lambdas to update Aurora without needing direct database access (worker and Lambdas are outside the VPC).

`POST /admin/translate/update/{job_id}`

Updates job state, progress, cost, and timing fields. Called by worker task after each (doc, lang) translation and by finalize Lambda on completion.

`POST /admin/translate/event/{job_id}`

Records a granular event in the translation_job_events table. Used for audit trail — each doc/lang start, success, failure, retry is logged as a separate event.

Fire-and-forget pattern: Both callback endpoints always return 200 regardless of Aurora write outcome. The translation pipeline must never fail because observability data couldn't be written. Errors are logged but never propagated.
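The fire-and-forget rule amounts to catching every write error inside the handler. The sketch below shows the idea in Python for illustration (the actual handlers are part of the Go server); `handle_event` and `write_event` are hypothetical names.

```python
import logging

log = logging.getLogger("translate.callbacks")


def handle_event(job_id, event, write_event):
    """Record an audit event; always answer 200, even if the write fails."""
    try:
        write_event(job_id, event)
    except Exception:
        # Observability failures are logged and swallowed, never propagated.
        log.exception("event write failed for job %s", job_id)
    return 200
```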

7. Aurora Observability — Source of Truth

Aurora is the authoritative record for all translation job state. Valkey serves one specific role: real-time per-pair progress updates during active execution (written too frequently for Aurora). For everything else — job state, history, cost, audit trail — Aurora is read first.

Design principle: Valkey is the price cache, search indexes, and real-time counters. It is not a job ledger. Aurora is the ledger. When you need to know what was translated, when, at what cost, and with what result — query Aurora.

7.1 translation_jobs Table

One row per job submission. 28 columns covering the full lifecycle. This table is the ground truth for gap analysis, cost reporting, and audit:

Column Group	Fields	Purpose
Identity	`id`, `job_id`, `idempotency_key`	Unique identification and deduplication
State	`state`, `submitted_at`, `started_at`, `completed_at`	Lifecycle tracking — authoritative terminal state
Input	`docs` (JSONB), `langs` (JSONB), `options` (JSONB)	What was requested
Progress	`total_pairs`, `succeeded_pairs`, `failed_pairs`, `progress` (JSONB)	Per-doc/lang status map
Cost	`bedrock_cost_usd`, `bedrock_invocations`	Spend tracking per job
Execution	`step_function_arn`, `errors` (JSONB), `elapsed_seconds`	Traceability and debugging
Deployment	`cloudfront_invalidation_ids`, `search_index_rebuilt`, `notification_sent`	Post-translation actions
Lineage	`retry_of`, `reason`	Retry chain and submission reason
Metadata	`submitted_by`, `created_at`, `updated_at`	Audit trail

Gap analysis query: To find documents that have never been translated, list every document that has appeared in a job with SELECT DISTINCT jsonb_array_elements_text(docs) FROM translation_jobs ORDER BY 1, then compare the result against the S3 document list. Any S3 document missing from the result has never been submitted for translation. Aurora is the only reliable source for this: Valkey keys can be evicted and do not survive cache flushes.

7.2 translation_job_events Table

Granular audit log — one row per significant event in a job's lifecycle. Used by the retry-failed handler as the authoritative source of which (doc, lang) pairs failed:

Column	Type	Example Values
`job_id`	VARCHAR	`1747407720-a3f8b2c1d4e5`
`event_type`	VARCHAR	`lang_started`, `lang_succeeded`, `lang_failed`, `deploy_started`, `finalize_complete`
`doc`	VARCHAR	`Trinity-Beast-API-Reference.html`
`lang`	VARCHAR	`ja`, `ar`, `es`
`detail`	JSONB	Cost, chunk count, error message, validator report
`created_at`	TIMESTAMP	Event timestamp

7.3 Read/Write Strategy

The translation system uses a deliberate split between Aurora and Valkey based on access pattern:

Data	Primary Store	Reason
Job state (queued/running/succeeded/failed)	Aurora	Authoritative terminal state — never expires, queryable, auditable
Job history (last 50 completed)	Aurora	Permanent record — survives cache flushes, supports gap analysis
Per-pair progress (es: succeeded, ja: running…)	Valkey	Written per-pair during execution — too frequent for Aurora writes, only needed during active polling
Daily spend counter	Valkey	Needs atomic INCRBYFLOAT and 24h TTL auto-reset; Aurora is the wrong tool for this
Active job set	Valkey	Fast set membership check on every submit — Aurora query would add latency to the hot path

Write path

Submit handler: writes to Valkey synchronously (fast, needed immediately for active job tracking), writes to Aurora in a go func() goroutine (non-blocking — API response returns without waiting)
Update/Event handlers: always return HTTP 200 regardless of Aurora write outcome — errors are logged but never propagated. The translation pipeline must never fail because a monitoring write was slow.
Finalize Lambda: calls POST /admin/translate/update/{job_id} with terminal state — Aurora is updated, Valkey is updated, job removed from active set.

Read path

Status endpoint: reads Aurora first (authoritative state, timestamps, cost). Overlays Valkey progress (real-time per-pair map). Falls back to Valkey if Aurora insert is still pending (race window on submit).
History endpoint: reads Aurora exclusively — last 50 jobs ordered by submission date. Valkey fallback only if Aurora is unavailable.
Retry-failed handler: queries translation_job_events for lang_failed events — Aurora is the only reliable source for which pairs failed.

Do not rely on Valkey for job state. Job hashes in Valkey carry no TTL, but they can still be flushed, evicted under memory pressure, or left stale if the finalize Lambda's update call was lost. Aurora is the record of what happened. Valkey is the window into what is happening right now.
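The status read path (Aurora first, Valkey progress overlaid, Valkey fallback during the submit race window) can be sketched as below. The accessor callables and the flat `progress` map keyed by language are simplifying assumptions for illustration.

```python
def read_status(job_id, aurora_get, valkey_get):
    """Merge the authoritative Aurora row with real-time Valkey progress."""
    row = aurora_get(job_id)     # authoritative row, or None if not inserted yet
    live = valkey_get(job_id)    # real-time hash, or None if flushed or evicted
    if row is None:
        return live              # submit race window: fall back to Valkey
    merged = dict(row)
    if live and live.get("progress"):
        # Overlay only the per-pair progress; state and cost stay from Aurora.
        merged["progress"] = {**(row.get("progress") or {}), **live["progress"]}
    return merged
```

Note that the Valkey copy of `state` is deliberately ignored whenever an Aurora row exists.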

8. Cost Protection

The translation engine calls Bedrock for every chunk of every document in every language, using the customer's chosen agent. Without guardrails, a single typo in a batch submission could trigger hundreds of expensive API calls.

8.1 Four Protection Layers

Layer	Where	Limit	Behavior on Breach
Per-request limits	Admin API (submit handler)	Max 6 docs, max 11 langs, max 3 active jobs	400 Bad Request (docs/langs) or queue in SQS (active jobs)
Daily dollar cap	Admin API (submit handler)	$600/day (`autoops:bedrock:spend:daily`)	429 Too Many Requests until counter expires
Daily token cap	Admin API (submit handler)	50M combined tokens/day (`autoops:bedrock:tokens:input:daily` + `autoops:bedrock:tokens:output:daily`)	429 Too Many Requests until counters expire
Per-invocation tracking	Worker task	Increments after every Bedrock call	Source of truth for daily counters

8.2 Spend Tracking

Two parallel counters track daily usage — a dollar cap and a token cap. Both live in Valkey with 24-hour TTL auto-reset and are checked on every job submission.

Dollar Cap (`autoops:bedrock:spend:daily`)

Type: STRING (numeric dollar value as string)
Updated by: INCRBYFLOAT after every Bedrock invocation in the worker task
Reset: 24-hour TTL auto-reset — the worker sets EXPIRE autoops:bedrock:spend:daily 86400 after each increment
Limit: $600/day — checked by submit handler before admitting new jobs

Why $600? A full batch translation of the entire 40-document library × 11 languages costs approximately $726 in raw Bedrock spend at ~$1.65 per doc-language pair (Sonnet 4.6) — but in practice the library is never re-translated all at once. Typical batches are 3 or 6 documents (per the Trinity Beast multiples-of-3 convention) and run well under $200. The $600 cap is a daily safety guardrail with comfortable headroom for several batches plus normal AutoOps overhead (threat analysis, digests, support) in the same 24-hour window.

Token Cap (`autoops:bedrock:tokens:input:daily` + `autoops:bedrock:tokens:output:daily`)

Type: STRING (integer token count)
Updated by: INCRBY after every Bedrock invocation — separate keys for input and output tokens
Reset: 24-hour TTL auto-reset — same pattern as dollar cap
Limit: 50M combined tokens/day — model-agnostic secondary guard
Purpose: Provides a predictable ceiling that doesn't change when model pricing changes. At Sonnet 4.6 rates, 50M tokens costs between $150 (all input at $3.00 per 1M) and $750 (all output at $15.00 per 1M), so which cap fires first depends on the input/output mix of the day's jobs. The token cap also catches edge cases where pricing changes make the dollar cap insufficient.

Kill switch: Setting autoops:bedrock:kill = "1" in Valkey causes both the submit endpoint and the worker task to refuse all operations. Use this for emergency cost containment.
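Taken together, the kill switch and the two daily caps form a single admission check at submit time. The sketch below models Valkey as a plain dict of string values; the function name and the reason labels are hypothetical.

```python
DOLLAR_LIMIT = 600.0
TOKEN_LIMIT = 50_000_000

KILL_KEY = "autoops:bedrock:kill"
SPEND_KEY = "autoops:bedrock:spend:daily"
INPUT_KEY = "autoops:bedrock:tokens:input:daily"
OUTPUT_KEY = "autoops:bedrock:tokens:output:daily"


def admit(store):
    """Return (allowed, reason) for a new job, given current counter values."""
    if store.get(KILL_KEY) == "1":
        return False, "kill_switch"
    # Missing keys mean the 24-hour TTL expired, so the counter is zero.
    if float(store.get(SPEND_KEY, "0")) >= DOLLAR_LIMIT:
        return False, "daily_spend_cap"
    tokens = int(store.get(INPUT_KEY, "0")) + int(store.get(OUTPUT_KEY, "0"))
    if tokens >= TOKEN_LIMIT:
        return False, "daily_token_cap"
    return True, "ok"
```

Both cap rejections correspond to the 429 response described in 8.1; the HTTP status for the kill switch is not specified here, so the sketch returns only a reason label.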

Pricing formula:

Bedrock cost — token-based: (input_tokens × input_rate) + (output_tokens × output_rate) per language pair
Infrastructure markup (9%) — covers ECS Fargate compute, S3 storage, CloudFront invalidation, SQS queuing, and Step Function orchestration
Service fee (30%) — applied to the combined cost (Bedrock + infrastructure)
Per-document floor ($3.00) — each translated document is worth at least $3.00 regardless of calculated cost
Total price = max((Bedrock cost + infra cost) × 1.30, $3.00 × translated documents)

The engine is completely agent-agnostic. Any model accessible through Amazon Bedrock's invoke_model API can serve as a translation agent — whether it uses the Anthropic Messages format or the OpenAI-compatible format. The engine auto-detects the provider and constructs the appropriate request/response envelope. We selected these six agents based on three criteria: (1) quality on multilingual document translation, (2) availability across our failover regions (us-east-2, us-east-1, us-west-2), and (3) distinct cost/quality tradeoffs that let customers choose the right agent for their workload.

Token rates (stored in Aurora translation_parameters, cached in Valkey):

Agent	Tier	Input/1M	Output/1M	Speed	Regions
Qwen3 235B	💰 Value	$0.22	$0.88	0.6×	east-2, west-2
Mistral Large 3	💰 Value	$0.50	$1.50	0.6×	east-2, east-1, west-2
DeepSeek V3	💰 Value	$0.58	$1.68	0.7×	east-2, west-2
Claude Haiku 3.5	⚡ Standard	$0.80	$4.00	0.5×	All (cross-region)
Claude Sonnet 4.6	🏆 Premium	$3.00	$15.00	1.0×	All (cross-region)
Claude Opus 4	👑 Elite	$15.00	$75.00	1.5×	All (cross-region)

Speed factor is relative to Sonnet 4.6 (1.0×). Lower means faster — Haiku at 0.5× processes in half the time. The speed factor directly affects the duration-based infrastructure cost: faster agents cost less in compute time per pair.

Infrastructure cost formula (duration-based):

duration_per_pair = startup_seconds(10) + translatable_bytes × seconds_per_byte(0.0014) × speed_factor
infra_cost = fixed_job_overhead($0.012) + (total_container_seconds / 60) × ecs_cost_per_minute($0.00112)
customer_price = max((bedrock_cost + infra_cost) × (1 + markup_pct/100), minimum_per_pair × total_pairs)
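The formulas above translate directly into code. The sketch below uses the constants quoted in the formulas and a 30% markup as defaults; the function names are illustrative, and the worked example assumes all 50,000 bytes of a 50 KB document are translatable.

```python
def bedrock_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Token-based Bedrock cost in USD for one language pair."""
    return (input_tokens / 1e6) * input_rate_per_m + (output_tokens / 1e6) * output_rate_per_m


def duration_per_pair(translatable_bytes, speed_factor,
                      startup_seconds=10.0, seconds_per_byte=0.0014):
    """Estimated container seconds for one (doc, lang) pair."""
    return startup_seconds + translatable_bytes * seconds_per_byte * speed_factor


def infra_cost(total_container_seconds,
               fixed_job_overhead=0.012, ecs_cost_per_minute=0.00112):
    """Duration-based infrastructure cost in USD."""
    return fixed_job_overhead + (total_container_seconds / 60) * ecs_cost_per_minute


def customer_price(bedrock, infra, total_pairs,
                   markup_pct=30.0, minimum_per_pair=3.00):
    """Marked-up cost, floored at the per-pair minimum."""
    return max((bedrock + infra) * (1 + markup_pct / 100), minimum_per_pair * total_pairs)


# Worked example: 50 KB document, one language, Sonnet 4.6 (speed factor 1.0).
seconds = duration_per_pair(50_000, 1.0)      # 10 + 50,000 x 0.0014 = about 80 s
infra = infra_cost(seconds)                   # about $0.0135
price = customer_price(0.32, infra, 1)        # the $3.00 floor applies
```

With the floor disabled (`minimum_per_pair=0`) the same inputs give roughly $0.43, which shows how far below the floor a small document sits.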

Regional failover: If the primary region (us-east-2) returns a timeout or 503, the engine retries up to 2 times in the same region, then fails over to the next available region. Anthropic models use cross-region inference profiles (us. prefix) and route automatically. Qwen and DeepSeek fail over from east-2 to west-2. Mistral is available in all 3 regions.

Typical costs (per 50 KB document × 1 language, Express):

Agent	Bedrock	Infra	Customer Price
Qwen3 235B	$0.02	$0.01	$0.04
Mistral Large 3	$0.04	$0.01	$0.06
DeepSeek V3	$0.04	$0.01	$0.07
Claude Haiku 3.5	$0.09	$0.01	$0.13
Claude Sonnet 4.6	$0.32	$0.01	$0.44
Claude Opus 4	$1.61	$0.01	$2.11

$3.00 minimum per translated document: Every translated document has a floor price of $3.00, regardless of size or agent. The table above shows the raw calculated cost, but any value below $3.00 is quoted at $3.00 to the customer. This floor ensures the service is priced at the value of the result (a professionally translated document), not the cost of the compute. Every 50 KB example in the table falls below the floor. Scaling those figures linearly with size, a document clears $3.00 at roughly 70 KB on Opus 4 and roughly 340 KB on Sonnet 4.6; on Standard and Value-tier agents the floor applies to practically every document.

8.3 Infrastructure Integration

Translation engine metrics are exposed through three interfaces:

GET /public/infrastructure — includes a translation section with daily spend, daily limit, active jobs, queue depth, cost-per-pair estimate, and daily token counts (daily_input_tokens, daily_output_tokens, daily_token_limit). Consumed by the daily digest Lambda, nightly sync, and any monitoring dashboard.
KCC Live Dashboard — a dedicated Translation Engine card displays real-time spend, limit, active/queued jobs, cost-per-pair, and a token usage chart (input + output bars against the 50M daily limit). Auto-refreshes every 30 seconds alongside all other infrastructure panels.
Infrastructure Live page — the Translation Engine sub-section shows live stats: spend today, tokens in, tokens out, and active jobs — populated from /public/infrastructure every 30 seconds.

Email notification timing: The email notification is the absolute LAST step in the pipeline. It fires only after: translation, deployment, search index rebuild, state update, and history push are ALL complete. The email is a comprehensive report including: job summary, translation results, CloudFront invalidation IDs, search index status, and any Bedrock error details. If Bedrock reports validation failures, the specific error messages and validator details are included in the email.

9. CLI Compatibility

The existing CLI tool (scripts/kcc_helpers/translate_doc.py) continues to work unchanged. A --remote flag routes through the new service instead of running Bedrock locally:

Flag	Behavior	Use Case
`--local` (current default)	Runs translator engine in-process, calls Bedrock directly from laptop	Development, debugging, single-doc quick fixes
`--remote`	POSTs to `/admin/translate`, polls `/admin/translate/status/{id}` every 5s, streams progress to stdout	Production translations, batch operations

The --remote flag produces identical terminal output to local mode — same progress bars, same chunk counters, same completion summary. The operator's workflow doesn't change; only the execution path does.
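The remote path is essentially a submit followed by a status poll every 5 seconds. The polling half can be sketched as below; `fetch_status` is a hypothetical callable standing in for a GET to /admin/translate/status/{job_id} with the X-Admin-Key header, and the `data.state` response shape is assumed to match the submit response envelope.

```python
import time

TERMINAL_STATES = {"succeeded", "partial", "failed", "cancelled"}


def wait_for_job(fetch_status, job_id, interval=5.0, sleep=time.sleep, max_polls=None):
    """Poll until the job reaches a terminal state; return the final status."""
    polls = 0
    while True:
        status = fetch_status(job_id)
        state = status["data"]["state"]
        print(f"{job_id}: {state}")          # progress line on stdout
        if state in TERMINAL_STATES:
            return status
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{job_id} still {state} after {polls} polls")
        sleep(interval)
```

Injecting `sleep` and `fetch_status` keeps the loop testable without a network connection or real delays.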

Default flip plan: Start with --local as default to avoid surprising anyone. After 30 days of clean production runs through the service, flip the default to --remote and add --local as the explicit fallback.

10. Configuration Reference — Protected Terms

All translation behavior is driven by a single config file: scripts/translation-config.json. This is the shared source of truth consumed by both the Python engine and the Go admin API.

10.1 Protected Terms (57 entries)

Brand names, product names, AWS services, exchange names, and acronyms that must never be translated or transliterated:

Cross Power Ministries of Pakistan, The Trinity Beast Infrastructure,
The Trinity Beast, Trinity Beast Command Center, Kiro Command Center,
Cory Dean Kalani, Shafiq Bhatti, BeastWebhook, BeastMirror, BeastMain,
BeastLRS, Claude Sonnet 4.6, Bedrock, ElastiCache, EventBridge,
CloudFront, GuardDuty, CloudWatch, CloudTrail, Step Functions,
Crypto.com, Coinbase, Gate.io, Gemini, Kraken, Aurora, Valkey,
Stripe, Kiro, Fargate, PostgreSQL, Lambda, Route 53, AutoOps,
TBCC, CPMP, TBI, KCC, OKX, ECR, ECS, ALB, NLB, WAF, SNS, SQS,
SES, VPC, IAM, S3 ...

Per-Request Protected Terms

In addition to the global protected terms list, you can submit document-specific terms via the protected_terms array in the translation request. This is useful for:

Proper nouns specific to a document (customer names, project names)
Technical identifiers not in the global list (new service names, API endpoints)
Domain-specific terminology that should remain in English

POST /admin/translate
{
  "docs": ["Trinity-Beast-API-Reference.html"],
  "langs": "all",
  "protected_terms": ["MyCustomService", "SpecialEndpoint", "ProjectAlpha"]
}

Per-request terms are merged with the global list for that job only. They do not persist across jobs.
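The merge itself can be sketched as a de-duplicating union. Ordering the result longest-first is an assumption added here so that a longer term such as "The Trinity Beast Infrastructure" is matched before its prefix "The Trinity Beast"; the function name is hypothetical.

```python
def merge_protected_terms(global_terms, request_terms=None):
    """Combine global and per-request terms for one job, longest term first."""
    merged, seen = [], set()
    for term in list(global_terms) + list(request_terms or []):
        if term not in seen:
            seen.add(term)
            merged.append(term)
    # sorted() is stable, so terms of equal length keep their original order.
    return sorted(merged, key=len, reverse=True)
```

The function returns a new list and leaves the global list untouched, which matches the rule that per-request terms do not persist across jobs.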

10b. Configuration Reference — Preserve Patterns

10.2 Preserve Patterns

Regex patterns for technical tokens that must survive translation unchanged:

Pattern Name	Matches	Example
`url`	HTTP/HTTPS URLs	`https://api.cpmp-site.org/admin/translate`
`email`	Email addresses	`CoryDeanKalani@CPMP-Site.org`
`memory_size`	Number + memory unit	`1770 MB`, `32 GB`
`percentage`	Number + %	`98.5%`, `62%`
`cron_expr`	Cron expressions	`cron(0 11 * * ? *)`
`ip_address`	IPv4 with optional CIDR	`10.0.1.0/24`
`aws_arn`	AWS ARN format	`arn:aws:sns:us-east-2:211998422884:tbi-ops-notifications`
`aws_resource_id`	AWS resource identifiers	`vpc-03deaddb7083cd59c`, `sg-050b617f93b2388f6`
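To illustrate how patterns like these can shield tokens, the sketch below swaps matches for placeholders before translation and restores them afterwards. The three regular expressions and the `[[KEEPn]]` placeholder format are hypothetical stand-ins; the real expressions live in scripts/translation-config.json and are not reproduced here.

```python
import re

# Hypothetical approximations of three of the patterns in the table above.
PATTERNS = {
    "ip_address": r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b",
    "memory_size": r"\b\d+(?:\.\d+)?\s?(?:KB|MB|GB|TB)\b",
    "percentage": r"\b\d+(?:\.\d+)?%",
}
COMBINED = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS.items()))


def protect(text):
    """Replace every preserved token with a placeholder; return (text, saved)."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"[[KEEP{len(saved) - 1}]]"

    return COMBINED.sub(stash, text), saved


def restore(text, saved):
    """Put the original tokens back after translation."""
    for index, original in enumerate(saved):
        text = text.replace(f"[[KEEP{index}]]", original)
    return text
```

Using one combined expression means each span of text is matched once, so a placeholder can never be re-matched by a later pattern.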

10c. Configuration Reference — Limits

10.3 Limits

Parameter	Value	Purpose
`max_chunk_chars`	6000	Default maximum characters per chunk (Latin scripts: es, pt, fr, de, it)
`max_chunk_chars_by_lang`	See below	Per-language overrides for complex scripts
`max_retries`	3	Retry attempts per chunk on validation failure
`request_timeout_seconds`	300	Per-Bedrock-call timeout (5 minutes — large RTL chunks need headroom)
`max_output_tokens`	8192	Maximum tokens in Bedrock response

Per-language chunk size overrides:

Languages	Chunk Size	Rationale
`hi, ur, ar`	3000 chars	Devanagari and Arabic scripts expand significantly during translation. Smaller chunks prevent Bedrock timeouts.
`ja, zh, ru`	4500 chars	CJK and Cyrillic have moderate expansion. Mid-range chunks balance throughput and reliability.
`es, pt, fr, de, it`	6000 chars (default)	Latin scripts translate quickly with minimal expansion.
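The override lookup and its effect on chunking can be sketched as follows. The paragraph-packing strategy is an assumption made for illustration (the engine's actual chunker is not described here), and an oversized single paragraph is simply kept as its own chunk.

```python
DEFAULT_MAX_CHUNK_CHARS = 6000
MAX_CHUNK_CHARS_BY_LANG = {
    "hi": 3000, "ur": 3000, "ar": 3000,
    "ja": 4500, "zh": 4500, "ru": 4500,
}


def max_chunk_chars(lang):
    """Per-language chunk limit, falling back to the default."""
    return MAX_CHUNK_CHARS_BY_LANG.get(lang, DEFAULT_MAX_CHUNK_CHARS)


def pack_paragraphs(paragraphs, lang):
    """Greedily pack whole paragraphs into chunks under the language limit."""
    limit = max_chunk_chars(lang)
    chunks, current = [], ""
    for para in paragraphs:
        candidate = para if not current else current + "\n" + para
        if current and len(candidate) > limit:
            chunks.append(current)   # close the chunk before it overflows
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The same three 2,000-character paragraphs produce two chunks for Spanish but three for Arabic, which is the throughput-for-reliability tradeoff the table describes.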

11. Operations Guide

11.1 Submitting a Translation Job

Single document, all languages:

curl -s -X POST \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all"}' \
  https://api.cpmp-site.org/admin/translate | jq .

Multiple documents, specific languages:

curl -s -X POST \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"docs":["Trinity-Beast-API-Reference.html","Trinity-Beast-Architecture-Guide.html"],"langs":["es","pt","fr"]}' \
  https://api.cpmp-site.org/admin/translate | jq .

With idempotency key (safe to retry):

curl -s -X POST \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "X-Idempotency-Key: api-ref-2026-05-16" \
  -H "Content-Type: application/json" \
  -d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all"}' \
  https://api.cpmp-site.org/admin/translate | jq .

11.2 Monitoring Progress

# Check job status
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/status/{job_id} | jq .

# View queue
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/queue | jq .

# System health
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/health | jq .

# Recent history
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/history | jq .

11.3 Troubleshooting

Symptom	Cause	Resolution
Job stuck in `queued`	EventBridge Pipe not consuming	Check Pipe status in console; verify IAM role
429 on submit	Daily spend cap hit ($600)	Wait for 24h TTL expiry, or reset manually: `SET autoops:bedrock:spend:daily 0`
Partial completion	Some languages failed validation	`POST /admin/translate/retry-failed/{id}`
Worker timeout	Document too large (many chunks)	Check Step Function execution history for the failing chunk index
Cancel returns 404	Job only in Aurora, not Valkey	Cancel handler falls back to Aurora — ensure latest code is deployed
No email notification	Finalize Lambda error	Check CloudWatch logs for `tbi-translate-finalize`
Search not updated	Search rebuild timed out	Run `bash scripts/kcc.sh build-search` manually

Cancel a running job:

curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" \
  https://api.cpmp-site.org/admin/translate/cancel/{job_id}

This stops the Step Function execution immediately. Documents already translated and deployed remain live. The search index is rebuilt for whatever landed successfully.

12. Regional Failover

The translation engine implements automatic regional failover to maintain availability during Bedrock service disruptions. It was added after a us-east-2 outage during development exposed the single-region weakness, before any customers were affected.

12.1 Failover Chain

When a Bedrock call fails with a service-level error, the engine automatically retries in the next region:

Priority	Region	Location	Role
1	`us-east-2`	Ohio	Primary — all normal traffic
2	`us-east-1`	N. Virginia	First fallback
3	`us-west-2`	Oregon	Second fallback

The failover is transparent to the caller — the translation completes successfully as long as at least one region is available. A log message records when a fallback region was used.

12.2 Trigger Conditions

Failover is triggered for service-level errors and timeouts that indicate the region is unavailable or overloaded:

Error Type	Meaning	Action
`ServiceUnavailableException`	Bedrock service is down (503)	Retry same region once, then failover
`ThrottlingException`	Rate limit or capacity exceeded	Retry same region once, then failover
`ModelStreamErrorException`	Model streaming failure	Retry same region once, then failover
`ReadTimeoutError`	Response took longer than 300s	Retry same region once, then failover
`ConnectTimeoutError`	Could not establish connection within 10s	Retry same region once, then failover

Other errors (validation failures, authentication errors, malformed requests) are not retried — they would fail identically everywhere.

Per-Region Retry with Backoff

Each region gets 2 attempts before the engine moves to the next region. A 5-second backoff between attempts allows transient pressure to clear:

us-east-2 (attempt 1) → timeout → wait 5s →
us-east-2 (attempt 2) → timeout →
us-east-1 (attempt 1) → timeout → wait 5s →
us-east-1 (attempt 2) → timeout →
us-west-2 (attempt 1) → timeout → wait 5s →
us-west-2 (attempt 2) → timeout → FAIL (raise exception)

Total: 6 attempts across 3 regions. In practice, transient spikes clear within 5-10 seconds, so the retry within the same region usually succeeds without needing failover.
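
The retry sequence above can be expressed as a small loop. This is a sketch under stated assumptions, not the engine's implementation: RetryableRegionError is a hypothetical stand-in for the service-level errors listed in 12.2, and invoke is any callable that performs the Bedrock call for a given region. The region order, attempt count, and backoff come from this section.

```python
import time

REGIONS = ["us-east-2", "us-east-1", "us-west-2"]
ATTEMPTS_PER_REGION = 2
BACKOFF_SECONDS = 5


class RetryableRegionError(Exception):
    """Hypothetical stand-in for the service-level errors in 12.2."""


def call_with_failover(invoke, regions=REGIONS, sleep=time.sleep):
    """Try each region twice, backing off between same-region attempts.

    Any other exception type propagates immediately, matching the rule
    that non-service errors are not retried.
    """
    last_error = None
    for region in regions:
        for attempt in range(1, ATTEMPTS_PER_REGION + 1):
            try:
                return invoke(region)
            except RetryableRegionError as exc:
                last_error = exc
                # Back off only before the second attempt in the same region.
                if attempt < ATTEMPTS_PER_REGION:
                    sleep(BACKOFF_SECONDS)
    if last_error is None:
        raise ValueError("no regions configured")
    raise last_error
```

Passing sleep as a parameter keeps the loop testable without real delays.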

12.3 Cost Impact

Regional failover has negligible cost impact:

Same pricing: Bedrock pricing is identical across all three regions
No duplicate charges: Failed requests that trigger failover are not billed (no tokens consumed)
Minimal latency: Cross-region latency adds ~50-100ms per call, unnoticeable in the context of a multi-second translation
Estimated overhead: ~$0.0005 per document when failover is triggered (additional API call setup cost)

Resilience benefit: A complete regional outage no longer blocks translations. The May 2026 us-east-2 outage would have caused a 4-hour translation blackout without this feature. With failover, translations continued uninterrupted via us-east-1.

13. Document Preparation Guide

Proper document preparation ensures clean translations with minimal post-processing. This section covers the conventions that help the translation engine produce accurate results.

13.1 Code Tag Usage

The <code translate="no"> tag tells the translation engine to preserve content exactly as written. Use it correctly to avoid formatting artifacts in translated documents.

When to Use Code Tags

Use <code translate="no"> for technical identifiers that would break if translated:

Service names: tbi-ops-notify, BeastMain
API endpoints: /admin/translate, /public/infrastructure
Environment variables: AWS_REGION, ADMIN_KEY
Function names: _apply_sentinels(), translate()
File paths: /var/log/app.log, scripts/kcc.sh
Database columns: job_id, created_at
Configuration keys: max_retries, request_timeout_seconds
Command-line flags: --remote, --force

When NOT to Use Code Tags

Do not wrap pure data values in code tags — they should appear as plain text:

Memory sizes: 1770 MB, 32 GB, 3 GB (not <code>1770 MB</code>)
Timeouts: 60 seconds, 180s (not <code>60 seconds</code>)
Percentages: 98.5%, 62% (not <code>98.5%</code>)
Counts: 11 languages, 6 documents (not <code>11 languages</code>)
Costs: $600, $1.65 (not <code>$600</code>)
Version numbers in prose: version 17.7 (not <code>17.7</code>)

Why this matters: The translation engine's sentinel system protects code-tagged content from translation. If you wrap "32 GB" in code tags, it survives translation — but so does the monospace formatting, which looks wrong in prose. The engine has a post-processor that strips spurious code wrappers from pure numeric values, but it's better to author correctly from the start.
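
To make the post-processing step concrete, here is one way such a cleanup could look. This is an illustrative sketch only: the regex and the function name strip_numeric_code_wrappers are assumptions, and the engine's actual post-processor may recognize a different set of values.

```python
import re

# Assumed shape of a "pure numeric value": optional $, a number, optional unit.
_NUMERIC_VALUE = r"\$?\d+(?:\.\d+)?\s?(?:GB|MB|KB|TB|%|seconds|s)?"
_SPURIOUS_CODE = re.compile(
    r'<code(?:\s+translate="no")?>\s*(' + _NUMERIC_VALUE + r")\s*</code>"
)


def strip_numeric_code_wrappers(html: str) -> str:
    """Unwrap code tags whose entire content is a bare numeric value."""
    return _SPURIOUS_CODE.sub(r"\1", html)
```

Identifiers such as max_retries do not match the numeric pattern, so their code tags are left untouched.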

Quick Test

Ask yourself: "If I changed this value, would the system break?" If yes, use code tags. If no (it's just a number or measurement), leave it as plain text.

Content	Would changing it break something?	Use code tags?
`tbi-ops-notify`	Yes — Lambda name	✅ Yes
1770 MB	No — just a memory size	❌ No
`/admin/translate`	Yes — API endpoint	✅ Yes
$600	No — just a dollar amount	❌ No
`max_retries`	Yes — config key	✅ Yes
3 retries	No — just a count	❌ No

13.2 Protected Terms Submission

For documents with domain-specific terminology not in the global protected terms list, submit additional terms with the translation request:

POST /admin/translate
{
  "docs": ["Customer-Integration-Guide.html"],
  "langs": "all",
  "protected_terms": [
    "CustomerCorp",
    "ProjectPhoenix",
    "DataSync API",
    "IntegrationHub"
  ]
}

These terms are added to the global list for this job only. The engine will:

Wrap each term in <span translate="no"> during preprocessing
Replace with sentinel tokens before sending to Bedrock
Restore the original terms after translation
Validate that all terms survived intact
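
The protect, restore, and validate steps above can be sketched as follows. This is a simplified illustration: it skips the intermediate span-wrapping step and works on plain strings, and the function names are hypothetical. The sentinel format follows the __TBP0__ style used by the per-chunk sentinel system.

```python
def protect_terms(text, terms):
    """Replace each protected term with a sentinel; return text and mapping."""
    mapping = {}
    # Longest terms first, so "DataSync API" is matched before a shorter
    # term that overlaps it.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        sentinel = f"__TBP{i}__"
        if term in text:  # case-sensitive, as described in the text
            text = text.replace(term, sentinel)
            mapping[sentinel] = term
    return text, mapping


def restore_terms(text, mapping):
    """Put the original terms back and report any sentinel that went missing."""
    missing = [term for sentinel, term in mapping.items() if sentinel not in text]
    for sentinel, term in mapping.items():
        text = text.replace(sentinel, term)
    return text, missing
```

A non-empty missing list corresponds to the validation step: a term whose sentinel did not survive translation.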

Best Practices for Protected Terms

Be specific: "DataSync API" is better than "DataSync" (avoids false matches)
Include variations: If a term appears as both "ProjectPhoenix" and "Project Phoenix", include both
Case matters: "CustomerCorp" and "customercorp" are treated as different terms
Don't over-protect: Common English words that translate well don't need protection

13.3 Clarification Workflow

When the translation engine encounters ambiguous content, it may flag it for human review. This happens in the validation phase when:

A protected term appears to have been partially translated
A version number format doesn't match the expected pattern
HTML structure differs significantly between source and output

Flagged content appears in the job status response under the warnings array:

{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/status/1747407720-a3f8b2c1d4e5] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate/status/1747407720-a3f8b2c1d4e5",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T17:45:00Z",
  "data": {
    "job_id": "1747407720-a3f8b2c1d4e5",
    "state": "succeeded",
    "warnings": [
      "chunk 14 (ja): soft failure — protected term 'DataSync' may have been altered",
      "chunk 22 (ar): soft failure — version number format changed from X.Y.Z to X.Y"
    ]
  },
  "error": ""
}

Soft failures don't block the translation — the output is still deployed. Review the warnings and manually verify the flagged sections if needed.
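
When reviewing many jobs, it can help to group warnings by language. The sketch below assumes every warning follows the "chunk N (lang): soft failure ..." shape shown in the example response; that format is inferred from the example only, and warnings_by_language is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed warning shape, inferred from the example response above.
_WARNING = re.compile(r"chunk (\d+) \((\w+)\): soft failure\W+(.*)")


def warnings_by_language(status_response):
    """Group (chunk_index, message) pairs by language code."""
    grouped = defaultdict(list)
    for warning in status_response.get("data", {}).get("warnings", []):
        match = _WARNING.match(warning)
        if match:
            chunk, lang, message = match.groups()
            grouped[lang].append((int(chunk), message))
        else:
            # Keep anything that does not fit the assumed shape.
            grouped["unparsed"].append((None, warning))
    return dict(grouped)
```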

Feedback loop: If you consistently see the same term flagged, add it to the global protected terms list in scripts/translation-config.json. This prevents future warnings and improves translation quality across all documents.

14. Pre-Scan Complexity Analysis

Before translation begins, the engine analyzes each document for complexity factors that may cause validation failures. This pre-scan identifies code-heavy sections and recommends whether to proceed, exercise caution, or split the document.

14.1 Complexity Metrics

The pre-scan calculates a complexity score for each section based on:

Factor	Weight	Why It Matters
Code tags	1.0 per tag	Each code tag must survive translation intact — more tags = more validation points
Code tags in tables	1.5 per tag	Tables with code examples are harder — model tends to merge or drop tags when reordering
Tables	2.0 per table	Tables with technical content require careful structure preservation
Pre blocks	0.5 per block	Usually have translate="no" — lower risk but still tracked
Protected spans	0.3 per span	Handled by sentinel system — low risk

Section Thresholds

High-density section: Complexity score > 15 OR > 10 code tags
Document split threshold: Total score ≥ 50 OR > 3 high-density sections

14.2 Recommendations

Based on the analysis, the pre-scan returns one of three recommendations:

Recommendation	Criteria	Action
PROCEED	Score < 20, no high-density sections	Translate normally — low failure risk
CAUTION	Score < 50, ≤ 3 high-density sections	Proceed but monitor; may need retries
SPLIT	Score ≥ 50 OR > 3 high-density sections	Consider splitting document before translation
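
The scoring and recommendation rules can be sketched as below. Two points are assumptions rather than documented behavior: code tags inside tables are weighted at 1.5 instead of (not in addition to) 1.0, and any document that is neither PROCEED nor SPLIT is treated as CAUTION. The names SectionCounts, section_score, and recommend are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SectionCounts:
    code_tags: int = 0            # code tags outside tables
    code_tags_in_tables: int = 0  # assumed to be weighted 1.5 instead of 1.0
    tables: int = 0
    pre_blocks: int = 0
    protected_spans: int = 0


def section_score(c: SectionCounts) -> float:
    """Weighted complexity score for one section (weights from 14.1)."""
    return (1.0 * c.code_tags + 1.5 * c.code_tags_in_tables
            + 2.0 * c.tables + 0.5 * c.pre_blocks + 0.3 * c.protected_spans)


def is_high_density(c: SectionCounts) -> bool:
    return section_score(c) > 15 or (c.code_tags + c.code_tags_in_tables) > 10


def recommend(sections) -> str:
    total = sum(section_score(s) for s in sections)
    dense = sum(1 for s in sections if is_high_density(s))
    if total >= 50 or dense > 3:
        return "SPLIT"
    if total < 20 and dense == 0:
        return "PROCEED"
    return "CAUTION"  # assumption: everything in between
```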

Pre-Scan Output Example

DOCUMENT TRANSLATION COMPLEXITY ANALYSIS
========================================
Total characters: 81,107
Total sections: 13
Total code tags: 287
Overall complexity score: 415.4
Recommendation: SPLIT

WARNINGS:
  ⚠️  Document has 287 code tags — high validation failure risk
  ⚠️  Section 'step-function' has 51 code tags — consider simplifying
  ⚠️  Section 'observability' has 48 code tags — consider simplifying

HIGH-DENSITY SECTIONS (9):
  • architecture: 11 code tags, score 22.7
  • sentinel-system: 22 code tags, score 34.5
  • step-function: 51 code tags, score 71.1
  ...

SUGGESTED SPLIT: 4 parts
  → Split after 'validators' (After 3 high-density sections)
  → Split after 'observability' (After 3 high-density sections)
  → Split after 'doc-prep' (After 3 high-density sections)

14.3 Document Splitting

When the pre-scan recommends splitting, it suggests natural break points at section boundaries. Options for handling complex documents:

Option 1: Split into Multiple Documents

Create separate HTML files for each part (e.g., Doc-Part1.html, Doc-Part2.html). Each part translates independently with lower failure risk. Link them together with navigation.

Option 2: Simplify High-Density Sections

Reduce code tag density in problematic sections:

Replace code examples with prose descriptions where possible
Move detailed code to appendices or separate reference docs
Use styled spans instead of code tags for visual-only formatting
Consolidate similar examples into single code blocks

Option 3: Translate in Batches

Submit fewer languages per job (e.g., 3 instead of 11). This reduces concurrent load and allows the model more capacity per translation. Retry failed languages individually.

Per-Language Split Thresholds (v2.5)

Complex scripts (Urdu, Arabic, Hindi) struggle with high tag density even when Latin-script languages handle the same chunk fine. The prescan now applies per-language code tag limits — tighter thresholds for scripts where the model is more likely to drop markup:

Language	Script	Max Code Tags per Part
Default (Latin, CJK, Cyrillic)	Latin / Kanji / Cyrillic	30
Urdu (ur)	Nastaliq	18
Arabic (ar)	Arabic	18
Hindi (hi)	Devanagari	20

Configuration key: max_code_tags_per_part_by_lang in translation-config.json. When the prescan runs for a specific language, it uses that language's threshold to determine split points. A document that translates as one part for Spanish may automatically split into 2-3 parts for Urdu.
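
A rough sketch of how a per-language limit changes the split follows. The greedy packing strategy and the function name split_by_code_tags are assumptions for illustration; only the limits come from the table above. A single section that exceeds the limit on its own stays in its own part, since this sketch splits only at section boundaries.

```python
DEFAULT_MAX_CODE_TAGS = 30
MAX_CODE_TAGS_PER_PART_BY_LANG = {"ur": 18, "ar": 18, "hi": 20}


def split_by_code_tags(sections, lang):
    """Greedily pack (name, code_tag_count) sections into parts.

    Returns a list of parts, each a list of section names.
    """
    limit = MAX_CODE_TAGS_PER_PART_BY_LANG.get(lang, DEFAULT_MAX_CODE_TAGS)
    parts, current, used = [], [], 0
    for name, tags in sections:
        # Start a new part when adding this section would exceed the limit.
        if current and used + tags > limit:
            parts.append(current)
            current, used = [], 0
        current.append(name)
        used += tags
    if current:
        parts.append(current)
    return parts
```

With sections of 2, 22, and 6 code tags, this yields one part for Spanish (30 tags fit the default limit) and three parts for Urdu.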

Result: The Translation Service document (22 code tags in the Architecture section) previously failed for Urdu on every attempt. With the per-language threshold of 18, the prescan splits Architecture and Observability into separate parts. All 11 languages now translate successfully.

This document is an edge case: The Translation Engine documentation itself has 287 code tags and a complexity score of 415 — it's documentation about a translation engine, so it's packed with code examples. Most documents score under 50.

14.4 Splitting Safety Valve (v2.8)

Even when code tag density is low, a single part that exceeds the model's effective output window will be silently truncated — sections at the end of the part simply disappear from the output. The safety valve enforces a hard character limit per part regardless of prescan recommendations.

The Problem

The Performance Report (75 KB) has 18 sections with moderate code density. The prescan recommended splitting into 3 parts based on code tag thresholds. But Part 1 was 36 KB of prose-heavy content — well under the code tag limit but far beyond the model's output token budget. The model translated the first ~24 KB faithfully, then its output simply stopped. Sections 7-8 (partner-sustained, udp-engine) vanished without any error signal.

The Fix

# Safety valve: max chars per part (prevents model output truncation)
MAX_CHARS_PER_PART = 24000  # ~6000 tokens, well within max_output_tokens

The splitter now enforces a 24 KB ceiling on every part. If a part exceeds this limit after the prescan-based split, it is further subdivided at the nearest section boundary. This is conservative — Latin scripts could handle ~30 KB, but 24 KB is safe for all languages including RTL and CJK where token efficiency is lower.
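
The subdivision step might look like the sketch below, which takes the parts produced by the prescan and re-splits any part that exceeds the ceiling. The greedy walk over section boundaries and the name apply_safety_valve are assumptions; the 24000-character constant is the one shown above.

```python
MAX_CHARS_PER_PART = 24000


def apply_safety_valve(parts, max_chars=MAX_CHARS_PER_PART):
    """Re-split oversized parts at section boundaries.

    parts: list of parts, each a list of (section_name, char_count).
    Parts already under the ceiling pass through unchanged.
    """
    result = []
    for part in parts:
        current, size = [], 0
        for name, chars in part:
            # Close the current sub-part before it would exceed the ceiling.
            if current and size + chars > max_chars:
                result.append(current)
                current, size = [], 0
            current.append((name, chars))
            size += chars
        if current:
            result.append(current)
    return result
```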

Impact

Document	Before (v2.6)	After (v2.8)
Performance Report (75 KB)	3 parts (Part 1: 36 KB — truncated)	4 parts (largest: 22 KB — clean)
API Reference (180 KB)	8 parts (all under 24 KB already)	8 parts (no change — already safe)
Translation Engine (116 KB)	11 parts (code-density driven)	11 parts (no change — code splits dominate)

The safety valve only activates when the prescan's code-tag-based splitting produces oversized parts. For most documents, the code density split already keeps parts well under 24 KB.

Result: Performance Report went from dropping 3 entire sections (silent truncation) to a perfect 18/18 sections, 4/4 diagrams, 20/20 <br/> tags across all 11 languages.

15. Document-Level Preprocessor

The document-level preprocessor is a critical layer that runs before chunking. It extracts complex HTML elements from the entire document, replacing them with simple Unicode placeholders. After translation, the postprocessor restores the original elements. Because the model never sees the extracted tags, the "model drops tags" failure mode largely disappears.

15.1 The Problem

The per-chunk sentinel system (Section 3) works well for most documents, but complex documents with many <code>, <strong>, and <em> tags exposed a fundamental limitation:

By the time chunks are created, they already contain many protected elements
Each element becomes a sentinel placeholder (__TBP0__, __TBP1__, etc.)
Chunks with 20+ placeholders overwhelm the model's attention
The model occasionally drops, duplicates, or merges placeholders during translation
Validation catches these failures, but retries often produce the same errors

Example failure: A chunk with 27 <code translate="no"> tags consistently failed validation with tag count mismatch (27→23) — the model dropped 4 placeholders despite explicit instructions to preserve them.

15.2 The Solution

Extract ALL problematic elements from the entire document before chunking. The model never sees these elements — only simple Unicode placeholders that it cannot confuse with HTML structure.

Key insight: The model cannot corrupt what it never sees. By extracting elements at the document level, each chunk has zero complex tags to worry about. The model translates clean prose with obvious markers.

Before vs After

Pipeline Stage	Before (v2.2)	After (v2.3)
Document received	290 code tags	290 code tags
After preprocessing	—	0 code tags (290 placeholders)
Per-chunk sentinels	20+ placeholders per chunk	0-2 placeholders per chunk
Model cognitive load	High (complex structure)	Low (clean prose)
Validation failures	Frequent on complex docs	Rare

15.3 Processing Flow

The preprocessor uses a two-phase extraction model that integrates into the translation pipeline as the first step:

Document → PHASE 1 (Density Lift) → PHASE 2 (Individual Extract) → Chunk → Translate → Reassemble → POSTPROCESS → Output
              ↓                           ↓                                                              ↓
     Lift entire dense containers  Extract remaining code/pre/       Single-pass flat restore
     as ⟦BLOCK_NNN⟧ placeholders   strong/em/numeric individually    (no nesting, no iteration)
     Original HTML preserved        Build manifest mapping            from manifest

Phase 1 runs FIRST on the raw DOM. It inspects container elements (tr, li, dd, dt, p, div) and lifts any container whose ratio of protected elements to prose characters exceeds a per-language density threshold. The lifted BLOCK entries store the original HTML byte-for-byte — no nested placeholders inside them.

Phase 2 then runs on whatever Phase 1 didn't lift, extracting individual elements (pre, code, spans, strong/em, a-tags, numeric patterns) the same way as before.

Integration in `engine.py`

def translate(text, target_lang, mode="html", ...):
    # Build preprocessor config from language profile
    # (the profile lookup is elided in this excerpt)
    preprocess_config = {"script_family": profile.get("script_family", "latin")}
    
    # Step 1: PREPROCESS — Phase 1 lifts dense blocks, Phase 2 extracts individuals
    simplified_html, manifest = preprocess_for_translation(text, preprocess_config)
    
    # Step 2: CHUNK — Split simplified document
    head, chunks, tail = chunker.split_document(simplified_html, lang=target_lang)
    
    # Step 3: TRANSLATE — Each chunk through Bedrock (per-chunk sentinels still run)
    # Collect results in order so reassembly sees every translated chunk
    translated_chunks = []
    for chunk in chunks:
        translated_chunks.append(_translate_chunk(chunk, ...))
    
    # Step 4: REASSEMBLE
    reassembled = chunker.reassemble(head, translated_chunks, tail)
    
    # Step 5: POSTPROCESS — Single-pass flat restore (no nesting)
    output = postprocess_translation(reassembled, manifest)
    return output

15.4 Element Extraction

The preprocessor extracts elements in a two-phase order. Phase 1 (density lift) runs first on the raw DOM. Phase 2 (individual extraction) runs on whatever Phase 1 didn't lift, processing elements in order of specificity (most specific first) to handle nesting correctly:

Phase	Pass	Elements Extracted	Placeholder Format
1	Density lift	Entire container elements (`tr`, `li`, `dd`, `dt`, `p`, `div`) exceeding density threshold	`⟦BLOCK_001⟧`
2	1	`<pre translate="no">` blocks	`⟦PRE_001⟧`
2	2	`<code translate="no">` tags	`⟦CODE_001⟧`
2	3	Other `translate="no"` elements	`⟦SPAN_001⟧`
2	3b	`<span class="...">` (structural CSS)	`⟦SPAN_001⟧`
2	4	`<strong>`, `<em>`, `<b>`, `<i>` tags (short content ≤40 chars)	`⟦STRONG_001⟧`, `⟦EM_001⟧`
2	5	`<a>` tags with event handlers	`⟦LINK_001⟧`
2	6	Numeric patterns (memory sizes, percentages, versions)	`⟦MEM_001⟧`, `⟦PCT_001⟧`, `⟦VER_001⟧`

Placeholder Format

Placeholders use Unicode brackets (⟦ and ⟧) that will never appear in real HTML content:

Format: ⟦TYPE_NNN⟧ (e.g., ⟦CODE_042⟧, ⟦BLOCK_007⟧)
TYPE: Element type (BLOCK, CODE, PRE, SPAN, STRONG, EM, B, I, LINK, MEM, PCT, VER)
NNN: Zero-padded index (001, 002, ...)
Zero collision risk: Unicode brackets don't appear in HTML, code, or prose
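
As a concrete reading of the format, the sketch below generates and recognizes placeholders. The helper names make_placeholder and find_placeholders are hypothetical; the bracket characters, type list, and zero-padding follow the description above.

```python
import re

PLACEHOLDER_TYPES = ("BLOCK", "CODE", "PRE", "SPAN", "STRONG", "EM",
                     "B", "I", "LINK", "MEM", "PCT", "VER")

# Matches e.g. ⟦CODE_042⟧ and captures the type and the numeric index.
PLACEHOLDER_RE = re.compile(r"⟦(" + "|".join(PLACEHOLDER_TYPES) + r")_(\d{3,})⟧")


def make_placeholder(kind: str, index: int) -> str:
    """Format a placeholder with a zero-padded index of at least 3 digits."""
    if kind not in PLACEHOLDER_TYPES:
        raise ValueError(f"unknown placeholder type: {kind}")
    return f"⟦{kind}_{index:03d}⟧"


def find_placeholders(text: str):
    """Return (type, index) for every placeholder found in text, in order."""
    return [(m.group(1), int(m.group(2))) for m in PLACEHOLDER_RE.finditer(text)]
```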

Phase 2 Nested Element Handling

Within Phase 2, the preprocessor handles arbitrary nesting depth by processing innermost elements first. Note: this nesting only occurs for individually-extracted elements — BLOCK entries from Phase 1 are always flat (they contain original HTML, never placeholders).

Source:
<span translate="no"><code translate="no">tbi-ops-notify</code> Lambda</span>

Pass 2: Extract inner code tag
<span translate="no">⟦CODE_001⟧ Lambda</span>

Pass 3: Extract outer span
⟦SPAN_002⟧

Model sees: ⟦SPAN_002⟧ (one placeholder, no nesting)

Sibling Placeholder Awareness

When the preprocessor extracts elements from a container (e.g., a table cell), earlier passes leave placeholder text in the parent. Later passes must not be confused by these sibling placeholders — a <code translate="no"> tag in the same table cell as an already-extracted element is still a valid extraction target.

Bug fixed (v2.4): The original _is_inside_placeholder check walked up the DOM tree looking for the ⟦ character in any parent's text. This caused false positives — if a sibling element had been extracted (leaving ⟦CODE_042⟧ in the parent's text), the check incorrectly skipped remaining <code translate="no"> tags in the same container. Those unextracted tags then overwhelmed the model during complex-script translation (Hindi, Urdu). Fix: the check now always returns false — if an element still exists in the DOM tree, it wasn't extracted and is a valid target.

15.4a Density-Based Block Lifting

Phase 1 of the preprocessor inspects the raw DOM before any individual extraction. It identifies container elements where the ratio of protected elements to prose characters is too high — meaning there's too little translatable text to justify sending the element through the model.

Density Formula

density = protected_elements / max(prose_chars, 1)

Where:

protected_elements = unique count of <code> + translate="no" elements + class spans + <pre> inside the container
prose_chars = character count of visible text content (excluding protected element text)
Container elements checked: tr, li, dd, dt, p, div

The check uses pure DOM element counting — no regex. This makes it deterministic and immune to content patterns that could fool regex-based approaches.

Thresholds by Script Family

Thresholds are per-language via the script_family column in the translation_language_profiles Aurora table:

Script Family	Languages	Density Threshold
latin	es, fr, de, pt, it	0.06
cyrillic	ru	0.05
cjk	ja, zh	0.03
indic	hi, ur	0.03
arabic	ar	0.03

Guard Rails

Max prose for lift: 300 characters (configurable per language via Aurora density_max_prose column). Containers with more prose than this are never lifted, regardless of density.
Min protected elements: 2. Containers with only 1 protected element are never lifted — not worth the overhead.

Aurora Configuration Columns

The translation_language_profiles table stores per-language density configuration:

Column	Type	Purpose
`script_family`	varchar	Selects density threshold (latin, cyrillic, cjk, indic, arabic)
`density_lift_threshold`	numeric	Override threshold for this language (NULL = use script_family default)
`density_max_prose`	integer	Max prose chars for lift eligibility (default 300)
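
The formula, thresholds, guard rails, and override columns described above combine into a single decision. The sketch below takes already-computed counts rather than walking a DOM, and should_lift is a hypothetical name; the constants come from this section.

```python
DENSITY_THRESHOLDS = {
    "latin": 0.06, "cyrillic": 0.05, "cjk": 0.03, "indic": 0.03, "arabic": 0.03,
}
MAX_PROSE_FOR_LIFT = 300
MIN_PROTECTED_ELEMENTS = 2


def should_lift(protected_elements, prose_chars, script_family,
                threshold_override=None, max_prose=MAX_PROSE_FOR_LIFT):
    """Decide whether a container is lifted as a BLOCK placeholder.

    threshold_override plays the role of density_lift_threshold (None means
    use the script-family default); max_prose plays the role of
    density_max_prose.
    """
    if protected_elements < MIN_PROTECTED_ELEMENTS:
        return False  # guard rail: not worth the overhead
    if prose_chars > max_prose:
        return False  # guard rail: too much translatable prose to skip
    threshold = (threshold_override if threshold_override is not None
                 else DENSITY_THRESHOLDS[script_family])
    density = protected_elements / max(prose_chars, 1)
    return density > threshold
```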

Example: Endpoint Table Row

<tr>
  <td><span translate="no">GET</span></td>
  <td><code translate="no">/health</code></td>
  <td>LPO server health check</td>
</tr>

Protected elements: 2 (span + code)
Prose chars: 23 ("GET" + "/health" excluded, "LPO server health check" counted)
Density: 2 / 23 ≈ 0.087

→ CJK (threshold 0.03): LIFTED as ⟦BLOCK_NNN⟧ (density 0.087 > 0.03)
→ Latin (threshold 0.06): LIFTED as ⟦BLOCK_NNN⟧ (density 0.087 > 0.06)
→ If prose were 40+ chars: density drops to 0.05 or lower, Latin would KEEP it

Results

ElastiCache doc (ja): 45 blocks lifted, 156/156 code tags survive translation intact. Zero retries.

KCC doc (ja): 82 blocks lifted, 327/327 code tags survive translation intact. Zero retries.

Block lifting eliminates the failure mode where code-heavy table rows overwhelm the model's attention. The model never sees these rows — it translates a single ⟦BLOCK_NNN⟧ token (which it passes through unchanged) instead of a complex structure with multiple inline placeholders.

15.5 Restoration

After translation, the postprocessor restores placeholders in a single flat pass in reverse index order (high → low) to prevent prefix collisions. Because Phase 1 (density lift) runs BEFORE Phase 2 (individual extraction), BLOCK entries contain original HTML — never nested placeholders.

Why Single-Pass Works

BLOCK entries: Always contain original HTML (no placeholders inside). Phase 1 lifts the raw DOM before any extraction happens, so the stored HTML is pristine.
Phase 2 entries: May nest (e.g., ⟦SPAN_002⟧'s manifest HTML contains ⟦CODE_001⟧). Reverse-order restore handles this correctly — restoring index 2 first produces the text containing index 1, then index 1 is restored in the same pass.
No iteration needed: A single reverse pass resolves all nesting because a higher-index entry can contain only lower-index placeholders, never the reverse.
No orphan tracking: Every placeholder in the translated text maps to exactly one manifest entry. Missing placeholders (model dropped them) are logged but don't break restoration.

Translated output contains: ⟦BLOCK_003⟧ ... ⟦SPAN_002⟧ ... ⟦CODE_001⟧

Single pass (reverse order):
  Restore ⟦BLOCK_003⟧ → original HTML (no further substitution needed inside)
  Restore ⟦SPAN_002⟧ → <span translate="no">⟦CODE_001⟧ Lambda</span>
  Restore ⟦CODE_001⟧ → <code translate="no">tbi-ops-notify</code>

Final:
  <tr><td>...original row...</td></tr>
  <span translate="no"><code translate="no">tbi-ops-notify</code> Lambda</span>
  <code translate="no">tbi-ops-notify</code>

Perfect reconstruction — no nesting loops, no iteration.

Manifest Structure

The manifest maps each placeholder to its original HTML, enabling exact restoration:

{
  "⟦BLOCK_003⟧": {
    "type": "BLOCK",
    "html": "<tr><td><span translate=\"no\">GET</span></td><td><code translate=\"no\">/health</code></td><td>LPO server health check</td></tr>",
    "index": 3
  },
  "⟦SPAN_002⟧": {
    "type": "SPAN",
    "html": "<span translate=\"no\">⟦CODE_001⟧ Lambda</span>",
    "index": 2
  },
  "⟦CODE_001⟧": {
    "type": "CODE",
    "html": "<code translate=\"no\">tbi-ops-notify</code>",
    "index": 1
  }
}

Note: BLOCK entries always store pristine HTML (no ⟦...⟧ tokens inside). Phase 2 entries like SPAN may contain lower-index placeholders from inner elements extracted in an earlier pass; the reverse-order restore handles this naturally.
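
A minimal sketch of the reverse-order restore, using the manifest shape shown above. The function name restore_placeholders and the choice to return dropped placeholders as a list (rather than logging them) are assumptions for illustration.

```python
def restore_placeholders(translated, manifest):
    """Restore placeholders in one pass, highest index first.

    Returns the restored text and a list of placeholders that were not
    found (for example, because the model dropped them).
    """
    missing = []
    entries = sorted(manifest.items(),
                     key=lambda item: item[1]["index"], reverse=True)
    for placeholder, entry in entries:
        if placeholder in translated:
            # Restoring an outer element may introduce lower-index
            # placeholders, which are handled later in this same loop.
            translated = translated.replace(placeholder, entry["html"])
        else:
            missing.append(placeholder)
    return translated, missing
```

Because an outer entry is always restored before the inner placeholders it contains, the loop never needs to revisit an entry.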

Result: The Translation Engine document (290 code tags, complexity 423) now translates with 0 retries across all 11 parts. Previously it failed consistently on Part 8 (config section with 27 code tags). Restoration is now a single deterministic pass with zero edge cases.

15.6 Numeric Pattern Extraction

Pass 6 of Phase 2 (see the table in 15.4) extracts numeric patterns from the text after HTML element extraction. This protects bare numbers in prose that weren't already inside code or span tags. The model cannot convert, localize, or drop what it never sees.

Why Numeric Extraction Matters

During translation, and most often with complex scripts (Arabic, Hindi, Urdu), the model occasionally:

Drops numeric values: "32 GB" becomes just "GB" or disappears entirely
Localizes units: "MB" becomes "Mo" (French) or "ميغابايت" (Arabic)
Converts formats: "98.5%" becomes "٩٨٫٥٪" (Eastern Arabic numerals)
Paraphrases: "1770 MB" becomes "approximately 2 GB"

These transformations break technical accuracy. The numeric extraction pass prevents all of them.

Patterns Extracted

Pattern Type	Regex	Examples	Placeholder
Memory sizes	`\d+(?:\.\d+)?\s?(?:GB|MB|KB|TB)`	32 GB, 1770 MB, 256 KB	`⟦MEM_001⟧`
Percentages	`\d+(?:\.\d+)?%`	98.5%, 62%, 100%	`⟦PCT_001⟧`
Version numbers	`\d+\.\d+(?:\.\d+)?`	4.6, 17.7, 2.3.1	`⟦VER_001⟧`

Processing Order

Numeric extraction runs after HTML element extraction (Passes 1-5). This means:

Numbers inside <code> tags are already protected by Pass 2
Numbers inside translate="no" spans are already protected by Pass 3
Pass 6 only catches bare numbers in prose that weren't otherwise protected
No double-extraction: the regex skips content already inside placeholders

Example: Hindi Translation

Source:
"The Lambda uses 1770 MB of memory and achieves 98.5% uptime."

After Pass 6:
"The Lambda uses ⟦MEM_042⟧ of memory and achieves ⟦PCT_043⟧ uptime."

Model translates prose, placeholders survive intact.

After restoration:
"लैम्ब्डा 1770 MB मेमोरी का उपयोग करता है और 98.5% अपटाइम प्राप्त करता है।"

Technical values preserved exactly — no localization, no conversion.

Result: Translation failures caused by numeric value loss (preserve_memory_size: missing: GB, MB) are now resolved across all 11 languages. Numeric values survive intact regardless of target script.

Placeholder Collision Prevention

The numeric extraction pass includes safeguards to prevent extracting numbers that are part of existing placeholder names (e.g., the "001" in ⟦CODE_001⟧):

Skips matches preceded by an underscore within a placeholder token
Skips matches followed by the closing bracket ⟧
Skips matches inside unclosed placeholder brackets

Without these guards, the numeric regex would corrupt placeholder names by extracting their index numbers, producing nested placeholders like ⟦CODE___TBN10__⟧ that the model cannot handle.
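
One way to get the same protection is sketched below. Instead of the lookaround-style guards listed above, it swaps in a single-pass alternation that consumes any existing placeholder whole before the numeric patterns get a chance to match inside it. The numeric regexes are the ones from the patterns table; extract_numeric and the shared counter are assumptions for illustration.

```python
import re

# Alternatives are tried left to right at each position: an existing
# placeholder is consumed first, then memory sizes, percentages, versions.
_TOKEN = re.compile(
    r"(?P<SKIP>⟦[A-Z]+_\d+⟧)"
    r"|(?P<MEM>\d+(?:\.\d+)?\s?(?:GB|MB|KB|TB))"
    r"|(?P<PCT>\d+(?:\.\d+)?%)"
    r"|(?P<VER>\d+\.\d+(?:\.\d+)?)"
)


def extract_numeric(text, start_index=1):
    """Replace bare numeric values with placeholders; return text and manifest."""
    manifest = {}
    counter = start_index

    def replace(match):
        nonlocal counter
        kind = match.lastgroup
        if kind == "SKIP":
            return match.group(0)  # leave existing placeholders untouched
        placeholder = f"⟦{kind}_{counter:03d}⟧"
        manifest[placeholder] = match.group(0)
        counter += 1
        return placeholder

    return _TOKEN.sub(replace, text), manifest
```

Called with start_index=42 on the Hindi example's source sentence, this produces ⟦MEM_042⟧ and ⟦PCT_043⟧ in the same positions as shown above.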

16. Notification System

The translation engine sends email notifications via the AutoOps notification pipeline (tbi-ops-notify Lambda → SES). Notifications are consolidated across batch jobs and include detailed per-document breakdowns.

16.1 Email Format

Each notification email includes:

Batch Summary: Total jobs, documents, languages, succeeded/failed pairs, final state
Total Time: Formatted as "Xm Ys" (e.g., "7m 12s") for readability
Per-Document Breakdown: Which languages succeeded and failed for each document
Deployment Status: CloudFront invalidation count, S3 deployment confirmation
Search Index Status: Whether the search index was rebuilt successfully
Error Details: Specific validation failures with chunk/validator information

Example Notification

Subject: [INFO] Translation Complete: 2 docs × 11 langs — 22/22 pairs SUCCEEDED

Batch Summary:
• Jobs: 2
• Documents: 2
• Languages: 11
• Total Pairs: 22
• Succeeded: 22
• Failed: 0
• Final State: SUCCEEDED
• Total Time: 7m 12s

Documents Translated:
• Trinity-Beast-TBI-Translation-Engine.html
  ✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it
• Trinity-Beast-Infrastructure-Overview.html
  ✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it

Deployment:
• CloudFront Invalidations: 2
• All translated files deployed to S3

Search Index:
• Rebuilt successfully (all 11 languages)

Partial Success Example

Subject: [WARNING] Translation Complete: 1 doc × 11 langs — 10/11 pairs PARTIAL

Documents Translated:
• Complex-Technical-Guide.html
  ✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, it
  ✗ Failed: ur

Error Details:
• Complex-Technical-Guide.html → ur: chunk 14 failed validation after 3 retries
  check_tag_counts: expected 27 code tags, found 23

16.2 Batch Consolidation

When multiple translation jobs are submitted together (e.g., translating 5 documents), the notification system consolidates them into a single email:

Deferred sending: Each finalizing job checks if other jobs are still active
Last job sends: Only the last job to complete sends the consolidated email
Safety net: If all jobs see each other as "active" (race condition), a 5-second wait and re-check prevents missed notifications
Fallback: If no jobs are queued and the batch is wrapping up, the current job sends anyway

This prevents notification spam when translating multiple documents — you get one comprehensive email covering the entire batch, not 5 separate emails.
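
The deferred-send decision can be sketched as below. This is an illustration under stated assumptions, not the finalizer's real code: count_other_active is a hypothetical callable that returns how many other jobs in the batch are still active (for example, by reading tx:active), and the sketch covers only the "last job sends" rule and the 5-second re-check. The queue-based fallback and any guard against two jobs both deciding to send are not modeled.

```python
import time
from typing import Callable


def should_send_consolidated(
    count_other_active: Callable[[], int],
    recheck_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Decide whether the finalizing job should send the batch email.

    count_other_active is a hypothetical hook, not a function from the engine.
    """
    if count_other_active() == 0:
        return True  # nothing else is running: this is the last job, so it sends
    # Safety net: jobs that finish at the same moment can each see the others
    # as active. Wait once, then look again before deferring.
    sleep(recheck_delay_s)
    return count_other_active() == 0
```

Passing sleep as a parameter keeps the function testable without a real 5-second pause.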

16.3 Document Resolver (v2.5)

When the same document appears in multiple jobs within a batch (e.g., initial run fails Urdu, retry succeeds), the notification resolves duplicate entries into a single final-state view:

Merge logic: For each document, collect all succeeded and failed languages across all jobs
Retry overrides failure: If a language appears in both succeeded (retry job) and failed (original job), it's reported as succeeded
Deduplicated counts: Summary totals (Succeeded, Failed, Total Pairs) are recalculated from the resolved state — not raw aggregation
Single entry per document: The notification shows each document exactly once with its final language breakdown

Without the resolver, a retry job would show the same document twice — once with the failure and once with the fix — making the notification confusing and the counts misleading.
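
The merge rules above can be sketched as follows. The JobResult shape and the function names are assumptions made for illustration; the real job records may carry different fields.

```python
from collections import defaultdict
from typing import Iterable, TypedDict


class JobResult(TypedDict):
    # Hypothetical per-job record: one document and its language outcomes.
    doc: str
    succeeded: list[str]
    failed: list[str]


def resolve_documents(jobs: Iterable[JobResult]) -> dict[str, dict[str, list[str]]]:
    """Collapse all jobs in a batch into one final-state entry per document."""
    ok: dict[str, set[str]] = defaultdict(set)
    bad: dict[str, set[str]] = defaultdict(set)
    for job in jobs:
        ok[job["doc"]].update(job["succeeded"])
        bad[job["doc"]].update(job["failed"])
    resolved = {}
    for doc in sorted(set(ok) | set(bad)):
        succeeded = ok[doc]
        # Retry overrides failure: a language that succeeded in any job
        # is removed from the failed set.
        failed = bad[doc] - succeeded
        resolved[doc] = {"succeeded": sorted(succeeded), "failed": sorted(failed)}
    return resolved


def summarize(resolved: dict[str, dict[str, list[str]]]) -> dict[str, int]:
    """Recalculate totals from the resolved state, not from raw job sums."""
    succeeded = sum(len(d["succeeded"]) for d in resolved.values())
    failed = sum(len(d["failed"]) for d in resolved.values())
    return {"succeeded": succeeded, "failed": failed, "total_pairs": succeeded + failed}
```

With an original job that fails ur and a retry job that succeeds ur, the document appears once, ur is listed under succeeded, and the totals count that pair a single time.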

16.4 Tag Inventory (v2.8)

Every notification includes a Tag Inventory section showing source vs output tag counts per document. This lets you detect at a glance if the model is adding or dropping tags. As of v2.8, the inventory also reports Mermaid diagram counts:

Tag Inventory (source → output):
• Trinity-Beast-Translation-Service.html
  IN:  code:22 pre:5 strong:8 em:2 a:4 br:3 diagrams:1
  OUT: code:22 pre:5 strong:8 em:2 a:4 br:3 diagrams:1

If the model has a bad day and adds a <span> that wasn't in the source, or drops code tags, you'll see the mismatch immediately:

  IN:  code:23 pre:5 strong:8 diagrams:2
  OUT: code:20 pre:4 strong:8 diagrams:1    ← 3 code dropped, 1 pre dropped, 1 diagram lost

Tag counts are logged per-language in Aurora (translation_job_events) with tags_in and tags_out fields. The notification shows the first successful language's counts (source tags are identical across all languages since it's the same source document).
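
A tag inventory of this kind can be sketched with the standard-library HTML parser. This is an illustration, not the engine's validator: the function names are hypothetical, and counting any element with class "mermaid" as a diagram is an assumption about how diagrams are marked up.

```python
from collections import Counter
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts every opening tag, plus elements carrying class="mermaid"."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1
        # Assumption: Mermaid diagrams are elements with class "mermaid".
        classes = (dict(attrs).get("class") or "").split()
        if "mermaid" in classes:
            self.counts["diagrams"] += 1


def tag_inventory(html: str) -> Counter:
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def inventory_diff(tags_in: Counter, tags_out: Counter) -> dict[str, int]:
    """Output count minus source count, for every tag whose count changed."""
    tags = sorted(set(tags_in) | set(tags_out))
    return {t: tags_out[t] - tags_in[t] for t in tags if tags_out[t] != tags_in[t]}
```

An empty diff means the output carries the same tag counts as the source; a negative value is a dropped tag and a positive value is an added one.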

Recipient: All translation notifications go to CoryDeanKalani@CPMP-Site.org via the unified AutoOps notification pipeline. The sender is CPMP Mission <No-Reply@CPMP-Site.org>.

17. Delta Translation (Incremental Updates)

Documents change frequently — a new endpoint, a revised architecture, an updated pricing table. Without delta translation, every edit requires re-translating the entire document across all 11 languages. Delta translation solves this by identifying exactly which sections changed and translating only those, reusing cached translations for everything else.

17.1 Concept

The delta translation system leverages two key properties of the document library:

S3 versioning: Every document upload creates a new version in S3. Previous versions are retained indefinitely, providing a complete edit history.
<!-- TBI-CHUNK --> markers: Human-placed section boundaries in the English source that divide documents into logical, independently translatable sections.

By comparing the current English document against the version that was last translated, the system identifies which sections changed (by content hash) and only sends those to Bedrock. Unchanged sections are pulled directly from the existing translated document. Typical savings: 70–90% on incremental updates.

17.2 S3 Versioning as Diff Source

The website bucket (trinity-beast-website-east2) has versioning enabled. Every aws s3 cp or s3api put-object creates a new version with a unique VersionId. The delta system uses this to:

List all versions of a document with timestamps and sizes
Fetch any previous version by VersionId
Compare current content against the version that was last successfully translated

No separate manifest storage is required — S3 already has the full history. A lightweight metadata file (docs/delta/{doc}.{lang}.json) tracks which VersionId was last translated for each document-language pair.
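
Listing and fetching versions can be sketched as below. The s3 argument is expected to behave like boto3.client("s3"); list_object_versions and get_object are real boto3 S3 client calls, while the helper names are assumptions. The sketch reads a single page of results, which is enough for a document with fewer than 1,000 versions.

```python
from typing import Any


def list_versions(s3: Any, bucket: str, key: str) -> list[dict]:
    """Return the versions of one object, newest first.

    `s3` is assumed to be a boto3 S3 client (or a stand-in with the same methods).
    """
    resp = s3.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can return sibling keys, so keep exact matches only.
    versions = [v for v in resp.get("Versions", []) if v["Key"] == key]
    return sorted(versions, key=lambda v: v["LastModified"], reverse=True)


def fetch_version(s3: Any, bucket: str, key: str, version_id: str) -> str:
    """Download one specific version of the document as text."""
    obj = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    return obj["Body"].read().decode("utf-8")
```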

17.3 Comment Preservation (Sentinel Pass 0)

For delta translation to work, <!-- TBI-CHUNK --> markers must survive the translation round-trip. Previously, Bedrock silently dropped HTML comments during translation. The sentinel system now includes a Pass 0 that protects all HTML comments:

# Pass 0: Before Bedrock sees the chunk
<!-- TBI-CHUNK -->  →  __TBP0__    (sentinel token)
<!-- Section 5 -->  →  __TBP1__    (sentinel token)

# After translation: sentinels restored
__TBP0__  →  <!-- TBI-CHUNK -->
__TBP1__  →  <!-- Section 5 -->

This is implemented as the first pass in _apply_sentinels() in engine.py, before the existing translate="no" element extraction (Pass 1), paired span sentinels (Pass 2), and numeric protection (Pass 3). Comments are treated as Type A (FULL) sentinels — extracted completely and restored verbatim.
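
The round-trip can be sketched with two regular expressions. This is a minimal illustration of the idea, not the body of _apply_sentinels(); the function names protect_comments and restore_comments are hypothetical, and only the __TBPn__ token format is taken from the example above.

```python
import re

_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_TOKEN = re.compile(r"__TBP(\d+)__")


def protect_comments(chunk: str) -> tuple[str, list[str]]:
    """Replace each HTML comment with a numbered sentinel token."""
    saved: list[str] = []

    def _swap(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"__TBP{len(saved) - 1}__"

    return _COMMENT.sub(_swap, chunk), saved


def restore_comments(translated: str, saved: list[str]) -> str:
    """Put the original comments back, verbatim, after translation."""

    def _restore(match: re.Match) -> str:
        index = int(match.group(1))
        # Leave unknown tokens untouched rather than raising.
        return saved[index] if index < len(saved) else match.group(0)

    return _TOKEN.sub(_restore, translated)
```

Because the comment text never reaches the model, it cannot be translated, reworded, or dropped; only the opaque token has to survive.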

17.4 Hash-Based Section Matching

The algorithm is position-independent — sections are matched by content hash, not by index. This means markers can be added, removed, or repositioned between versions without breaking the delta logic.

Diagram 17.1: Delta Translation Flow

flowchart TD
    A[Fetch Current English from S3] --> B[Split by TBI-CHUNK markers]
    B --> C[Hash each section SHA-256]
    D[Fetch Previous English version] --> E[Split by TBI-CHUNK markers]
    E --> F[Hash each section]
    C --> G{Compare hashes}
    F --> G
    G -->|Match found| H[Pull from existing translation]
    G -->|No match| I[Send to Bedrock]
    H --> J[Reassemble with TBI-CHUNK markers]
    I --> J
    J --> K[Deploy to S3 + Save metadata]

    style A fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0
    style D fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0
    style H fill:#064e3b,stroke:#10b981,color:#e0e0e0
    style I fill:#7c2d12,stroke:#f97316,color:#e0e0e0
    style K fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0

Marker repositioning example:

Version 1: 5 sections (markers at A, B, C, D)
Version 2: 6 sections (new marker added — A, B, C, C2, D)
Sections before and after the new marker still match by hash → reused
The split section produces two new hashes → both translate fresh
Result: 4 of 6 sections reused (67% savings) despite marker change
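
The position-independent matching can be sketched as below. This is an illustration under assumptions, not the delta tool's code: the function names are hypothetical, sections are whitespace-stripped before hashing, and the previous translation is assumed to contain the same number of markers as the previous English version.

```python
import hashlib
from typing import Optional

MARKER = "<!-- TBI-CHUNK -->"


def split_sections(html: str) -> list[str]:
    return html.split(MARKER)


def section_hash(section: str) -> str:
    # Assumption: surrounding whitespace is ignored when hashing.
    return hashlib.sha256(section.strip().encode("utf-8")).hexdigest()


def plan_delta(prev_en: str, curr_en: str, prev_translated: str) -> list[Optional[str]]:
    """One entry per current section: a reusable translation, or None if it needs Bedrock."""
    prev_sections = split_sections(prev_en)
    prev_out = split_sections(prev_translated)
    current = split_sections(curr_en)
    if len(prev_sections) != len(prev_out):
        # Markers were lost in the old translation, so nothing can be aligned.
        return [None] * len(current)
    cache = {section_hash(s): t for s, t in zip(prev_sections, prev_out)}
    return [cache.get(section_hash(s)) for s in current]


# Marker repositioning: 5 sections become 6 when the third one is split.
prev_en = MARKER.join(["s1", "s2", "s3a s3b", "s4", "s5"])
curr_en = MARKER.join(["s1", "s2", "s3a ", " s3b", "s4", "s5"])
prev_tr = MARKER.join(["t1", "t2", "t3", "t4", "t5"])
plan = plan_delta(prev_en, curr_en, prev_tr)
reused = sum(t is not None for t in plan)
print(f"{reused} of {len(plan)} sections reused")
```

Because the cache is keyed by hash and not by index, the two sections after the split still find their translations even though their positions shifted by one.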

17.5 CLI Commands

Four KCC commands support delta translation and chunk management:

Delta Diff (Analysis Only)

# List available S3 versions
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --list-versions

# Compare current vs previous version (auto-detects)
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html

# Compare against a specific version
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --version-id ksYxUBZIUB8Roi2KQYje6ig9R7JesL9z

# Show delta for a specific language
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --lang ja

Delta Translate (Incremental Translation — Local CLI)

# Dry run — show what would change without calling Bedrock
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html es --dry-run

# Translate only changed sections for one language
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html es

# Translate changed sections for all languages
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html all

# Force full translation (creates fresh baseline)
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html all --force

Delta via Remote API (`options.delta`)

The delta option is also available on POST /admin/translate — the worker skips any language pair where the translated file on S3 is already newer than the source document. No local CLI needed.

# Submit a delta job via the remote API — skips up-to-date pairs automatically
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
  -d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all","options":{"delta":true}}' \
  https://api.cpmp-site.org/admin/translate | jq .
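
The skip rule for the remote delta option can be sketched as a timestamp comparison. This assumes the worker compares S3 LastModified values for the source and translated objects; the function names and the use of None for a missing translation are illustrative assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def pair_is_up_to_date(source_modified: datetime, translated_modified: Optional[datetime]) -> bool:
    """True when a translation exists and is newer than the source document."""
    return translated_modified is not None and translated_modified > source_modified


def pairs_to_translate(
    source_modified: datetime,
    translated_by_lang: dict[str, Optional[datetime]],
) -> list[str]:
    """Languages that still need work; up-to-date pairs are skipped."""
    return [
        lang
        for lang, modified in translated_by_lang.items()
        if not pair_is_up_to_date(source_modified, modified)
    ]


source = datetime(2025, 5, 1, tzinfo=timezone.utc)
translated = {
    "es": datetime(2025, 5, 2, tzinfo=timezone.utc),   # newer than source: skipped
    "ja": datetime(2025, 4, 30, tzinfo=timezone.utc),  # older than source: translate
    "ur": None,                                        # no translation yet: translate
}
print(pairs_to_translate(source, translated))
```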

Delta Validate (Marker Preservation Check)

# Validate TBI-CHUNK markers survived translation for all delta-enabled docs
bash scripts/kcc.sh delta-validate all all

# Validate a specific doc across all languages
bash scripts/kcc.sh delta-validate Trinity-Beast-API-Reference.html all

# Validate a specific doc + language pair
bash scripts/kcc.sh delta-validate Trinity-Beast-API-Reference.html es

Reports pass/fail per doc×lang pair. Exit code 0 if all pass, 1 if any markers were lost. Run after any translation job to confirm Sentinel Pass 0 is working correctly.
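
A marker preservation check can be sketched as a count comparison. This is an assumption about what delta-validate verifies; the real command may compare marker positions as well as counts. The function names are hypothetical.

```python
import re

MARKER_RE = re.compile(r"<!--\s*TBI-CHUNK\s*-->")


def count_markers(html: str) -> int:
    return len(MARKER_RE.findall(html))


def validate_pairs(source_html: str, translations: dict[str, str]) -> tuple[dict[str, bool], int]:
    """Return per-language pass/fail plus a process-style exit code.

    A language passes when its translation has as many markers as the source.
    """
    expected = count_markers(source_html)
    results = {lang: count_markers(html) == expected for lang, html in translations.items()}
    exit_code = 0 if all(results.values()) else 1
    return results, exit_code
```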

Chunk Sizer (Auto-Placement Suggestions)

# Analyze a doc from S3 and suggest TBI-CHUNK marker placement
bash scripts/kcc.sh chunk-size Trinity-Beast-API-Reference.html

# Analyze a local file
bash scripts/kcc.sh chunk-size /path/to/local/doc.html

Scans the document for <section>, <h2>, <h3>, and .category-section boundaries. Reports current chunk sizes (if markers exist), identifies policy violations, and suggests where to insert markers to stay within the 15KB/18KB/12KB policy. Dense sections (high translate="no" density) automatically target the tighter 12KB limit.

17.6 Bootstrap Path

Existing translated documents do not contain <!-- TBI-CHUNK --> markers (they were stripped before the sentinel fix). The bootstrap sequence is:

First run (full cost): Use --force to translate the entire document. The sentinel fix preserves markers in the output. Delta metadata is saved to S3.
Subsequent runs (delta savings): The tool detects the existing translation has markers, loads metadata to identify the previous English version, and only translates changed sections.

After the bootstrap run, typical savings on incremental updates:

Change Type	Typical Savings	Example
Single section edit	85–95%	Fix a typo, update one endpoint
New section added	70–85%	Add a new feature section
Marker repositioned	60–75%	Split a large section in two
Major rewrite	20–40%	Restructure half the document

Cost model: At approximately $1.50 per section-language pair, a 9-section document across 11 languages costs ~$148.50 for a full translation. With delta (2 sections changed), the same update costs ~$33 — a 78% reduction.
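
The arithmetic behind these figures, written out as a short sketch. The $1.50 per section-language pair is the approximate rate quoted above; the function name is an assumption.

```python
def translation_cost(sections: int, languages: int, cost_per_pair: float = 1.50) -> float:
    """Cost in dollars: one charge per section-language pair."""
    return sections * languages * cost_per_pair


full = translation_cost(sections=9, languages=11)    # 99 pairs
delta = translation_cost(sections=2, languages=11)   # 22 pairs
reduction = 1 - delta / full

print(f"full=${full:.2f} delta=${delta:.2f} reduction={reduction:.0%}")
```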

Quick Reference

Item	Value
Model	`qwen.qwen3-235b-a22b-2507-v1:0` (Qwen3-235B — all languages)
Failover Regions	`us-east-2` → `us-east-1` → `us-west-2`
Target Languages	11 internal: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it · 21 supported (TBTS customers)
Worker Runtime	Python 3.11 (ECS Fargate persistent service, auto-scaling 1→11)
Deploy/Finalize Runtime	Go (`provided.al2023`)
Worker Resources	2 vCPU / 6 GB per container (Fargate — no timeout ceiling)
Memory (Lambdas)	1770 MB
Worker Timeout	None (runs to completion)
Finalize Timeout	180s
Deploy Timeout	60s
Max Docs per Request	6
Max Active Jobs	3
Daily Dollar Cap	$600 (24h TTL auto-reset)
Daily Token Cap	50M combined tokens (24h TTL auto-reset)
Chunk Size (Latin scripts)	6000 chars
Chunk Size (CJK + Russian)	4500 chars (ja, zh, ru)
Chunk Size (Indic + Arabic)	3000 chars (hi, ur, ar)
Retries per Chunk	3
Max Part Size	24 KB (safety valve — prevents model output truncation)
MaxConcurrency (per-language)	0 (unlimited — all language containers launch simultaneously)
ECR Repository	`tbi-translate-worker`
SQS Queue	`trinity-beast-translation-queue`
Step Function	`tbi-translation-orchestrator`
IAM Role (Worker + Lambdas)	`tbi-translate-role`
IAM Role (Pipe)	`tbi-translate-pipe-role` (DELETED — pipe removed 2026-06-02)
Auto-Scaling	Demand-driven: 1 (idle) → N (job submitted, N = language count) → 1 (queue empty)
ECS Service	`tbi-translate-worker-service` (persistent, always-on)
IAM Role (Step Function)	`tbi-translate-orchestrator-role`
Valkey Keys	`tx:job:{id}`, `tx:active`, `tx:history`, `tx:idempotency:{key}`, `autoops:bedrock:spend:daily`, `autoops:bedrock:tokens:input:daily`, `autoops:bedrock:tokens:output:daily`
Aurora Tables	`translation_jobs`, `translation_job_events`
Delta Metadata	`docs/delta/{doc}.{lang}.json` (S3)
Delta CLI	`bash scripts/kcc.sh delta-diff`, `bash scripts/kcc.sh delta-translate`, `bash scripts/kcc.sh delta-validate`, `bash scripts/kcc.sh chunk-size`
CloudWatch Namespace	`TBI/Translation`
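
The per-script chunk sizes in the table can be read as a simple lookup. This is a sketch with hypothetical names; treating every language not listed under CJK + Russian or Indic + Arabic as a Latin-script default is an assumption.

```python
# Character limits per chunk, taken from the Quick Reference table.
CHUNK_CHARS_BY_LANG = {
    "ja": 4500, "zh": 4500, "ru": 4500,   # CJK + Russian
    "hi": 3000, "ur": 3000, "ar": 3000,   # Indic + Arabic
}
LATIN_CHUNK_CHARS = 6000  # assumed default for all other languages


def chunk_size_for(lang: str) -> int:
    """Return the chunk size in characters for a target language code."""
    return CHUNK_CHARS_BY_LANG.get(lang, LATIN_CHUNK_CHARS)
```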