Tren: Translation Engine
A lightweight machine-translation pipeline that works on rich-text documents using LLMs
Features
- Translate documents while keeping the rich-text formatting intact.
- High-quality translations that draw on previous context from the document.
- Bring your own LLM: compatible with OpenAI, Anthropic, Ollama, etc.; anything that exposes an OpenAI-compatible API works.
Prerequisites
- Pandoc: ≥ v3.0
- An LLM provider (an API key for a hosted service, or a local server)
Installation
Make sure you already have Cargo installed.
cargo install --git https://git.napatsc.com/ns/tren
Usage
Web interface
Work in progress; the web interface is not yet functional.
Command line
First, set up your environment variables, either by creating a `.env` file or by exporting the variables in your shell:
OPENAI_API_KEY="sk-xxxxxx"
# For a server other than OpenAI, change the value below.
OPENAI_API_BASE="https://api.openai.com/v1"
| Key | Required/Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | Required | Your OpenAI API key; usually starts with `sk-` |
| `OPENAI_API_BASE` | `https://api.openai.com/v1` | Your LLM server endpoint. For a custom LLM server other than OpenAI, change this value to the server URL. |
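As a sketch, here is how the variables might point at a local OpenAI-compatible server such as Ollama; the key and URL values are illustrative, so adjust them for your own setup:

```shell
# Illustrative: point tren at a local OpenAI-compatible server.
# Many local servers accept any non-empty API key.
export OPENAI_API_KEY="ollama"
export OPENAI_API_BASE="http://localhost:11434/v1"
```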
Then, call the program:
tren run --src English \
--tar Spanish \
-i some-document.md
Available CLI arguments:
| Flag | Required/Default | Description |
|---|---|---|
| `--src` | Yes | Source language (e.g., English). |
| `--tar` | Yes | Target language (e.g., Spanish). |
| `-i, --input` | Yes | Path to the file that contains the text to translate. |
| `--inter-sheet` | `<INPUT>-inter.csv` (generated if omitted) | Path to a CSV file where intermediate translation results are stored for inspection/editing. Defaults to the input filename with an `-inter` suffix and a `.csv` extension. |
| `-o, --output` | `<INPUT>-translated.<EXT>` (same extension as input) | Path for the final translated file. Defaults to the input filename with a `-translated` suffix. |
| `--model` | `openai/gpt-oss-20b` | Hugging Face repository name of the LLM to use. |
| `--system` | Built-in system prompt (see below) | System-level prompt that sets the LLM's role. |
| `--user` | Built-in user prompt (see below) | User-level prompt that supplies the actual translation request. |
| `-j, --parallel` | `1` | Maximum number of concurrent requests sent to the LLM. For values larger than 1, make sure your server supports batch inference; SGLang and vLLM do, while Ollama and llama.cpp do not. |
| `-h, --help` | - | Show command help. |
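The defaults for `--inter-sheet` and `-o` follow directly from the input filename. A minimal sketch of that derivation (the function name and details are illustrative, not tren's actual internals; the directory component is ignored for brevity):

```rust
use std::path::Path;

/// Derive the default intermediate-sheet and output paths from an input
/// path, mirroring the documented defaults `<INPUT>-inter.csv` and
/// `<INPUT>-translated.<EXT>`. Illustrative only, not tren's API.
fn default_paths(input: &str) -> (String, String) {
    let path = Path::new(input);
    let stem = path.file_stem().and_then(|s| s.to_str()).unwrap_or(input);
    let ext = path.extension().and_then(|s| s.to_str()).unwrap_or("");
    let inter = format!("{stem}-inter.csv");
    let output = if ext.is_empty() {
        format!("{stem}-translated")
    } else {
        format!("{stem}-translated.{ext}")
    };
    (inter, output)
}

fn main() {
    let (inter, output) = default_paths("some-document.md");
    // some-document-inter.csv / some-document-translated.md
    println!("{inter} / {output}");
}
```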
Custom prompts
System prompt template
Default system prompt:
You are an expert translator. Please translate {{ source_language }} into {{ target_language }}. The user will submit sentences or paragraphs with some context; please translate only the intended text into {{ target_language }}.
- If the symbols {{ special_tokens | join(", ") }} appear, keep each symbol intact and in the correct position in the result text.
- Do not give any alternative translations or include any previous context, notes, or discussion.
To compose a custom prompt, the following variables are available:
- `source_language`: Source-language value from the CLI.
- `target_language`: Target-language value from the CLI.
- `special_tokens`: List of special characters used to mark positions in the source text, so those positions are not lost in the target text.
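To make the substitution concrete, here is a sketch of how these variables flow into the template placeholders, using plain string replacement instead of a real template engine (tren's actual rendering is internal; this is only illustrative):

```rust
/// Illustrative only: substitute prompt variables with plain string
/// replacement rather than a template engine.
fn render_system_prompt(source: &str, target: &str, special_tokens: &[&str]) -> String {
    let template = "You are an expert translator. Please translate \
{{ source_language }} into {{ target_language }}. If there are symbols \
{{ special_tokens | join(\", \") }}, keep the symbol intact.";
    template
        .replace("{{ source_language }}", source)
        .replace("{{ target_language }}", target)
        .replace("{{ special_tokens | join(\", \") }}", &special_tokens.join(", "))
}

fn main() {
    let prompt = render_system_prompt("English", "Spanish", &["[[0]]", "[[1]]"]);
    assert!(prompt.contains("translate English into Spanish"));
    assert!(prompt.contains("[[0]], [[1]]"));
}
```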
User prompt template
Default user prompt:
{%- set previous_chunks = previous_chunks[-8:] -%}
{%- if previous_chunks -%}
Given the previous context:
{{ previous_chunks | join("\n\n") }}
Only translate the following text:
{% endif -%}
{{ source_text }}
To compose a custom prompt, the following variables are available:
- `previous_chunks`: A list of up to 32 text chunks preceding the source text. For example, `previous_chunks[-8:]` obtains the 8 chunks immediately before the text.
- `source_text`: The source text to be translated.
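The `previous_chunks[-8:]` slice keeps only the most recent context. Its behavior can be sketched in Rust with a hypothetical helper (not part of tren's API):

```rust
/// Keep at most the last `n` chunks, mirroring the template's
/// `previous_chunks[-8:]` slice. Illustrative helper, not tren's API.
fn last_n<'a>(chunks: &'a [String], n: usize) -> &'a [String] {
    let start = chunks.len().saturating_sub(n);
    &chunks[start..]
}

fn main() {
    let chunks: Vec<String> = (1..=32).map(|i| format!("chunk {i}")).collect();
    let context = last_n(&chunks, 8);
    assert_eq!(context.len(), 8);
    assert_eq!(context.first().map(String::as_str), Some("chunk 25"));
    // Fewer chunks than the window: everything is kept.
    assert_eq!(last_n(&chunks[..3], 8).len(), 3);
}
```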
Testing
cargo test
Feel free to add more unit tests for the translation logic or for error handling.
Contributing
Contributions are welcome! Please open an issue or pull request.
- Fork the repo.
- Create a feature branch (`git checkout -b feature/awesome`).
- Run `cargo test` to ensure existing tests pass.
- Commit and push.
- Open a pull request.
License
MIT
Contact
- Maintainer: Napat Srichan
- Repository: https://git.napatsc.com/ns/tren (a GitHub mirror is also available).