ns/tren

Customized machine translation pipeline for rich-text documents using LLMs

Svelte 61.4%
TypeScript 17.4%
Rust 16.3%
CSS 3.5%
JavaScript 1.2%
Other 0.2%

Napat Srichan 60d3a1fdb4 split into subcommands: run/web, refactor cli		2026-02-18 11:41:51 +07:00
src	split into subcommands: run/web, refactor cli	2026-02-18 11:41:51 +07:00
webui	add rough draft webui	2026-02-18 09:48:48 +07:00
.gitignore	add rough draft webui	2026-02-18 09:48:48 +07:00
Cargo.lock	fix env not load, template	2026-01-23 15:29:17 +07:00
Cargo.toml	add me	2026-01-23 15:29:38 +07:00
README.md	split into subcommands: run/web, refactor cli	2026-02-18 11:41:51 +07:00

README.md

Tren: Translation Engine

A lightweight machine-translation pipeline that works on rich-text documents using LLMs.

Features
Prerequisites
Installation
- Build manually
Usage
Testing
Contributing
License
Contact

Features

Translate documents while keeping all rich-text formatting intact.
High-quality translation that uses the preceding text as context.
Bring your own LLM: works with OpenAI, Anthropic, Ollama, and any other provider that exposes an OpenAI-compatible API.

Prerequisites

Pandoc: ≥ v3.0
An LLM provider with an OpenAI-compatible API

Installation

Make sure you already have Cargo installed.

cargo install --git https://git.napatsc.com/ns/tren

Usage

Web interface

Unfinished; implementation is in progress.

Command line

First, set up your environment variables, either by creating a .env file or by exporting the variables in your shell:

OPENAI_API_KEY="sk-xxxxxx"
# For a server other than OpenAI, change the value below.
OPENAI_API_BASE="https://api.openai.com/v1"

| Key | Required/Default | Description |
| --- | --- | --- |
| `OPENAI_API_KEY` | Required | Your OpenAI API key; usually starts with `sk-`. |
| `OPENAI_API_BASE` | `https://api.openai.com/v1` | Your LLM server endpoint. For a server other than OpenAI, set this to that server's URL. |
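
The rules in the table above (a required key, and an endpoint that falls back to the OpenAI URL) can be sketched in Rust as follows. This is only an illustration, not Tren's actual code: `ApiSettings`, `resolve_settings`, and `settings_from_env` are hypothetical names, and treating an empty value as unset is an assumption.

```rust
use std::env;

/// Default endpoint, taken from the table above.
const DEFAULT_API_BASE: &str = "https://api.openai.com/v1";

/// Hypothetical settings holder; not a type from Tren itself.
#[derive(Debug, PartialEq)]
struct ApiSettings {
    api_key: String,
    api_base: String,
}

/// Resolves the settings through a lookup function, so the rules can be
/// exercised without touching the real process environment.
fn resolve_settings<F>(lookup: F) -> Result<ApiSettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    // OPENAI_API_KEY is required. Assumption: an empty value counts as unset.
    let api_key = lookup("OPENAI_API_KEY")
        .filter(|v| !v.is_empty())
        .ok_or_else(|| "OPENAI_API_KEY is required but not set".to_string())?;

    // OPENAI_API_BASE is optional and falls back to the OpenAI endpoint.
    let api_base = lookup("OPENAI_API_BASE")
        .filter(|v| !v.is_empty())
        .unwrap_or_else(|| DEFAULT_API_BASE.to_string());

    Ok(ApiSettings { api_key, api_base })
}

/// Reads the same two variables from the real process environment.
fn settings_from_env() -> Result<ApiSettings, String> {
    resolve_settings(|name| env::var(name).ok())
}
```

Calling `settings_from_env()` after exporting only `OPENAI_API_KEY` would yield the default endpoint; exporting `OPENAI_API_BASE` as well overrides it.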

Then, call the program:

tren run --src English \
    --tar Spanish \
    -i some-document.md

Here are the available CLI arguments:

| Flag | Required/Default | Description |
| --- | --- | --- |
| `--src` | Yes | Source language (e.g., `English`). |
| `--tar` | Yes | Target language (e.g., `Spanish`). |
| `-i`, `--input` | Yes | Path to the file that contains the text to translate. |
| `--inter-sheet` | `<INPUT>-inter.csv` (generated if omitted) | Path to a CSV file where intermediate translation results are stored for inspection and editing. Defaults to the input filename with an `-inter` suffix and a `csv` extension. |
| `-o`, `--output` | `<INPUT>-translated.<EXT>` (same extension as input) | Path for the final translated file. Defaults to the input filename with a `-translated` suffix added. |
| `--model` | `openai/gpt-oss-20b` | Hugging Face repository name of the LLM to use. |
| `--system` | Built-in system prompt (see below) | System-level prompt that sets the LLM's role. |
| `--user` | Built-in user prompt (see below) | User-level prompt that supplies the actual translation request. |
| `-j`, `--parallel` | `1` | Maximum number of concurrent requests sent to the LLM. For values greater than 1, make sure your server supports batch inference: SGLang and vLLM are supported; Ollama and llama.cpp are not. |
| `-h`, `--help` | - | Show command help. |
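
To make the default values of `--inter-sheet` and `--output` concrete, here is a minimal Rust sketch of how such paths could be derived from the input path. It is not taken from Tren's source: the helper names are hypothetical, and it assumes that `<INPUT>` means the input filename without its extension.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: appends `suffix` to the file stem of `input` and
/// optionally sets a new extension, keeping the parent directory unchanged.
fn with_suffix(input: &Path, suffix: &str, ext: Option<&str>) -> PathBuf {
    let stem = input
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();
    let mut name = format!("{stem}{suffix}");
    if let Some(ext) = ext {
        name.push('.');
        name.push_str(ext);
    }
    input.with_file_name(name)
}

/// Assumed default for `--inter-sheet`: `<INPUT>-inter.csv`.
fn default_inter_sheet(input: &Path) -> PathBuf {
    with_suffix(input, "-inter", Some("csv"))
}

/// Assumed default for `--output`: `<INPUT>-translated.<EXT>`,
/// reusing the extension of the input file when it has one.
fn default_output(input: &Path) -> PathBuf {
    let ext = input.extension().map(|e| e.to_string_lossy().into_owned());
    with_suffix(input, "-translated", ext.as_deref())
}
```

Under these assumptions, an input of `some-document.md` gives `some-document-inter.csv` for the intermediate sheet and `some-document-translated.md` for the output.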

Custom prompts

System prompt template

Default system prompt:

You are an expert translator. Please translate {{ source_language }} into {{ target_language }}. The user will submit sentences or paragraphs with some contexts; please only translate the intended text into {{ target_language }}.

- If there are symbols {{ special_tokens | join(", ") }}, keep the symbol intact on the result text in the correct position.
- Do not give any alternative translation or including any previous context, notes or discussion.

A custom system prompt can use the following variables:

source_language: Source language value from the CLI.
target_language: Target language value from the CLI.
special_tokens: List of special tokens that mark positions in the source text, so that those positions are preserved in the translated text.

User prompt template

Default user prompt:

{%- set previous_chunks = previous_chunks[-8:] -%}
{%- if previous_chunks -%}
Given the previous context:

{{ previous_chunks | join("\n\n") }}

Only translate the following text:

{% endif -%}
{{ source_text }}

A custom user prompt can use the following variables:

previous_chunks: A list of up to 32 text chunks that precede the source text. For example, previous_chunks[-8:] selects the 8 chunks immediately before it.
source_text: The source text to be translated.
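
For readers who want to see what the default user prompt amounts to, the following Rust sketch mimics it by hand: it keeps at most the last 8 previous chunks and only adds the context preamble when there is any context. It is an approximation written for illustration, not Tren's rendering code; `last_n` and `build_user_prompt` are hypothetical names, and the exact whitespace produced by the real template may differ.

```rust
/// Returns at most the last `n` items of a slice, similar to the
/// `previous_chunks[-8:]` slice used in the default template.
fn last_n<T>(items: &[T], n: usize) -> &[T] {
    // saturating_sub avoids underflow when fewer than `n` items exist.
    &items[items.len().saturating_sub(n)..]
}

/// Hand-written approximation of the default user prompt.
fn build_user_prompt(previous_chunks: &[&str], source_text: &str) -> String {
    let context = last_n(previous_chunks, 8);
    if context.is_empty() {
        // No previous context: the prompt is just the text to translate.
        return source_text.to_string();
    }
    format!(
        "Given the previous context:\n\n{}\n\nOnly translate the following text:\n\n{}",
        context.join("\n\n"),
        source_text
    )
}
```

The first chunk of a document has no previous chunks, so its prompt consists of the source text alone; later chunks carry a sliding window of preceding text.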

Testing

cargo test

Feel free to add more unit tests for the translation logic or for error handling.

Contributing

Contributions are welcome! Please open an issue or pull request.

Fork the repo.
Create a feature branch (git checkout -b feature/awesome).
Run cargo test to ensure existing tests pass.
Commit and push.
Open a pull request.

License

MIT

Contact

Maintainer: Napat Srichan
Repository: https://git.napatsc.com/ns/tren, GitHub mirror.
