LYRICAL Russian to English Machine Translation Model

Variant 0.2a2: ORPO-tuned DeepSeek-R1-0528 Rank64 Adapter PowerEMA Merge
This adapter variant was produced via the post-hoc Power Function EMA technique from the influential "Analyzing and Improving the Training Dynamics of Diffusion Models" paper.
The relative sigma (σ_rel) value we used for this EMA merge was 0.28, just about the maximum the profile permits. We picked it because only 8 checkpoints were saved during training (a fact we belatedly lament), and a wide profile lets the later saves get merged in at reasonable weights.
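For orientation, below is a minimal sketch of how such a merge can be approximated, assuming equally spaced checkpoints and a plain discrete weighted average under the power-function profile, rather than the paper's exact post-hoc least-squares reconstruction over stored EMA traces; the checkpoint paths and output filename are illustrative only, not files in this repository.

```python
import numpy as np
import torch
from safetensors.torch import load_file, save_file

def gamma_from_sigma_rel(sigma_rel: float) -> float:
    """Numerically invert sigma_rel^2 = (g+1) / ((g+2)^2 (g+3)) for the profile exponent g."""
    target = sigma_rel ** 2
    lo, hi = 1e-6, 1e6
    for _ in range(200):  # bisection; sigma_rel decreases monotonically with g
        mid = (lo + hi) / 2
        val = (mid + 1) / ((mid + 2) ** 2 * (mid + 3))
        if val > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 8 checkpoints, assumed to be saved at equally spaced training steps (hypothetical paths).
ckpt_paths = [f"checkpoint-{i}/adapter_model.safetensors" for i in range(1, 9)]
steps = np.arange(1, len(ckpt_paths) + 1, dtype=np.float64)

gamma = gamma_from_sigma_rel(0.28)   # sigma_rel = 0.28 is near the profile's upper limit (~0.2887)
weights = steps ** gamma
weights /= weights.sum()             # normalized power-function profile

# Discrete weighted average of the adapter tensors across checkpoints.
merged = None
for w, path in zip(weights, ckpt_paths):
    state = load_file(path)
    if merged is None:
        merged = {k: w * v.to(torch.float32) for k, v in state.items()}
    else:
        for k, v in state.items():
            merged[k] += w * v.to(torch.float32)

save_file({k: v.to(torch.bfloat16) for k, v in merged.items()},
          "adapter_model_powerema.safetensors")
```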

  • Developed by: SilverAgePoets.com
  • Model type: [Lyrical Machine Translation]
  • Languages (NLP): [Совпроясный, Worldish] aka [Russian, English]
  • Finetuned from: [unsloth/DeepSeek-R1-0528-Qwen3-8B-unsloth-bnb-4bit]

Experimental WIP prototype adapter for DeepSeek-R1-0528-Qwen3-8B.
This adapter belongs to our Lyrical MT (Machine Translation) series of fine-tuned LLMs and adapters.
Our ultimate aim with the Lyrical MT project is to iteratively foster a translation model capable of adaptively localizing idiomatic, formal/poetic/rhythmic, and performance-catered features of lyrical input texts, whilst retaining adequate accuracy at the level of direct semantic translation.

USES:

Intended scope of effective applicability limited to:
Russian to English translation of song lyrics, poems, scriptures, slogans, etc.
Translation from a Russian-language input text structured in accordance with literary, aesthetic, and/or vocalization-catering compositional devices into an English output text exhibiting cross-lingually rebalanced approximations of the source's formal features.

Depending on the relative performance, foundations, and idiosyncrasies of a given checkpoint/adapter variant in the Lyrical MT series, the above-suggested applicability scope may plausibly extend to:
Russian to English text-to-text translation in general.
English to Russian translation.

The Lyrical MT models were fine-tuned primarily on single-line (fragment), double-line (couplet), quadruple-line (quatrain), and full-length bilingual textual inputs.

Training Info

The training was conducted on a single L4 GPU (22.5 GB VRAM) via the TRL framework and the ORPO Trainer, leveraged through Unsloth over their dynamically quantized, 4-bit optimized variant of the DeepSeek-R1-0528 Qwen3-8B distilled model.
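As a rough illustration (not our exact training script), loading the 4-bit base model and attaching a rank-64 adapter with Unsloth might look like the sketch below; the LoRA target modules and dropout value are assumptions.

```python
from unsloth import FastLanguageModel

# Load Unsloth's dynamically quantized 4-bit variant of the distilled base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-0528-Qwen3-8B-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach a rank-64 LoRA adapter (alpha 64), matching the hyperparameters listed below.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0.0,  # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed target set
    use_gradient_checkpointing="unsloth",
)
```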

Training Data

Fine-tuned for Odds Ratio Preference Optimization (ORPO) on our ORPO-catered Russian-to-English song lyrics translation/localization dataset.
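The dataset itself is not reproduced here, but an ORPO-catered record pairs each prompt with a preferred ("chosen") and a dispreferred ("rejected") completion. The record below is a purely illustrative stand-in, not taken from our data.

```python
example_record = {
    "prompt": "Translate the following Russian lyric into English, preserving meter and rhyme:\n"
              "Я помню чудное мгновенье...",
    "chosen": "I recollect a wondrous instant...",   # preferred, form-aware rendering
    "rejected": "I remember a wonderful moment...",  # literal, form-flattening rendering
}
```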

Hyperparameters

  • Adapter Rank = 64
  • Adapter Alpha = 64
  • Learning Rate = 1e-4
  • Max Sequence Length = 2048
  • Optimizer = AdamW_8bit
  • Learning Rate Scheduler Type = Linear
  • Beta/Decay = 0.1
  • Warmup Steps = 5
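Expressed as a TRL configuration, these settings would look roughly like the sketch below; the batch size, gradient accumulation, dataset source, and output path are assumptions or placeholders.

```python
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer

dataset = load_dataset("json", data_files="orpo_pairs.jsonl", split="train")  # placeholder path

orpo_args = ORPOConfig(
    output_dir="lyrical_mt_orpo",    # placeholder
    beta=0.1,                        # ORPO odds-ratio weight ("Beta/Decay" above)
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    warmup_steps=5,
    optim="adamw_8bit",
    max_length=2048,
    per_device_train_batch_size=1,   # assumption; chosen to fit a 22.5 GB L4
    gradient_accumulation_steps=4,   # assumption
)

trainer = ORPOTrainer(
    model=model,                 # the rank-64 Unsloth PEFT model from the earlier sketch
    args=orpo_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # tokenizer loaded alongside the base model
)
trainer.train()
```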

Framework versions

  • PEFT 0.17.1
  • transformers 4.55.4
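With these versions, applying the adapter at inference time can be sketched as follows; the prompt wording is an assumption, and loading the 4-bit base weights requires bitsandbytes.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/DeepSeek-R1-0528-Qwen3-8B-unsloth-bnb-4bit"
adapter_id = "AlekseyCalvin/Lyrical_MT_rus2eng_2a2_DeepSeekR1Qwen3_8b_r64LoRA_PowerEMA_sigmaRel028"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{
    "role": "user",
    "content": "Translate this Russian quatrain into English, preserving its rhyme and rhythm:\n<your lines here>",
}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```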

Note:

We would appreciate feedback/reports from anyone else who happens to try out this model, or its other variants (to be released in the near future).

