Digest of models based on YandexGPT 5 Lite

Community Article Published March 19, 2025

Yandex recently released its new model, YandexGPT 5 Lite Pretrain, openly on Hugging Face. It is intended for further fine-tuning and research, and the community quickly began adapting it to various tasks. Below is a digest of the developer fine-tunes based on it.

About the model

YandexGPT 5 Lite is an 8-billion-parameter model designed to work with extended context (up to 32k tokens) and optimized for Russian and English.

The pretraining was done in two stages:

  • Stage 1: General language modeling on a corpus of Russian and English texts totaling 15 trillion tokens (60% web pages, 15% code, 10% math, etc.), with context lengths of up to 8k tokens.
  • Stage 2: Continued training on 320 billion tokens of high-quality data (25% web pages, 19% math, 18% code, 18% educational data, etc.), with the context length increased to 32k tokens and additional synthetic data used.

Purpose:

The model is intended for text continuation and can be used as a foundation for further adaptation to specific tasks.

Instruction models

These models were further trained on prompt–answer pairs to improve dialogue quality and instruction following. A minimal usage sketch follows the model descriptions below.

1. Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it

  • Description: Trained with supervised fine-tuning (SFT) on synthetic datasets such as GrandMaster-PRO-MAX and Grounded-RAG-RU-v2.

  • Capabilities: Supports dialogue, generates accurate instruction-following responses, and works bilingually in Russian and English.

2. IlyaGusev/saiga_yandexgpt_8b

  • Description: Based on YandexGPT 5 Lite Pretrain with additional fine-tuning for specialized tasks. Optimized with attention to its tokenizer specifics, improving generation quality in real-world scenarios.

  • Application: Well-suited for developing applications that require model adaptation for specific use cases and improved input processing.

  • Result: Saiga YandexGPT 8B scores highly on the Russian Leaderboard v2, especially in fluency (4.98 out of 5) and context retention (4.71 out of 5). In additional tests, its various versions scored between 37.5 and 43.1 points, showing moderate variability in generation stability.

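Both instruct fine-tunes can be driven through the standard transformers chat interface. The sketch below is a minimal example assuming the Vikhr repo ships a chat template; check the model card for the exact prompt format and recommended generation settings.

```python
# Minimal sketch: chatting with an instruct fine-tune via transformers.
# Assumes the repo exposes a standard chat template; adjust per the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of VRAM for an 8B model
    device_map="auto",
)

# Russian prompt: "Briefly explain what LoRA is."
messages = [{"role": "user", "content": "Кратко объясни, что такое LoRA."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```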

Quantized Models

Quantization reduces memory requirements and speeds up inference, which is crucial for resource-constrained devices and edge deployments. Several quantized variants based on YandexGPT 5 Lite Pretrain are already available on Hugging Face.

Depending on a project's needs, developers can choose the optimal balance between performance and quality.
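As a rough illustration of the memory trade-off, the sketch below quantizes the base model to 4-bit on the fly with bitsandbytes. The community uploads are typically pre-quantized files, so treat the repo id and settings here as assumptions and follow each model card instead.

```python
# Minimal sketch: loading the base model in 4-bit with bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "yandex/YandexGPT-5-Lite-8B-pretrain"  # assumed repo id; verify on the Hub

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
# In bf16 an 8B model needs roughly 16 GB of VRAM; in 4-bit it fits in about 5-6 GB.
```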

LoRA Adapter

LoRA (Low-Rank Adaptation) enables targeted fine-tuning of the base model to enhance specific functions (e.g., logical reasoning) without retraining the entire network. This reduces computational costs while preserving generation quality.

evilfreelancer/r1_yandexgpt5-lite_lora

  • Description: This LoRA adapter is fine-tuned on datasets aimed at improving logical reasoning (the r1 approach). Thanks to additional tuning, the model can mimic step-by-step logical reasoning, akin to specialized models (such as r1 by DeepSeek or o1 by OpenAI).
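Loading a LoRA adapter on top of the base checkpoint takes a few lines with peft. The repo ids below are taken from this digest, but the exact base checkpoint the adapter expects is an assumption; verify it on the adapter's model card.

```python
# Minimal sketch: applying a LoRA adapter to the base model with peft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "yandex/YandexGPT-5-Lite-8B-pretrain"        # assumed base repo id
adapter_id = "evilfreelancer/r1_yandexgpt5-lite_lora"  # adapter from this digest

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Optional: fold the low-rank weights into the base model for faster inference.
model = model.merge_and_unload()
```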

Conclusion

The base YandexGPT 5 Lite Pretrain model is a versatile foundation for further adaptation. By fine-tuning it, researchers and developers can build powerful solutions tailored to various natural language processing tasks:

  • Instruction models (e.g., Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it and IlyaGusev/saiga_yandexgpt_8b) excel at dialogue and generate coherent text in both Russian and English.
  • Quantized models optimize resource usage by compressing the original weights, reducing storage requirements and inference time with only a modest impact on quality.
  • LoRA adapters selectively improve specific aspects, such as logical reasoning or domain-specific knowledge. They minimize computational costs while delivering targeted performance gains.

Each fine-tuned version is tailored to its specific use case and may include extra tuning steps, such as specialized SFT data or tokenizer improvements, resulting in higher-quality generation in the chosen domain.
