Digest of models based on YandexGPT 5 Lite
Recently, Yandex released its new model, YandexGPT 5 Lite Pretrain, in open access on Hugging Face. It is intended for further fine-tuning and research, and the community quickly began adapting it to various tasks. Below is a digest of developer fine-tunes based on it.
About the model
YandexGPT 5 Lite is an 8-billion-parameter model designed to work with extended context (up to 32k tokens) and optimized for Russian and English.
The pretraining was done in two stages:
- Stage 1: A general language model trained on a corpus of Russian and English texts totaling 15 trillion tokens (60% web pages, 15% code, 10% math, etc.), with context lengths of up to 8k tokens.
- Stage 2: Fine-tuned on high-quality data (320 billion tokens total: 25% web pages, 19% math, 18% code, 18% educational data, etc.), with the context length increased to 32k tokens and additional synthetic data used.
Purpose: The model is intended for text continuation and can be used as a foundation for further adaptation to specific tasks.
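For orientation, below is a minimal sketch of loading the pretrain checkpoint for plain text continuation with the Hugging Face transformers library. The repo id yandex/YandexGPT-5-Lite-8B-pretrain is an assumption inferred from the names of the derived models in this digest, so check the actual model card before using it.

```python
# Minimal text-continuation sketch with transformers.
# Assumption: the base checkpoint lives at "yandex/YandexGPT-5-Lite-8B-pretrain";
# verify the exact repo id on Hugging Face before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yandex/YandexGPT-5-Lite-8B-pretrain"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # roughly 16 GB of weights for an 8B model in bf16
    device_map="auto",
)

# The pretrain model is a plain language model: it continues text rather than
# following instructions, so the prompt is simply the beginning of a passage.
prompt = "Quantization of large language models allows"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```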
Instruction models
These models were additionally trained on “prompt–answer” pairs to improve dialogue quality and instruction following.
1. Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it
Description: Fine-tuned with Supervised Fine-Tuning (SFT) on synthetic datasets such as GrandMaster-PRO-MAX and Grounded-RAG-RU-v2.
Capabilities: It supports dialogue, generates precise instruction-following responses, and handles bilingual communication (Russian/English).
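A rough sketch of how this instruct model can be queried, assuming it ships a chat template in its tokenizer config (typical for "-it" releases; verify on the model card):

```python
# Dialogue sketch for the Vikhr instruct fine-tune.
# Assumption: the tokenizer provides a chat template; check the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain in two sentences how SFT differs from pretraining."},
]
# apply_chat_template formats the dialogue the way the model was fine-tuned to expect
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated part, skipping the prompt tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```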
2. IlyaGusev/saiga_yandexgpt_8b
Description: Based on YandexGPT 5 Lite Pretrain with additional fine-tuning for specialized tasks. Optimized with attention to its tokenizer specifics, which improves generation quality in real-world scenarios.
Application: Well suited for developing applications that require model adaptation for specific use cases and improved input processing.
Result: Saiga YandexGPT 8B achieves high scores on the Russian Leaderboard v2, especially in fluency (4.98 out of 5) and context retention (4.71 out of 5). In additional tests, its various versions scored between 37.5 and 43.1 points, demonstrating moderate variability in generation stability.
Quantized Models
Quantization reduces memory requirements and speeds up inference, which is crucial for devices with limited resources and edge deployments. Several variants based on YandexGPT 5 Lite Pretrain are available, for example:
- yaroslav0530/YandexGPT-5-Lite-8B-pretrain-GGUF: a GGUF quantization that balances generation quality and efficiency.
- blues-alex/YandexGPT-5-Lite-8B-pretrain-Q4_K_M-GGUF: an alternative 4-bit quantized build, similar in characteristics to the shoplikov model below.
- Ronny/YandexGPT-5-Lite-8B-pretrain-Q8_0-GGUF: uses 8-bit quantization (Q8_0) for high-quality generation with faster inference.
- NikolayKozloff/YandexGPT-5-Lite-8B-pretrain-Q8_0-GGUF: a similar Q8_0 variant, aimed at stable performance under limited resources.
- holooo/YandexGPT-5-Lite-8B-pretrain-Q5_K_M-GGUF: a 5-bit quantized model (Q5_K_M), optimized for dialogue tasks and instruction-style answers.
- Nick0lay13/YandexGPT-5-Lite-8B-pretrain-Q8_0-GGUF: another Q8_0 variant focused on speed and stability.
- shoplikov/YandexGPT-5-Lite-8B-pretrain-Q4_K_M-GGUF: a 4-bit quantized model (Q4_K_M) that significantly reduces memory usage and speeds up inference without noticeable quality loss.
- mlx-community/YandexGPT-5-Lite-8B-pretrain-Q8-mlx: an 8-bit quantized build for running on Apple devices via the MLX runtime.
Depending on their project's needs, developers can choose the optimal balance between performance and quality.
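As an illustration, one of the GGUF builds can be run locally with the llama-cpp-python bindings. The call below assumes the repository contains a single .gguf file that the wildcard pattern matches; substitute the exact filename if the repo layout differs.

```python
# Local inference over a GGUF quantization via llama-cpp-python
# (pip install llama-cpp-python huggingface_hub).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="shoplikov/YandexGPT-5-Lite-8B-pretrain-Q4_K_M-GGUF",
    filename="*.gguf",   # assumes one GGUF file in the repo; otherwise name it explicitly
    n_ctx=8192,          # context window to allocate
    n_gpu_layers=-1,     # offload all layers to the GPU if one is available
)

# The underlying checkpoint is the pretrain model, so we ask for a continuation,
# not an instruction-style answer.
out = llm("Large language models are used for", max_tokens=64, temperature=0.7)
print(out["choices"][0]["text"])
```

The same pattern works for the Q5 and Q8 repositories; only the repo id and the memory footprint change.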
LoRA Adapter
LoRA (Low-Rank Adaptation) enables targeted fine-tuning of the base model to enhance specific functions (e.g., logical reasoning) without retraining the entire network. This reduces computational costs while preserving generation quality.
evilfreelancer/r1_yandexgpt5-lite_lora
- Description: This LoRA adapter is fine-tuned on datasets aimed at improving logical reasoning (the r1 approach). Thanks to the additional tuning, the model can mimic step-by-step logical reasoning, akin to specialized reasoning models such as R1 by DeepSeek or o1 by OpenAI.
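A minimal sketch of attaching this adapter with the peft library follows. The base checkpoint id is the same assumption as in the earlier example, and whether the adapter expects a plain prompt or a chat-style format should be confirmed on the adapter card.

```python
# Attach the r1-style LoRA adapter to the (assumed) base checkpoint.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "yandex/YandexGPT-5-Lite-8B-pretrain"        # assumed base repo id
adapter_id = "evilfreelancer/r1_yandexgpt5-lite_lora"  # adapter from this digest

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Only the low-rank adapter weights are downloaded and injected; the 8B base
# weights stay frozen, which is what keeps LoRA training and storage cheap.
model = PeftModel.from_pretrained(base, adapter_id)

prompt = (
    "Masha has 3 apples and Petya has twice as many. "
    "How many apples do they have in total? Reason step by step."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```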
Conclusion
The base YandexGPT 5 Lite Pretrain model is a versatile foundation for further adaptation. By fine-tuning it, researchers and developers can create powerful solutions tailored to various natural language processing tasks:
- Instruction models (e.g., Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it and IlyaGusev/saiga_yandexgpt_8b) excel at dialogue and generate coherent text in both Russian and English.
- Quantized models optimize resource usage by compressing the original weights, reducing storage requirements and inference time with little loss in quality.
- LoRA adapters selectively improve specific aspects, such as logical reasoning or domain-specific knowledge. They minimize computational costs while delivering targeted performance gains.
Each fine-tuned version is tailored to its specific use case and may include extra tuning steps, such as specialized SFT data or tokenizer improvements, resulting in higher-quality generation in the chosen domain.