# ruT5-base Model for Abstractive Summarization of Russian News

This is the ai-forever/ruT5-base model, fine-tuned for abstractive summarization of news texts in Russian.
## Model Description

The model is based on the T5 (Text-to-Text Transfer Transformer) architecture, an encoder-decoder transformer. The original pre-trained model ai-forever/ruT5-base was fine-tuned on a combined dataset of Russian news articles from the Gazeta dataset and the Russian part of XLSum. Details of the training process and analysis of the results can be found in the GitHub repository.
**Fine-tuning parameters (key):**

- Base model: ai-forever/ruT5-base
- Dataset: combined Gazeta + XLSum (Russian part), ~32k article-summary pairs after filtering
- Max input length: 512 tokens
- Max output length (summary): 64 tokens
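The exact filtering criteria behind the ~32k pairs are not documented here. As a purely hypothetical illustration of this kind of preprocessing, a simple length-based filter over article-summary pairs might look like this (all thresholds below are assumptions, not the values used for this model):

```python
# Hypothetical length-based filtering of (article, summary) pairs.
# The thresholds are illustrative assumptions only, not the actual
# criteria used to build the Gazeta + XLSum training set.

def filter_pairs(pairs, min_article_words=50, min_summary_words=5, max_ratio=0.5):
    """Keep pairs whose article and summary are long enough and whose
    summary is substantially shorter than the article."""
    kept = []
    for article, summary in pairs:
        a_len = len(article.split())
        s_len = len(summary.split())
        if (a_len >= min_article_words
                and s_len >= min_summary_words
                and s_len <= max_ratio * a_len):
            kept.append((article, summary))
    return kept

pairs = [
    ("слово " * 100, "краткое содержание главной новости дня"),  # kept
    ("короткий текст", "слишком короткая статья"),               # dropped: article too short
]
print(len(filter_pairs(pairs)))  # -> 1
```

Filters like this drop pairs where the "summary" is nearly as long as the article, which would otherwise teach the model to copy rather than compress.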
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Xristo/ruT5-base-rus-news-sum"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

article_text = """..."""

# Tokenize the article, truncating to the 512-token input limit used during fine-tuning
input_ids = tokenizer(
    [article_text],
    max_length=512,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

# Generate a summary with beam search, capped at 64 tokens as in training
output_ids = model.generate(
    input_ids=input_ids,
    max_length=64,
    no_repeat_ngram_size=3,
    num_beams=4,
    early_stopping=True,
)

summary = tokenizer.decode(
    output_ids[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)
print("Generated summary:")
print(summary)
```
## Evaluation Results (Metrics)

Evaluation was performed on a held-out test set (10% of the filtered Gazeta + XLSum dataset). The best checkpoint (epoch 20) achieved the following results:

| Model | ROUGE-1 F1 | ROUGE-2 F1 | ROUGE-L F1 | METEOR | BERTScore F1 | CHRF++ | BLEU |
|---|---|---|---|---|---|---|---|
| ruT5-base | 30.73 | 15.22 | 27.94 | 29.42 | 78.36 | 40.06 | 10.91 |
**Comparison with baseline models:**

Compared to IlyaGusev/mbart_ru_sum_gazeta (max summary length 200 tokens; R1=32.4, R2=14.3, RL=28.0, METEOR=26.4) and csebuetnlp/mT5_multilingual_XLSum (max summary length 84 tokens; R1=32.2, R2=13.6, RL=26.2 on the Russian part of XLSum), this fine-tuned ruT5-base model (max summary length 64 tokens) achieves competitive results and surpasses both baselines in ROUGE-2 and METEOR, indicating a high information density in the generated summaries.
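For context, ROUGE-1 F1 (reported above) measures unigram overlap between a generated summary and a reference. The sketch below shows the core of that computation in plain Python; real evaluations use a library implementation with proper tokenization and optional stemming, so this is an illustration rather than the scoring code used for the table above:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between candidate
    and reference. Library implementations add tokenization rules and
    (optionally) stemming on top of this."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-word matches, clipped by min count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"), 4))  # -> 0.8333
```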