ruT5-base Model for Abstractive Summarization of Russian News

This is the ai-forever/ruT5-base model, fine-tuned for the task of abstractive summarization of news texts in Russian.

Model Description

The model is based on the T5 (Text-to-Text Transfer Transformer) architecture, an encoder-decoder transformer. The original pre-trained model ai-forever/ruT5-base was fine-tuned on a combined dataset consisting of Russian news articles from the Gazeta dataset and the Russian part of XLSum.

Details of the training process and results analysis can be found in the GitHub repository.

Fine-tuning Parameters (key):

  • Base model: ai-forever/ruT5-base
  • Dataset: Combined Gazeta + XLSum (Russian part), ~32k "article-summary" pairs after filtering.
  • Max input length: 512 tokens
  • Max output length (summary): 64 tokens
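The card does not publish the full training configuration, so the following is only a minimal sketch of how such a run could be set up with the Hugging Face Seq2Seq Trainer API. The batch size, learning rate, and dataset column names (`text`, `summary`) are illustrative assumptions; only the base model, the 512/64 token limits, and the 20-epoch horizon come from this card.

```python
from transformers import (
    AutoTokenizer, AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("ai-forever/ruT5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("ai-forever/ruT5-base")

def preprocess(batch):
    # Truncate articles to 512 tokens and summaries to 64, per the card
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

args = Seq2SeqTrainingArguments(
    output_dir="rut5-news-sum",
    num_train_epochs=20,             # best checkpoint reported at epoch 20
    per_device_train_batch_size=8,   # assumption, not from the card
    learning_rate=3e-4,              # assumption, not from the card
    predict_with_generate=True,
)

# train_dataset / eval_dataset: the tokenized Gazeta + XLSum splits (not shown)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
```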

How to Use

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Xristo/ruT5-base-rus-news-sum"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
model.eval()

article_text = """..."""

# Tokenize the article; inputs longer than 512 tokens are truncated,
# matching the limit used during fine-tuning
input_ids = tokenizer(
    [article_text],
    max_length=512,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

# Beam search with 3-gram repetition blocking; max_length matches
# the 64-token summary limit used during fine-tuning
output_ids = model.generate(
    input_ids=input_ids,
    max_length=64,
    no_repeat_ngram_size=3,
    num_beams=4,
    early_stopping=True,
)

summary = tokenizer.decode(
    output_ids[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)

print("Generated summary:")
print(summary)
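Note that because of `truncation=True`, any article longer than 512 tokens loses its tail before it reaches the model. A simple workaround (not part of this model's pipeline, just one common option) is to split the token-id sequence into overlapping windows, summarize each, and join the results. A minimal, model-agnostic chunking helper:

```python
def chunk_ids(ids, max_len=512, stride=384):
    """Split a token-id list into overlapping windows so no text is dropped.

    Consecutive windows overlap by max_len - stride tokens to preserve
    context across the cut points.
    """
    chunks = []
    start = 0
    while start < len(ids):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break
        start += stride
    return chunks

# A 1000-token article becomes three overlapping 512-token windows
print(len(chunk_ids(list(range(1000)))))  # 3
```

Each window can then be passed to `model.generate` as above; the per-window summaries are concatenated (or summarized once more) to cover the whole article.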

Evaluation Results (Metrics)

Evaluation was performed on a held-out test set (10% of the filtered Gazeta+XLSum dataset). The best checkpoint (20th epoch) showed the following results:

| Model     | ROUGE-1 F1 | ROUGE-2 F1 | ROUGE-L F1 | METEOR | BERTScore F1 | CHRF++ | BLEU  |
|-----------|------------|------------|------------|--------|--------------|--------|-------|
| ruT5-base | 30.73      | 15.22      | 27.94      | 29.42  | 78.36        | 40.06  | 10.91 |
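For intuition, ROUGE-1 F1 (the first column) measures unigram overlap between a generated summary and a reference. A minimal from-scratch sketch of the computation is shown below; the card's scores were presumably produced with a standard library such as rouge_score, which additionally applies stemming and other normalization. The example sentence pair is hypothetical, not taken from the test set.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Multiset intersection counts each shared unigram at most
    # as often as it appears in both texts
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference/candidate pair: 3 of 4 unigrams overlap
ref = "суд арестовал активы компании"
cand = "суд арестовал имущество компании"
print(round(rouge1_f1(ref, cand), 2))  # 0.75
```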

Comparison with baseline models: IlyaGusev/mbart_ru_sum_gazeta (max summary length 200 tokens) reports R1=32.4, R2=14.3, RL=28.0, METEOR=26.4, and csebuetnlp/mT5_multilingual_XLSum (max summary length 84 tokens) reports R1=32.2, R2=13.6, RL=26.2 on the Russian part of XLSum. Despite its shorter maximum summary length of 64 tokens, this fine-tuned ruT5-base model is competitive and surpasses both baselines in ROUGE-2 and METEOR, which suggests a high information density in the generated summaries.
