ruT5-base Model for Abstractive Summarization of Russian News

This is the ai-forever/ruT5-base model, fine-tuned for the task of abstractive summarization of news texts in Russian.

Model Description

The model is based on the T5 (Text-to-Text Transfer Transformer) architecture, an encoder-decoder transformer. The original pre-trained model ai-forever/ruT5-base was fine-tuned on a combined dataset consisting of Russian news articles from the Gazeta dataset and the Russian part of XLSum.

Details of the training process and results analysis can be found in the GitHub repository.

Fine-tuning Parameters (key):

  • Base model: ai-forever/ruT5-base
  • Dataset: Combined Gazeta + XLSum (Russian part), ~32k "article-summary" pairs after filtering.
  • Max input length: 512 tokens
  • Max output length (summary): 64 tokens
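The card does not publish the full training configuration, so the following is only a minimal sketch of how such a run could be set up with the Hugging Face Seq2Seq Trainer API. The batch size, learning rate, and dataset column names (`text`, `summary`) are illustrative assumptions; only the base model, the 512/64 token limits, and the 20-epoch horizon come from this card.

```python
from transformers import (
    AutoTokenizer, AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("ai-forever/ruT5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("ai-forever/ruT5-base")

def preprocess(batch):
    # Truncate articles to 512 tokens and summaries to 64, per the card
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

args = Seq2SeqTrainingArguments(
    output_dir="rut5-news-sum",
    num_train_epochs=20,             # best checkpoint reported at epoch 20
    per_device_train_batch_size=8,   # assumption, not from the card
    learning_rate=3e-4,              # assumption, not from the card
    predict_with_generate=True,
)

# train_dataset / eval_dataset: the tokenized Gazeta + XLSum splits (not shown)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
```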

How to Use

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Xristo/ruT5-base-rus-news-sum"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
model.eval()

article_text = """..."""

# Tokenize the article; inputs longer than 512 tokens are truncated,
# matching the limit used during fine-tuning
input_ids = tokenizer(
    [article_text],
    max_length=512,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

# Beam search with 3-gram repetition blocking; max_length matches
# the 64-token summary limit used during fine-tuning
output_ids = model.generate(
    input_ids=input_ids,
    max_length=64,
    no_repeat_ngram_size=3,
    num_beams=4,
    early_stopping=True,
)

summary = tokenizer.decode(
    output_ids[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)

print("Generated summary:")
print(summary)
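Note that because of `truncation=True`, any article longer than 512 tokens loses its tail before it reaches the model. A simple workaround (not part of this model's pipeline, just one common option) is to split the token-id sequence into overlapping windows, summarize each, and join the results. A minimal, model-agnostic chunking helper:

```python
def chunk_ids(ids, max_len=512, stride=384):
    """Split a token-id list into overlapping windows so no text is dropped.

    Consecutive windows overlap by max_len - stride tokens to preserve
    context across the cut points.
    """
    chunks = []
    start = 0
    while start < len(ids):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break
        start += stride
    return chunks

# A 1000-token article becomes three overlapping 512-token windows
print(len(chunk_ids(list(range(1000)))))  # 3
```

Each window can then be passed to `model.generate` as above; the per-window summaries are concatenated (or summarized once more) to cover the whole article.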

Evaluation Results (Metrics)

Evaluation was performed on a held-out test set (10% of the filtered Gazeta+XLSum dataset). The best checkpoint (20th epoch) showed the following results:

| Model     | ROUGE-1 F1 | ROUGE-2 F1 | ROUGE-L F1 | METEOR | BERTScore F1 | CHRF++ | BLEU  |
|-----------|------------|------------|------------|--------|--------------|--------|-------|
| ruT5-base | 30.73      | 15.22      | 27.94      | 29.42  | 78.36        | 40.06  | 10.91 |
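For intuition, ROUGE-1 F1 (the first column) measures unigram overlap between a generated summary and a reference. A minimal from-scratch sketch of the computation is shown below; the card's scores were presumably produced with a standard library such as rouge_score, which additionally applies stemming and other normalization. The example sentence pair is hypothetical, not taken from the test set.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Multiset intersection counts each shared unigram at most
    # as often as it appears in both texts
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference/candidate pair: 3 of 4 unigrams overlap
ref = "суд арестовал активы компании"
cand = "суд арестовал имущество компании"
print(round(rouge1_f1(ref, cand), 2))  # 0.75
```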

Comparison with baseline models: IlyaGusev/mbart_ru_sum_gazeta (max summary length 200 tokens) reports R1=32.4, R2=14.3, RL=28.0, METEOR=26.4, and csebuetnlp/mT5_multilingual_XLSum (max summary length 84 tokens) reports R1=32.2, R2=13.6, RL=26.2 on the Russian part of XLSum. Despite its shorter maximum summary length of 64 tokens, this fine-tuned ruT5-base model is competitive and surpasses both baselines in ROUGE-2 and METEOR, which suggests a high information density in the generated summaries.
