---
license: mit
language:
- ne
metrics:
- rouge
tags:
- Nepali summary
- Nepali bart
- Nepali
- summary
- text
- nepali text summary
pipeline_tag: text2text-generation
widget:
- text: "अत्यधिक माग भएका बेला दसैंमा चिनीको हाहाकार भएको थियो । उपत्यकाबाहिरका केही जिल्लामा चिनी पाइए पनि काठमाडौंमा भने अभाव नै कायम रहेको छ । प्रधानमन्त्री पुष्पकमल दाहालले बिहीबार बिहान उद्योग तथा वाणिज्य मन्त्री तथा मुख्यसचिवलाई चिनीको अभाव सिर्जना हुन नदिन सबै उपायको खोजी गर्न निर्देशन दिएका थिए । नेपाली चिनी उद्योगहरूले आम उपभोक्तालाई सहज हुने किसिमले बजारमा चिनी नपठाइ ठूला उद्योगलाई आपूर्ति गर्न गोदाममै राख्ने गरेको पनि भेटिएको छ । वाणिज्य विभागको तथ्यांक अनुसार, नेपालमा उत्पादन हुने चिनीको सत्तरी प्रतिशत चिनी बिभिन्न पेय पदार्थ, मिठाइ, चकलेट, विस्कुटलगायतका उद्योगहरुमा आपूर्ति हुने गर्दछ । नेपाल प्रहरीले नेपालमा रहेका सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने तथा सो आधारमा बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गरिने विभागले जनाएको छ ।"
example_title: "Example 1"
---
# Nep_Summ_BART
<!-- Provide a quick summary of what the model is/does. -->
This model was pre-trained with the BART objective on a Nepali corpus and then fine-tuned on Nepali summarization data.
<br>It generates a summary for the input text.
The model has 101M parameters.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
The model is pre-trained using BART noising techniques: sentence permutation, token deletion, and random token masking.
<br>The corrupted text is fed into the encoder of the transformer, and the decoder is trained to reconstruct the original text (the denoising objective).
Cross-entropy loss is used for both pre-training and fine-tuning.
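For illustration only, here is a minimal sketch of the three noising operations; the deletion/masking probabilities and the `<mask>` token are assumptions, not the exact pre-training configuration.
```python
import random

def permute_sentences(sentences):
    """Sentence permutation: shuffle sentence order within a document."""
    shuffled = sentences[:]
    random.shuffle(shuffled)
    return shuffled

def delete_tokens(tokens, p=0.1):
    """Token deletion: drop each token with probability p."""
    return [t for t in tokens if random.random() > p]

def mask_tokens(tokens, p=0.15, mask="<mask>"):
    """Random token masking: replace each token with <mask> with probability p."""
    return [mask if random.random() < p else t for t in tokens]

# Corrupt a document for the denoising objective: the encoder receives the
# noisy tokens, and the decoder is trained (with cross-entropy loss) to
# reconstruct the original text.
document = ["यो पहिलो वाक्य हो ।", "यो दोस्रो वाक्य हो ।"]
noisy_tokens = mask_tokens(delete_tokens(" ".join(permute_sentences(document)).split()))
```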
The pre-training loss is as follows:
| Epoch | Training Loss | Val Loss |
|----------|:-------------:|------:|
| 1 | 0.8137 | 0.8010 |
| 2 | 0.7861 | 0.7524 |
| 3 | 0.7495 | 0.7290 |
The ROUGE scores after fine-tuning, on the BBC XLSum Nepali test set, are:
- ROUGE-1: 0.177
- ROUGE-2: 0.059
- ROUGE-L: 0.154
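A minimal sketch of how such scores could be reproduced with the `datasets` and `evaluate` libraries is shown below; the dataset identifier, field names, and generation settings are assumptions, and the ROUGE tokenization may not match the one used for the reported numbers.
```python
# pip install transformers datasets evaluate rouge_score
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import evaluate

tokenizer = AutoTokenizer.from_pretrained("pascalrai/nep_summ_BART")
model = AutoModelForSeq2SeqLM.from_pretrained("pascalrai/nep_summ_BART")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# BBC XLSum Nepali test split (assumed Hub dataset "csebuetnlp/xlsum", config "nepali")
test = load_dataset("csebuetnlp/xlsum", "nepali", split="test")
rouge = evaluate.load("rouge")

predictions = []
for article in test["text"]:
    inputs = tokenizer(article, max_length=1000, truncation=True, return_tensors="pt").to(device)
    ids = model.generate(inputs["input_ids"])
    predictions.append(tokenizer.decode(ids[0], skip_special_tokens=True))

# Scores may differ slightly from those reported above depending on tokenization.
print(rouge.compute(predictions=predictions, references=test["summary"]))
```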
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
You can use this model for Nepali text summarization.
<br>The checkpoint can also be used for sequence classification with `BartForSequenceClassification` (a minimal sketch follows).
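Below is a minimal sketch of loading the checkpoint for classification; the number of labels is a placeholder, and the classification head is randomly initialized, so it must be fine-tuned on labelled Nepali data before use.
```python
from transformers import AutoTokenizer, BartForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("pascalrai/nep_summ_BART")
# num_labels=2 is a placeholder; the classification head is newly initialized
# and needs fine-tuning before the logits are meaningful.
model = BartForSequenceClassification.from_pretrained("pascalrai/nep_summ_BART", num_labels=2)

inputs = tokenizer("यो एउटा उदाहरण वाक्य हो ।", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (1, num_labels)
```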
## How to Get Started with the Model
Use the code below to get started with the model.
```python
# Make sure to install the dependencies below (or from requirements.txt):
# pip install transformers==4.35
# pip install huggingface_hub==0.23.0
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model directly from the Hub
tokenizer = AutoTokenizer.from_pretrained("pascalrai/nep_summ_BART")
model = AutoModelForSeq2SeqLM.from_pretrained("pascalrai/nep_summ_BART")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

sentence = """अत्यधिक माग भएका बेला दसैंमा चिनीको हाहाकार भएको थियो । उपत्यकाबाहिरका केही जिल्लामा चिनी पाइए पनि काठमाडौंमा भने अभाव नै कायम रहेको छ । प्रधानमन्त्री पुष्पकमल दाहालले बिहीबार बिहान उद्योग तथा वाणिज्य मन्त्री तथा मुख्यसचिवलाई चिनीको अभाव सिर्जना हुन नदिन सबै उपायको खोजी गर्न निर्देशन दिएका थिए ।
नेपाली चिनी उद्योगहरूले आम उपभोक्तालाई सहज हुने किसिमले बजारमा चिनी नपठाइ ठूला उद्योगलाई आपूर्ति गर्न गोदाममै राख्ने गरेको पनि भेटिएको छ । वाणिज्य विभागको तथ्यांक अनुसार, नेपालमा उत्पादन हुने चिनीको सत्तरी प्रतिशत चिनी बिभिन्न पेय पदार्थ, मिठाइ, चकलेट, विस्कुटलगायतका उद्योगहरुमा आपूर्ति हुने गर्दछ ।
नेपाल प्रहरीले नेपालमा रहेका सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने तथा सो आधारमा बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गरिने विभागले जनाएको छ"""

# Tokenize (truncating long articles) and generate a summary
inputs = tokenizer(sentence, max_length=1000, truncation=True, return_tensors="pt").to(device)
summary_ids = model.generate(inputs["input_ids"])
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False))
# 'दशैंको मुखमा चिनीको चरम अभाव भएको भन्दै नेपाल प्रहरीले सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने र बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गर्ने जनाएको छ।'
```
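Continuing the example above, generation can also be tuned explicitly; the values below are illustrative, not recommended settings.
```python
# Continuing from the example above: explicit generation settings
summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,             # beam search width; illustrative value
    max_new_tokens=128,      # cap on summary length; illustrative value
    no_repeat_ngram_size=3,  # reduce repeated phrases
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```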
#### Hardware
The model was pre-trained on a single A10G GPU on an AWS instance for about 133 hours (roughly 45 hours per epoch), using bf16 (bfloat16) mixed precision.
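For reference, bf16 training with the Hugging Face `Trainer` is enabled through a single flag; the remaining values below are placeholders, not the exact pre-training configuration.
```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="nep_summ_bart_pretraining",  # placeholder path
    bf16=True,                        # bfloat16 mixed precision (supported on A10G)
    num_train_epochs=3,
    per_device_train_batch_size=8,    # placeholder value
    learning_rate=5e-5,               # placeholder value
)
```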
#### Possible Future Directions:
1. Use a decoder-only model for pre-training and summarization.
<br>When the deleted or masked spans are short, the model appears to learn to copy tokens from the encoder context through cross-attention during decoding,
<br>which hurts performance on the abstractive summarization task.
<br>A decoder-only model does not have this shortcut, since the tokens it must predict are never visible to it.
2. We pre-trained our model on approximately 16 GB of data. Testing classification on the <a href='https://www.kaggle.com/datasets/ashokpant/nepali-news-dataset-large/data'>Nepali News Dataset (Large)</a> against a few other Nepali transformer-based models available on Hugging Face,
<br>our model performs better than the others, with a validation accuracy of 0.58, but
<br>there could be two reasons for this relatively low score:
- There is still room to improve the quality of the data (check against human-level performance, HLP).
<br>If HLP is much higher than 0.58, consider the point below:
- We still do not have enough data for generalization, since transformer models only perform well with large amounts of pre-training data compared with classical sequential models.
#### Authors:
<a href="https://www.linkedin.com/in/bijaya-bhatta-69536018a/">Vijaya Bhatta</a>
<br><a href="https://www.linkedin.com/in/pascal-rai/">Pascal Rai</a>
<br><a href="https://www.linkedin.com/in/niranjan-shrestha-gem/">Niranjan Shrestha</a>
<br><a href="https://www.linkedin.com/in/dristi-sigdel-3120131b1/">Dristi Sigdel</a>
<br><a href="https://www.linkedin.com/in/sujan-neupane-596964211/">Sujan Neupane</a>
<br><a href="https://www.linkedin.com/in/sagar-kafle-a1b84b185/">Sagar Kafle</a>