---
datasets:
- bakhitovd/data_science_arxiv
metrics:
- rouge
license: cc0-1.0
pipeline_tag: summarization
---

# Fine-tuned Longformer for Summarization of Machine Learning Articles

## Model Details

- GitHub: https://github.com/Bakhitovd/led-base-7168-ml
- Model name: bakhitovd/led-base-7168-ml
- Model type: Longformer (allenai/led-base-16384)
- Model description: This Longformer model has been fine-tuned on a focused subset of the arXiv portion of the scientific papers dataset, specifically targeting articles about machine learning. It aims to generate accurate and consistent summaries of machine learning research papers.

## Intended Use

This model is intended for text summarization tasks, specifically summarizing machine learning research papers.

## How to Use

```python
import torch
from transformers import LEDTokenizer, LEDForConditionalGeneration

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = LEDTokenizer.from_pretrained("bakhitovd/led-base-7168-ml")
model = LEDForConditionalGeneration.from_pretrained("bakhitovd/led-base-7168-ml")
```

## Use the Model for Summarization

```python
article = "... long document ..."

# Tokenize the article (LED accepts inputs of up to 16384 tokens).
inputs_dict = tokenizer(article, padding="max_length", max_length=16384, return_tensors="pt", truncation=True)

# Move the model and inputs to the same device; the original snippet assumed CUDA.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
input_ids = inputs_dict.input_ids.to(device)
attention_mask = inputs_dict.attention_mask.to(device)

# LED needs a global attention mask; global attention on the first token
# is the standard configuration for summarization.
global_attention_mask = torch.zeros_like(attention_mask)
global_attention_mask[:, 0] = 1

predicted_abstract_ids = model.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask, max_length=512)

# generate() returns a batch of sequences; decode the first one.
summary = tokenizer.decode(predicted_abstract_ids[0], skip_special_tokens=True)
print(summary)
```
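
Summary length and quality can be tuned further through standard `generate` arguments such as `num_beams`, `length_penalty`, and `no_repeat_ngram_size`; the call above keeps the defaults for brevity.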

## Training Data

Dataset name: bakhitovd/data_science_arxiv

This dataset is a subset of the 'Scientific papers' dataset containing the articles that are semantically and structurally closest to articles describing machine learning. The subset was obtained by K-means clustering of SciBERT embeddings of the articles.
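
As an illustration, the selection procedure could look roughly like the sketch below. This is a reconstruction rather than the actual dataset build script: the SciBERT checkpoint (allenai/scibert_scivocab_uncased), the mean pooling, and the cluster count are all assumptions.

```python
# Illustrative sketch only: the pooling strategy, n_clusters, and the choice
# of which cluster counts as "machine learning" are assumptions, not the
# dataset's actual build parameters.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

sci_tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
sci_encoder = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

def embed(texts):
    """Mean-pool SciBERT's last hidden state into one vector per document."""
    batch = sci_tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = sci_encoder(**batch).last_hidden_state  # (batch, seq, dim)
    mask = batch.attention_mask.unsqueeze(-1)            # (batch, seq, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

articles = ["... arXiv article text ..."] * 100  # placeholder; real article texts go here
labels = KMeans(n_clusters=10, n_init=10).fit_predict(embed(articles))
ml_subset = [a for a, lbl in zip(articles, labels) if lbl == 0]  # keep the ML cluster
```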

## Evaluation Results

The model was evaluated using ROUGE metrics and showed improved performance over the baseline models.
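
For reference, ROUGE scores of generated summaries can be computed with, for example, the Hugging Face `evaluate` library. The snippet below is a minimal sketch with placeholder texts, not the original evaluation script.

```python
# Minimal ROUGE computation (requires: pip install evaluate rouge_score).
# The prediction/reference texts are placeholders.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["generated summary ..."],
    references=["reference abstract ..."],
)
print(scores)  # F-measures for rouge1, rouge2, rougeL, rougeLsum
```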