Llama2-Fin-Summarizer
Model Description
This is a fine-tuned version of the LLaMA2 7B model, quantized to 4-bit precision, specifically trained for financial text summarization. The model was fine-tuned on a custom dataset of 200+ large financial documents, allowing it to generate concise and accurate summaries of financial reports, articles, and other related documents.
Model Details:
- Base Model: LLaMA2 7B
- Fine-tuning Dataset: Custom dataset with 200+ large financial documents
- Quantization: 4-bit (low memory usage)
- Task: Financial text summarization
- Trainable Parameters: The model was fine-tuned with parameter-efficient techniques (LoRA adapters via PEFT), so only a small subset of parameters was updated during training; a representative adapter configuration is sketched below.
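The exact adapter hyperparameters are not published in this card; the following is a minimal sketch of the kind of LoRA configuration (the lora_config referenced in the training snippet further down) typically used in such a setup. All values shown are illustrative assumptions, not the ones used for this model.

from peft import LoraConfig

# Illustrative LoRA adapter configuration (assumed values, not the exact ones
# used for this model); only the adapter weights are trainable.
lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)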
How to Use the Model
Installation
To use this model, you need to install the required Python libraries:
pip install accelerate peft bitsandbytes git+https://github.com/huggingface/transformers py7zr
Input/Output Format
- Input: The model accepts text input only.
- Output: The model generates summarized text output only.
Import with Hugging Face Transformers and PEFT
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch
peft_model_dir = "Karthikeyan-M3011/llama2-fin-summarizer"
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_dir)
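On recent transformers releases, passing load_in_4bit directly to from_pretrained is deprecated in favor of an explicit quantization config. A minimal alternative sketch, assuming the keyword arguments are forwarded to the underlying base-model loader as in current peft releases:

from transformers import BitsAndBytesConfig

# Explicit 4-bit quantization config (values shown are common defaults,
# not necessarily those used when this model was trained).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_dir,
    low_cpu_mem_usage=True,
    quantization_config=bnb_config,
)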
Inference with Llama2-Fin-Summarizer
query = 'Your text to summarize'
dialogue = query[:10000]  # truncate the input to at most 10,000 characters
prompt = f"""
Summarize the following conversation.
### Input:
{dialogue}
### Summary:
"""
input_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids.cuda()
outputs = trained_model.generate(input_ids=input_ids, max_new_tokens=200)
output = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]
dash_line = '-' * 100
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'TRAINED MODEL GENERATED TEXT:\n{output}')
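The steps above can be wrapped into a small helper for repeated use. This is only a convenience sketch built on the calls already shown; it assumes trained_model and tokenizer are loaded as in the previous section.

def summarize(text, max_new_tokens=200):
    # Build the same prompt template used above, truncating to 10,000 characters.
    prompt = f"""
Summarize the following conversation.
### Input:
{text[:10000]}
### Summary:
"""
    input_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids.cuda()
    outputs = trained_model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    decoded = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
    return decoded[len(prompt):]

print(summarize('Your text to summarize'))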
Limitations
- Dataset Bias: The model was fine-tuned on a relatively small dataset (200+ financial documents), so its summaries may reflect the coverage, terminology, and style of that corpus rather than financial text in general.
- Quantization Effects: The 4-bit quantization reduces memory usage but may introduce slight inaccuracies compared to models using higher precision.
- Context Limitations: Long inputs must be truncated (the inference example keeps at most 10,000 characters, and fine-tuning used a maximum sequence length of 1,024 tokens), which may limit the model's ability to summarize very long documents in a single pass; see the chunking sketch below.
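A common workaround for this limitation is map-reduce style chunking: split the document, summarize each chunk, then summarize the concatenated partial summaries. A minimal sketch, assuming the summarize helper defined earlier and an illustrative chunk size:

def summarize_long(document, chunk_size=8000):
    # Split into character chunks, summarize each, then summarize the summaries.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize('\n'.join(partial_summaries))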
Training Parameters
The model was fine-tuned using the following training parameters:
from transformers import TrainingArguments
training_arguments = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    logging_steps=1,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    num_train_epochs=4,
    evaluation_strategy="steps",
    eval_steps=0.2,
    warmup_ratio=0.05,
    save_strategy="epoch",
    group_by_length=True,
    output_dir=OUTPUT_DIR,
    report_to="tensorboard",
    save_safetensors=True,
    lr_scheduler_type="cosine",
    seed=42,
)
model.config.use_cache = False  # disable the KV cache during training; re-enable it for inference
Training Execution
from trl import SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=validation_data,
    peft_config=lora_config,
    dataset_text_field="text",
    max_seq_length=1024,
    tokenizer=tokenizer,
    args=training_arguments,
)
trainer.train()
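After training, the adapter weights can be saved and, optionally, merged back into the base model for standalone use. A brief sketch using standard peft/transformers calls (not taken from the original training script):

# Save the LoRA adapter and tokenizer (adapter-only checkpoint).
trainer.model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

# Optionally merge the adapter into the base weights and save the result.
merged_model = trainer.model.merge_and_unload()
merged_model.save_pretrained(f"{OUTPUT_DIR}-merged")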
Authors
Karthikeyan M (Karthikeyan-M3011 on Hugging Face)
Citation
If you use this model in your research or applications, please cite it as follows:
@misc{llama2-fin-summarizer,
  author = {Karthikeyan M},
  title = {Fine-tuned LLaMA2 7B Model for Financial Summarization},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Karthikeyan-M3011/llama2-fin-summarizer}},
}