AiXPA Fine-tuned Llama 3.1 8B Model (With Ground Document)

This model is a fine-tuned version of Meta-Llama-3.1-8B-Instruct, specialized for the AiXPA project in the domain of Italian Public Administration (PA). It was trained using supervised fine-tuning (SFT) with LoRA (Low-Rank Adaptation) techniques on a dialogue dataset between an assistant and a PA user, with reference documents as context.

Model Details

Model Description

This model is based on Meta-Llama-3.1-8B-Instruct and has been fine-tuned using the Stefano-M-Community/final_all dataset for Italian Public Administration dialogue tasks. The model uses 4-bit quantization and LoRA adapters for efficient training and inference, making it suitable for deployment on consumer hardware while maintaining strong performance in PA-specific conversations with reference documents as context.

Developed by: LanD (FBK)
Model type: Causal Language Model (Fine-tuned)
Language(s) (NLP): Italian (primarily)
License: Please refer to the original Llama 3.1 license
Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

Model Sources [optional]

Repository: [More Information Needed]
Paper [optional]: [More Information Needed]
Demo [optional]: [More Information Needed]

Uses

Direct Use

This model can be used directly for text generation tasks, particularly those related to the domain it was fine-tuned on. The model maintains the instruction-following capabilities of the base Llama 3.1 model while being specialized for specific use cases defined in the training dataset.

Downstream Use

The model can be further fine-tuned for specific tasks or integrated into larger applications that require text generation capabilities. The LoRA adapters make it easy to switch between different specialized versions.

Out-of-Scope Use

This model should not be used for generating harmful, misleading, or inappropriate content. It may not perform well on tasks significantly different from its training domain without additional fine-tuning.

Bias, Risks, and Limitations

This model inherits the biases and limitations present in the base Llama 3.1 model and may have additional biases introduced through the fine-tuning dataset. Key considerations include:

Domain Specificity: The model has been fine-tuned on a specific dataset and may not generalize well to domains outside its training scope
Quantization Effects: 4-bit quantization may introduce minor degradation in model performance compared to full precision
Context Limitations: Maximum context length of 4,200 tokens may limit performance on very long documents
Language Bias: Primarily trained on Italian content, may have limited performance in other languages

Recommendations

Thoroughly evaluate the model on your specific use case before deployment
Consider the potential for biased outputs and implement appropriate safeguards
Monitor model performance and outputs in production environments
Be aware of the model's training domain when applying to new tasks
Consider additional fine-tuning for specialized applications outside the training domain

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Stefano-M-Community/aixpa_w_ground")

# Generate text
prompt = "Ciao, mi aiuti a scrivere un'azione sullo sport?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Training Details

Training Data

The model was fine-tuned on the Stefano-M-Community/final_all dataset from Hugging Face, which contains Italian Public Administration dialogue data between an assistant and PA users. This dataset was used for both training and evaluation.

Training Procedure

The model was trained using supervised fine-tuning (SFT) with LoRA (Low-Rank Adaptation) techniques. The training utilized 4-bit quantization for memory efficiency and multi-GPU training with 4 processes.

Training Hyperparameters

Training regime: Mixed precision training with 4-bit quantization
LoRA Configuration:
- Rank: 16
- Alpha: 32
- Dropout: 0.0
Sequence Length: 4,200 tokens
Learning Rate: 5e-5
Scheduler: Cosine annealing
Batch Size: 4 (training), 1 (evaluation)
Gradient Accumulation Steps: 2
Number of Epochs: 10
Weight Decay: 0.01
Warmup Ratio: 0.03
Early Stopping Patience: 5 epochs

Training Infrastructure

Hardware: Multi-GPU setup (4 processes)
Framework:
- Accelerate for distributed training
- DeepSpeed for optimization
- PEFT for LoRA implementation
Logging: Weights & Biases (WandB)
Evaluation Frequency: Every 35 steps
Checkpoint Saving: Every 35 steps

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated using the same dataset used for training: Stefano-M-Community/final_all. Evaluation was performed every 35 training steps to monitor training progress and prevent overfitting.

Factors

Training Progress: Monitored throughout training with early stopping patience of 5 epochs
Loss Metrics: Custom loss function implementation for supervised fine-tuning
Computational Efficiency: Evaluated performance with 4-bit quantization

Metrics

Training Loss: Monitored during training with logging every 10 steps
Evaluation Loss: Computed every 35 steps on the evaluation dataset
Early Stopping: Implemented with patience of 5 epochs to prevent overfitting

Results

Evaluation results are logged in Weights & Biases during training. The model was trained for up to 10 epochs with early stopping mechanism to ensure optimal performance without overfitting.

Evaluation Loss Performance:

The model (red line in eval/loss graph) shows a steep decrease from ~1.2 at step 35 to ~0.8 at step 160
Minimum loss achieved: approximately 0.8 around step 160
Final loss: approximately 0.89 at step 350
The model demonstrates good convergence with early stopping preventing overfitting

Summary

The fine-tuned model demonstrates improved performance on Italian Public Administration dialogue tasks while maintaining the general capabilities of the base Llama 3.1 model. The LoRA adaptation approach allows for efficient fine-tuning while preserving most of the original model's knowledge. This variant is specifically optimized for PA conversations with reference documents as context.

Model Examination

The model uses LoRA (Low-Rank Adaptation) which allows for parameter-efficient fine-tuning. This approach:

Preserves the original model weights while adding small adapter modules
Enables efficient switching between different task-specific adaptations
Reduces memory requirements during training and inference
Maintains interpretability by keeping the base model architecture intact

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

The environmental impact of this model is reduced compared to training from scratch due to:

Efficient Training: LoRA adaptation requires significantly less compute than full model training
4-bit Quantization: Reduces memory usage and energy consumption during training
Hardware Type: Multi-GPU setup (specific hardware configuration may vary)
Training Approach: Parameter-efficient fine-tuning reduces overall computational requirements

Note: Specific carbon emission calculations would require detailed hardware specifications and training duration measurements.

Technical Specifications

Model Architecture and Objective

Base Architecture: Llama 3.1 (8B parameters)
Adaptation Method: LoRA (Low-Rank Adaptation)
Objective: Supervised Fine-tuning for Italian Public Administration dialogue tasks with reference documents as context
Quantization: 4-bit quantization for efficient training and inference
Maximum Context Length: 4,200 tokens

Compute Infrastructure

Hardware

Training Setup: Multi-GPU configuration (4 processes)
Memory Optimization: 4-bit quantization with LoRA adapters
Distributed Training: Accelerate framework for multi-GPU coordination

Software

Framework: PyTorch with Transformers library
Training Libraries:
- PEFT 0.17.1 (Parameter-Efficient Fine-Tuning)
- Accelerate (distributed training)
- DeepSpeed (optimization)
- TRL (Transformer Reinforcement Learning)
Monitoring: Weights & Biases (WandB)
Configuration Management: DeepSpeed configuration for memory optimization

Citation

BibTeX:

@misc{aixpa_llama31_8b_lora,
  title={AiXPA Fine-tuned Llama 3.1 8B Model (With Ground Document)},
  author={LanD (FBK)},
  year={2025},
  howpublished={Hugging Face Model Repository},
  note={Fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using LoRA, trained on Italian Public Administration dialogue data with reference documents}
}

APA:

LanD (FBK). (2025). AiXPA Fine-tuned Llama 3.1 8B Model. Hugging Face Model Repository. Fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using LoRA.

Glossary

LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning technique that adds trainable low-rank matrices to existing model weights
SFT (Supervised Fine-Tuning): Training method using labeled data to improve model performance on specific tasks
4-bit Quantization: Technique to reduce model memory usage by representing weights with 4-bit precision
Multi-GPU Training: Distributed training approach using multiple GPUs to accelerate training

Model Card Authors

LanD (FBK)

Model Card Contact

For questions or issues regarding this model, please contact the LanD (FBK) through the appropriate channels.

Framework versions

PEFT 0.17.1

Downloads last month: 171

Model tree for LanD-FBK/aixpa_with_ground

Base model

meta-llama/Llama-3.1-8B

Finetuned

meta-llama/Llama-3.1-8B-Instruct

Adapter

(1179)

this model