AiXPA Fine-tuned Llama 3.1 8B Model (With Ground Document)

This model is a fine-tuned version of Meta-Llama-3.1-8B-Instruct, specialized for the AiXPA project in the domain of Italian Public Administration (PA). It was trained using supervised fine-tuning (SFT) with LoRA (Low-Rank Adaptation) techniques on a dialogue dataset between an assistant and a PA user, with reference documents as context.

Model Details

Model Description

This model is based on Meta-Llama-3.1-8B-Instruct and has been fine-tuned using the Stefano-M-Community/final_all dataset for Italian Public Administration dialogue tasks. The model uses 4-bit quantization and LoRA adapters for efficient training and inference, making it suitable for deployment on consumer hardware while maintaining strong performance in PA-specific conversations with reference documents as context.

  • Developed by: LanD (FBK)
  • Model type: Causal Language Model (Fine-tuned)
  • Language(s) (NLP): Italian (primarily)
  • License: Please refer to the original Llama 3.1 license
  • Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

Model Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

This model can be used directly for text generation tasks, particularly those related to the domain it was fine-tuned on. The model maintains the instruction-following capabilities of the base Llama 3.1 model while being specialized for specific use cases defined in the training dataset.

Downstream Use

The model can be further fine-tuned for specific tasks or integrated into larger applications that require text generation capabilities. The LoRA adapters make it easy to switch between different specialized versions.

Out-of-Scope Use

This model should not be used for generating harmful, misleading, or inappropriate content. It may not perform well on tasks significantly different from its training domain without additional fine-tuning.

Bias, Risks, and Limitations

This model inherits the biases and limitations present in the base Llama 3.1 model and may have additional biases introduced through the fine-tuning dataset. Key considerations include:

  • Domain Specificity: The model has been fine-tuned on a specific dataset and may not generalize well to domains outside its training scope
  • Quantization Effects: 4-bit quantization may introduce minor degradation in model performance compared to full precision
  • Context Limitations: Maximum context length of 4,200 tokens may limit performance on very long documents
  • Language Bias: Primarily trained on Italian content, may have limited performance in other languages

Recommendations

  • Thoroughly evaluate the model on your specific use case before deployment
  • Consider the potential for biased outputs and implement appropriate safeguards
  • Monitor model performance and outputs in production environments
  • Be aware of the model's training domain when applying to new tasks
  • Consider additional fine-tuning for specialized applications outside the training domain

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Stefano-M-Community/aixpa_w_ground")

# Generate text
prompt = "Ciao, mi aiuti a scrivere un'azione sullo sport?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Training Details

Training Data

The model was fine-tuned on the Stefano-M-Community/final_all dataset from Hugging Face, which contains Italian Public Administration dialogue data between an assistant and PA users. This dataset was used for both training and evaluation.

Training Procedure

The model was trained using supervised fine-tuning (SFT) with LoRA (Low-Rank Adaptation) techniques. The training utilized 4-bit quantization for memory efficiency and multi-GPU training with 4 processes.

Training Hyperparameters

  • Training regime: Mixed precision training with 4-bit quantization
  • LoRA Configuration:
    • Rank: 16
    • Alpha: 32
    • Dropout: 0.0
  • Sequence Length: 4,200 tokens
  • Learning Rate: 5e-5
  • Scheduler: Cosine annealing
  • Batch Size: 4 (training), 1 (evaluation)
  • Gradient Accumulation Steps: 2
  • Number of Epochs: 10
  • Weight Decay: 0.01
  • Warmup Ratio: 0.03
  • Early Stopping Patience: 5 epochs

Training Infrastructure

  • Hardware: Multi-GPU setup (4 processes)
  • Framework:
    • Accelerate for distributed training
    • DeepSpeed for optimization
    • PEFT for LoRA implementation
  • Logging: Weights & Biases (WandB)
  • Evaluation Frequency: Every 35 steps
  • Checkpoint Saving: Every 35 steps

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated using the same dataset used for training: Stefano-M-Community/final_all. Evaluation was performed every 35 training steps to monitor training progress and prevent overfitting.

Factors

  • Training Progress: Monitored throughout training with early stopping patience of 5 epochs
  • Loss Metrics: Custom loss function implementation for supervised fine-tuning
  • Computational Efficiency: Evaluated performance with 4-bit quantization

Metrics

  • Training Loss: Monitored during training with logging every 10 steps
  • Evaluation Loss: Computed every 35 steps on the evaluation dataset
  • Early Stopping: Implemented with patience of 5 epochs to prevent overfitting

Results

Evaluation results are logged in Weights & Biases during training. The model was trained for up to 10 epochs with early stopping mechanism to ensure optimal performance without overfitting.

Evaluation Loss Performance:

Evaluation Loss Curve

  • The model (red line in eval/loss graph) shows a steep decrease from ~1.2 at step 35 to ~0.8 at step 160
  • Minimum loss achieved: approximately 0.8 around step 160
  • Final loss: approximately 0.89 at step 350
  • The model demonstrates good convergence with early stopping preventing overfitting

Summary

The fine-tuned model demonstrates improved performance on Italian Public Administration dialogue tasks while maintaining the general capabilities of the base Llama 3.1 model. The LoRA adaptation approach allows for efficient fine-tuning while preserving most of the original model's knowledge. This variant is specifically optimized for PA conversations with reference documents as context.

Model Examination

The model uses LoRA (Low-Rank Adaptation) which allows for parameter-efficient fine-tuning. This approach:

  • Preserves the original model weights while adding small adapter modules
  • Enables efficient switching between different task-specific adaptations
  • Reduces memory requirements during training and inference
  • Maintains interpretability by keeping the base model architecture intact

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

The environmental impact of this model is reduced compared to training from scratch due to:

  • Efficient Training: LoRA adaptation requires significantly less compute than full model training
  • 4-bit Quantization: Reduces memory usage and energy consumption during training
  • Hardware Type: Multi-GPU setup (specific hardware configuration may vary)
  • Training Approach: Parameter-efficient fine-tuning reduces overall computational requirements

Note: Specific carbon emission calculations would require detailed hardware specifications and training duration measurements.

Technical Specifications

Model Architecture and Objective

  • Base Architecture: Llama 3.1 (8B parameters)
  • Adaptation Method: LoRA (Low-Rank Adaptation)
  • Objective: Supervised Fine-tuning for Italian Public Administration dialogue tasks with reference documents as context
  • Quantization: 4-bit quantization for efficient training and inference
  • Maximum Context Length: 4,200 tokens

Compute Infrastructure

Hardware

  • Training Setup: Multi-GPU configuration (4 processes)
  • Memory Optimization: 4-bit quantization with LoRA adapters
  • Distributed Training: Accelerate framework for multi-GPU coordination

Software

  • Framework: PyTorch with Transformers library
  • Training Libraries:
    • PEFT 0.17.1 (Parameter-Efficient Fine-Tuning)
    • Accelerate (distributed training)
    • DeepSpeed (optimization)
    • TRL (Transformer Reinforcement Learning)
  • Monitoring: Weights & Biases (WandB)
  • Configuration Management: DeepSpeed configuration for memory optimization

Citation

BibTeX:

@misc{aixpa_llama31_8b_lora,
  title={AiXPA Fine-tuned Llama 3.1 8B Model (With Ground Document)},
  author={LanD (FBK)},
  year={2025},
  howpublished={Hugging Face Model Repository},
  note={Fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using LoRA, trained on Italian Public Administration dialogue data with reference documents}
}

APA:

LanD (FBK). (2025). AiXPA Fine-tuned Llama 3.1 8B Model. Hugging Face Model Repository. Fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using LoRA.

Glossary

  • LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning technique that adds trainable low-rank matrices to existing model weights
  • SFT (Supervised Fine-Tuning): Training method using labeled data to improve model performance on specific tasks
  • 4-bit Quantization: Technique to reduce model memory usage by representing weights with 4-bit precision
  • Multi-GPU Training: Distributed training approach using multiple GPUs to accelerate training

Model Card Authors

LanD (FBK)

Model Card Contact

For questions or issues regarding this model, please contact the LanD (FBK) through the appropriate channels.

Framework versions

  • PEFT 0.17.1
Downloads last month
171
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for LanD-FBK/aixpa_with_ground

Adapter
(1179)
this model