---
library_name: transformers
license: apache-2.0
datasets:
- SetFit/ag_news
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- meta-llama/Llama-3.1-8B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: LLaMA-3.1-8B-AGNews-SFT
  results: []
pipeline_tag: text-classification
---

# LLaMA-3.1-8B-AGNews-SFT

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on the ag_news dataset.

## Model description

The model is a transformer-based language model that was pretrained on a large corpus of text and then fine-tuned on the ag_news_train_num dataset for text classification. It assigns a news article to one of four categories: World, Sports, Business, and Sci/Tech.

## Intended uses & limitations

### How to use

You can use this model to classify text into one of the four AG News categories: World, Sports, Business, and Sci/Tech. Load the model with the `transformers` library and pass it the text you want to classify; the model returns the predicted category. A minimal inference sketch is given in the Example usage section at the end of this card.

### Limitations and bias

The model may not perform well on text outside the domain of the training data (short English-language news articles), and its predictions can reflect biases present in that data. Keep these limitations in mind and evaluate the model on your own use case before relying on it.

## Training and evaluation data

### Dataset

The model was fine-tuned on the ag_news_train_num dataset, a subset of the AG News dataset. AG News is a collection of news articles drawn from the AG's corpus of web news. The ag_news_train_num split contains 120,000 articles, with 30,000 in each of the four categories: World, Sports, Business, and Sci/Tech.

### Data preprocessing

The text was tokenized with the `transformers` tokenizer for LLaMA-3.1-8B, which splits it into subword tokens. The resulting token sequences were converted into input features and used to fine-tune the model on the classification task.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 1.0

### Training results

| Class               | Accuracy | Precision | Recall  | F1 Score |
|---------------------|----------|-----------|---------|----------|
| World               | 95.95%   | 96.87%    | 95.95%  | 96.40%   |
| Sports              | 99.42%   | 99.00%    | 99.42%  | 99.21%   |
| Business            | 91.53%   | 93.95%    | 91.53%  | 92.72%   |
| Sci/Tech            | 94.84%   | 91.99%    | 94.84%  | 93.39%   |
| Overall (Macro Avg) | 95.43%   | 95.45%    | 95.43%  | 95.43%   |

A sketch for recomputing these figures is given at the end of this card.

### Framework versions

- Transformers 4.45.2
- Pytorch 2.4.1+cu121
- Datasets 2.21.0
- Tokenizers 0.20.1
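
## Example usage

The sketch below shows one way to load the model and obtain a category prediction, as referenced in the How to use section. It assumes the checkpoint is used as a causal language model that generates the label text, which is the usual result of a LLaMA-Factory full SFT run; the repo id and the prompt template are illustrative placeholders, since the exact instruction format used during training is not documented on this card.

```python
# Minimal inference sketch -- assumes the checkpoint is a causal LM that emits the
# category name as text. The repo id and prompt template below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLaMA-3.1-8B-AGNews-SFT"  # replace with the actual Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Classify the following news article into one of: "
    "World, Sports, Business, Sci/Tech.\n\n"
    "Article: Wall Street rallied on Friday as technology shares rebounded.\n"
    "Category:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens (the predicted category).
prediction = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
).strip()
print(prediction)  # e.g. "Business"
```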
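
### Reproducing the reported metrics

The per-class and macro-averaged figures in the Training results table can be recomputed with a standard classification report once predictions have been collected on the AG News test split. The snippet below is a sketch, not the original evaluation script: it assumes a hypothetical `predict_label(text)` helper that wraps the inference code above, that `SetFit/ag_news` exposes `text` and `label` columns, and that labels follow the standard AG News order (0=World, 1=Sports, 2=Business, 3=Sci/Tech).

```python
# Evaluation sketch -- predict_label(text) is a hypothetical helper wrapping the
# inference code above and returning one of the four category names.
from datasets import load_dataset
from sklearn.metrics import classification_report

labels = ["World", "Sports", "Business", "Sci/Tech"]  # assumed standard AG News label order
test_set = load_dataset("SetFit/ag_news", split="test")

y_true = [labels[example["label"]] for example in test_set]
y_pred = [predict_label(example["text"]) for example in test_set]

# Per-class precision/recall/F1 plus macro averages, as in the results table.
print(classification_report(y_true, y_pred, labels=labels, digits=4))
```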