Commit 233fb54 (verified) · Parent(s): c669b9a
Sang-Buster committed

feat: upload model files

README.md CHANGED
@@ -4,33 +4,35 @@ language:
 - en
 base_model:
 - meta-llama/Llama-3.2-3B-Instruct
-pipeline_tag: text-classification
+pipeline_tag: text-generation
 tags:
 - Speech Recognition
 - ATC
+- Unsloth
+- LoRA-Merged
 ---

-# ATC Communication Expert Model
+# ATC Communication Expert Model (Merged)

-A fine-tuned model specialized in improving and analyzing Air Traffic Control (ATC) communications, extracting relevant information from raw transcripts.
+A fine-tuned model specialized in improving and analyzing Air Traffic Control (ATC) communications, with LoRA adapters merged into the base model.

 ## Model Details

 ### Model Description

-This model is a fine-tuned version of Llama-3.2-3B-Instruct optimized for processing Air Traffic Control communications. It can:
+This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct with merged LoRA adapters, optimized for processing Air Traffic Control communications. It can:

 - Improve raw ATC transcripts with proper punctuation and formatting
 - Identify communication intentions (pilot requests, ATC instructions, etc.)
 - Extract key information such as flight numbers, altitudes, headings, and other numerical data
 - Analyze speaker roles and communication patterns

-The model was fine-tuned using LoRA (Low-Rank Adaptation) with PEFT (Parameter-Efficient Fine-Tuning) techniques to efficiently adapt the Llama 3.2 model to this specialized domain.
+The model was created by merging LoRA adapters (fine-tuned on ATC communications) into the Llama 3.2 3B base model, creating a unified model optimized for this specialized domain.

 - **Developed by:** ATC NLP Team
-- **Model type:** Fine-tuned Llama 3.2 with LoRA adapters
+- **Model type:** Llama 3.2 3B with merged LoRA adapters
 - **Language(s):** English, specialized for ATC terminology
-- **License:** Same as the base model (Llama 3.2)
+- **License:** Same as the base model
 - **Finetuned from model:** meta-llama/Llama-3.2-3B-Instruct

 ## Uses
@@ -80,21 +82,15 @@ This model is not suitable for:
 ## How to Get Started with the Model

 ```python
-from peft import PeftModel
 from transformers import AutoModelForCausalLM, AutoTokenizer

-# Load the model with LoRA adapters
-base_model = AutoModelForCausalLM.from_pretrained(
-    "meta-llama/Llama-3.2-3B-Instruct",
+# Load the model and tokenizer
+model = AutoModelForCausalLM.from_pretrained(
+    "atc_llama_merged",
     torch_dtype="auto",
     device_map="auto"
 )
-tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
-model = PeftModel.from_pretrained(base_model, "path_to_adapters")
-
-# Alternatively, use the merged model if available
-# model = AutoModelForCausalLM.from_pretrained("path_to_merged_model")
-# tokenizer = AutoTokenizer.from_pretrained("path_to_merged_model")
+tokenizer = AutoTokenizer.from_pretrained("atc_llama_merged")

 # Process an ATC message
 instruction = "As an ATC communication expert, improve this transcript and analyze its intentions and data."
@@ -109,45 +105,27 @@ response = tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
 print(response)
 ```

-## Training Details
-
-### Training Data
+## Model Creation Process

-The model was trained on a dataset of ATC communications with:
-- Original raw transcripts
-- Properly punctuated and formatted versions
-- Annotated intentions (PSC, PSR, PRP, PRQ, PRB, PAC, ASC, AGI, ACR, END)
-- Extracted numerical data (altitudes, headings, flight numbers, etc.)
-- Speaker and listener information
+### Base Model and Adapters

-### Training Procedure
-
-The model was fine-tuned using LoRA with the following approach:
-- Parameter-efficient fine-tuning using PEFT
-- LoRA applied to key attention layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
-- Optimized with Unsloth for efficiency
+- **Base model:** meta-llama/Llama-3.2-3B-Instruct
+- **Adapter source:** LoRA adapters fine-tuned on ATC communications data
+- **Merge method:** PEFT adapter merging into base model weights

-#### Training Hyperparameters
+### Merging Procedure

-- **Base model:** meta-llama/Llama-3.2-3B-Instruct
-- **LoRA rank:** 16
-- **LoRA alpha:** 16
-- **Learning rate:** 2e-4
-- **Batch size:** 4
-- **Gradient accumulation steps:** 4
-- **Epochs:** 3
-- **Warmup ratio:** 0.03
-- **Max sequence length:** 2048
-- **Training regime:** BF16 mixed precision where available, FP16 otherwise
-- **Optimizer:** AdamW 8-bit
+The model creation involved:
+1. Loading the base Llama 3.2 3B model
+2. Loading LoRA adapters fine-tuned on ATC communications data
+3. Merging the adapters into the base model's weights
+4. Saving the resulting unified model

 ## Evaluation

-### Testing Data, Factors & Metrics
-
-#### Testing Data
+### Testing

-The model was tested on a diverse set of ATC communications, including:
+The model should be tested on diverse ATC communications, including:
 - Clearances and instructions
 - Pilot requests and reports
 - Emergency communications
@@ -157,19 +135,10 @@ The model was tested on a diverse set of ATC communications, including:

 ### Model Architecture and Objective

-- **Base architecture:** Llama-3.2-3B-Instruct
-- **Fine-tuning method:** LoRA with PEFT
-- **Optimization library:** Unsloth
+- **Base architecture:** meta-llama/Llama-3.2-3B-Instruct
+- **Adaptation method:** LoRA adapters merged into base weights
 - **Training objective:** Improving and analyzing ATC communications

-### Compute Infrastructure
-
-- **Framework versions:**
-  - PEFT 0.15.2
-  - Unsloth (latest version used during training)
-  - Transformers (compatible with the base model)
-  - PyTorch (with BF16 support where available)
-
-## Model Card Contact
+### Model Card Contact

-For issues or questions about this model, please open an discussion in the repository.
+For issues or questions about this model, please open a discussion in the repository.
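The README's new "Merging Procedure" section describes the merge only as a list of steps. Below is a minimal sketch of that workflow using PEFT; the adapter directory name `atc_llama_lora_adapters` is a hypothetical placeholder, and `atc_llama_merged` is simply reused from the README's usage example, so substitute the actual paths.

```python
# Sketch of the adapter-merge step described in the README (paths are assumptions).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load the base Llama 3.2 3B model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# 2. Attach the ATC LoRA adapters (hypothetical local path).
model = PeftModel.from_pretrained(base, "atc_llama_lora_adapters")

# 3. Fold the adapter weights into the base weights; returns a plain transformers model.
merged = model.merge_and_unload()

# 4. Save the unified model so it loads without PEFT, as in the README's usage snippet.
merged.save_pretrained("atc_llama_merged")
tokenizer.save_pretrained("atc_llama_merged")
```

Merging trades swappable adapters for a single self-contained checkpoint that loads with plain `transformers`, which is exactly what the updated "How to Get Started" example relies on.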
config.json CHANGED
@@ -5,7 +5,11 @@
   "attention_bias": false,
   "attention_dropout": 0.0,
   "bos_token_id": 128000,
-  "eos_token_id": 128009,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
   "head_dim": 128,
   "hidden_act": "silu",
   "hidden_size": 3072,
@@ -31,7 +35,6 @@
   "tie_word_embeddings": true,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.51.3",
-  "unsloth_fixed": true,
   "unsloth_version": "2025.3.19",
   "use_cache": true,
   "vocab_size": 128256
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:16580d3bc53ff9ff866ef10e2e6a989c3a3b956b5bacc03a35dc02f0bf33b482
+oid sha256:1cee95c58786333e8e640f19307d0cfebc5d5ff7894a65954b6dfd6bb13c4efc
 size 4965799096
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fa90810f17d47edd21aa87746d85d2b39adcd7ea9be48ba64b3ec949af115ef8
+oid sha256:fc6b47057cbb231d759b93d77e6d392e96c21acf3a51aeff2f72dc497f3413bf
 size 1459729952
special_tokens_map.json CHANGED
@@ -13,11 +13,5 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": {
-    "content": "<|finetune_right_pad_id|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  }
+  "pad_token": "<|finetune_right_pad_id|>"
 }
tokenizer_config.json CHANGED
@@ -1,5 +1,4 @@
 {
-  "add_bos_token": true,
   "added_tokens_decoder": {
     "128000": {
       "content": "<|begin_of_text|>",
@@ -2062,6 +2061,5 @@
   "model_max_length": 131072,
   "pad_token": "<|finetune_right_pad_id|>",
   "padding_side": "left",
-  "tokenizer_class": "PreTrainedTokenizer",
-  "unk_token": null
+  "tokenizer_class": "PreTrainedTokenizer"
 }
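The tokenizer files keep `pad_token` set to `<|finetune_right_pad_id|>` with `padding_side: left`, which is what batched generation with a decoder-only model needs. A quick sanity check, again assuming the `atc_llama_merged` directory from the README example:

```python
from transformers import AutoTokenizer

# Path taken from the README example; adjust to wherever the merged model lives.
tokenizer = AutoTokenizer.from_pretrained("atc_llama_merged")

# pad_token and padding_side come from the committed tokenizer_config.json,
# so batched prompts can be left-padded without any extra setup.
batch = tokenizer(
    ["american two zero one descend to three thousand", "request taxi runway two seven left"],
    padding=True,
    return_tensors="pt",
)
print(tokenizer.pad_token)      # <|finetune_right_pad_id|>
print(tokenizer.padding_side)   # left
print(batch["attention_mask"])  # zeros mark the left-padding positions
```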