Commit 233fb54 (verified) · Parent(s): c669b9a
Sang-Buster committed

feat: upload model files

README.md CHANGED
@@ -4,33 +4,35 @@ language:
 - en
 base_model:
 - meta-llama/Llama-3.2-3B-Instruct
-pipeline_tag: text-classification
+pipeline_tag: text-generation
 tags:
 - Speech Recognition
 - ATC
+- Unsloth
+- LoRA-Merged
 ---

-# ATC Communication Expert Model
+# ATC Communication Expert Model (Merged)

-A fine-tuned model specialized in improving and analyzing Air Traffic Control (ATC) communications, extracting relevant information from raw transcripts.
+A fine-tuned model specialized in improving and analyzing Air Traffic Control (ATC) communications, with LoRA adapters merged into the base model.

 ## Model Details

 ### Model Description

-This model is a fine-tuned version of Llama-3.2-3B-Instruct optimized for processing Air Traffic Control communications. It can:
+This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct with merged LoRA adapters, optimized for processing Air Traffic Control communications. It can:

 - Improve raw ATC transcripts with proper punctuation and formatting
 - Identify communication intentions (pilot requests, ATC instructions, etc.)
 - Extract key information such as flight numbers, altitudes, headings, and other numerical data
 - Analyze speaker roles and communication patterns

-The model was fine-tuned using LoRA (Low-Rank Adaptation) with PEFT (Parameter-Efficient Fine-Tuning) techniques to efficiently adapt the Llama 3.2 model to this specialized domain.
+The model was created by merging LoRA adapters (fine-tuned on ATC communications) into the Llama 3.2 3B base model, creating a unified model optimized for this specialized domain.

 - **Developed by:** ATC NLP Team
-- **Model type:** Fine-tuned Llama 3.2 with LoRA adapters
+- **Model type:** Llama 3.2 3B with merged LoRA adapters
 - **Language(s):** English, specialized for ATC terminology
-- **License:** Same as the base model (Llama 3.2)
+- **License:** Same as the base model
 - **Finetuned from model:** meta-llama/Llama-3.2-3B-Instruct

 ## Uses
@@ -80,21 +82,15 @@ This model is not suitable for:
 ## How to Get Started with the Model

 ```python
-from peft import PeftModel
 from transformers import AutoModelForCausalLM, AutoTokenizer

-# Load the model with LoRA adapters
-base_model = AutoModelForCausalLM.from_pretrained(
-    "meta-llama/Llama-3.2-3B-Instruct",
+# Load the model and tokenizer
+model = AutoModelForCausalLM.from_pretrained(
+    "atc_llama_merged",
     torch_dtype="auto",
     device_map="auto"
 )
-tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
-model = PeftModel.from_pretrained(base_model, "path_to_adapters")
-
-# Alternatively, use the merged model if available
-# model = AutoModelForCausalLM.from_pretrained("path_to_merged_model")
-# tokenizer = AutoTokenizer.from_pretrained("path_to_merged_model")
+tokenizer = AutoTokenizer.from_pretrained("atc_llama_merged")

 # Process an ATC message
 instruction = "As an ATC communication expert, improve this transcript and analyze its intentions and data."
@@ -109,45 +105,27 @@ response = tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
 print(response)
 ```

-## Training Details
-
-### Training Data
+## Model Creation Process

-The model was trained on a dataset of ATC communications with:
-- Original raw transcripts
-- Properly punctuated and formatted versions
-- Annotated intentions (PSC, PSR, PRP, PRQ, PRB, PAC, ASC, AGI, ACR, END)
-- Extracted numerical data (altitudes, headings, flight numbers, etc.)
-- Speaker and listener information
+### Base Model and Adapters

-### Training Procedure
-
-The model was fine-tuned using LoRA with the following approach:
-- Parameter-efficient fine-tuning using PEFT
-- LoRA applied to key attention layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
-- Optimized with Unsloth for efficiency
+- **Base model:** meta-llama/Llama-3.2-3B-Instruct
+- **Adapter source:** LoRA adapters fine-tuned on ATC communications data
+- **Merge method:** PEFT adapter merging into base model weights

-#### Training Hyperparameters
+### Merging Procedure

-- **Base model:** meta-llama/Llama-3.2-3B-Instruct
-- **LoRA rank:** 16
-- **LoRA alpha:** 16
-- **Learning rate:** 2e-4
-- **Batch size:** 4
-- **Gradient accumulation steps:** 4
-- **Epochs:** 3
-- **Warmup ratio:** 0.03
-- **Max sequence length:** 2048
-- **Training regime:** BF16 mixed precision where available, FP16 otherwise
-- **Optimizer:** AdamW 8-bit
+The model creation involved:
+1. Loading the base Llama 3.2 3B model
+2. Loading LoRA adapters fine-tuned on ATC communications data
+3. Merging the adapters into the base model's weights
+4. Saving the resulting unified model

 ## Evaluation

-### Testing Data, Factors & Metrics
-
-#### Testing Data
+### Testing

-The model was tested on a diverse set of ATC communications, including:
+The model should be tested on diverse ATC communications, including:
 - Clearances and instructions
 - Pilot requests and reports
 - Emergency communications
@@ -157,19 +135,10 @@ The model was tested on a diverse set of ATC communications, including:

 ### Model Architecture and Objective

-- **Base architecture:** Llama-3.2-3B-Instruct
-- **Fine-tuning method:** LoRA with PEFT
-- **Optimization library:** Unsloth
+- **Base architecture:** meta-llama/Llama-3.2-3B-Instruct
+- **Adaptation method:** LoRA adapters merged into base weights
 - **Training objective:** Improving and analyzing ATC communications

-### Compute Infrastructure
-
-- **Framework versions:**
-  - PEFT 0.15.2
-  - Unsloth (latest version used during training)
-  - Transformers (compatible with the base model)
-  - PyTorch (with BF16 support where available)
-
-## Model Card Contact
+### Model Card Contact

-For issues or questions about this model, please open an discussion in the repository.
+For issues or questions about this model, please open a discussion in the repository.
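The README's new "Merging Procedure" section describes the merge only as a list of steps. Below is a minimal sketch of that workflow using PEFT; the adapter directory name `atc_llama_lora_adapters` is a hypothetical placeholder, and `atc_llama_merged` is simply reused from the README's usage example, so substitute the actual paths.

```python
# Sketch of the adapter-merge step described in the README (paths are assumptions).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load the base Llama 3.2 3B model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# 2. Attach the ATC LoRA adapters (hypothetical local path).
model = PeftModel.from_pretrained(base, "atc_llama_lora_adapters")

# 3. Fold the adapter weights into the base weights; returns a plain transformers model.
merged = model.merge_and_unload()

# 4. Save the unified model so it loads without PEFT, as in the README's usage snippet.
merged.save_pretrained("atc_llama_merged")
tokenizer.save_pretrained("atc_llama_merged")
```

Merging trades swappable adapters for a single self-contained checkpoint that loads with plain `transformers`, which is exactly what the updated "How to Get Started" example relies on.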
config.json CHANGED
@@ -5,7 +5,11 @@
   "attention_bias": false,
   "attention_dropout": 0.0,
   "bos_token_id": 128000,
-  "eos_token_id": 128009,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
   "head_dim": 128,
   "hidden_act": "silu",
   "hidden_size": 3072,
@@ -31,7 +35,6 @@
   "tie_word_embeddings": true,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.51.3",
-  "unsloth_fixed": true,
   "unsloth_version": "2025.3.19",
   "use_cache": true,
   "vocab_size": 128256
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:16580d3bc53ff9ff866ef10e2e6a989c3a3b956b5bacc03a35dc02f0bf33b482
+oid sha256:1cee95c58786333e8e640f19307d0cfebc5d5ff7894a65954b6dfd6bb13c4efc
 size 4965799096
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fa90810f17d47edd21aa87746d85d2b39adcd7ea9be48ba64b3ec949af115ef8
+oid sha256:fc6b47057cbb231d759b93d77e6d392e96c21acf3a51aeff2f72dc497f3413bf
 size 1459729952
special_tokens_map.json CHANGED
@@ -13,11 +13,5 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": {
-    "content": "<|finetune_right_pad_id|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  }
+  "pad_token": "<|finetune_right_pad_id|>"
 }
tokenizer_config.json CHANGED
@@ -1,5 +1,4 @@
 {
-  "add_bos_token": true,
   "added_tokens_decoder": {
     "128000": {
       "content": "<|begin_of_text|>",
@@ -2062,6 +2061,5 @@
   "model_max_length": 131072,
   "pad_token": "<|finetune_right_pad_id|>",
   "padding_side": "left",
-  "tokenizer_class": "PreTrainedTokenizer",
-  "unk_token": null
+  "tokenizer_class": "PreTrainedTokenizer"
 }
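The tokenizer files keep `pad_token` set to `<|finetune_right_pad_id|>` with `padding_side: left`, which is what batched generation with a decoder-only model needs. A quick sanity check, again assuming the `atc_llama_merged` directory from the README example:

```python
from transformers import AutoTokenizer

# Path taken from the README example; adjust to wherever the merged model lives.
tokenizer = AutoTokenizer.from_pretrained("atc_llama_merged")

# pad_token and padding_side come from the committed tokenizer_config.json,
# so batched prompts can be left-padded without any extra setup.
batch = tokenizer(
    ["american two zero one descend to three thousand", "request taxi runway two seven left"],
    padding=True,
    return_tensors="pt",
)
print(tokenizer.pad_token)      # <|finetune_right_pad_id|>
print(tokenizer.padding_side)   # left
print(batch["attention_mask"])  # zeros mark the left-padding positions
```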