Safetensors · English · qwen2

M-o-r-p-h-e-u-s committed · Commit ec5dd4c · verified · 1 Parent(s): 71999f2

Initial Upload
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,138 @@
# Qwen 2.5 0.5B - R1 Lobotomy

This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) on the [OpenBuddy/R1-0528-Distill](https://huggingface.co/datasets/OpenBuddy/R1-0528-Distill) (magicoder-oss-distilled-r1-0528.jsonl) and [cognitivecomputations/dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) (dolphin-r1-reasoning-deepseek.jsonl) datasets.

---

## Model description
This is my first attempt at fine-tuning a model.<br>
I started with the instruct version (non-reasoning), and my goal was to have it reason its way to an answer.<br>
The training was performed using LLaMA-Factory.<br>
<br>
The resulting model is weird. I've played with it a little, and it can ramble on, producing its own question/answer pairs, often without "thinking" at all, or "thinking" out loud.<br>
<br>
Prompting it via the system message to properly reason and use the correct tags seems to fix it a bit.<br>
This is the system message I used:<br>
```
You are a helpful assistant that thinks before it answers.
You always begin your answer with `<think>`, followed by your reasoning, and once you end your thinking you mark it with `</think>`.
You then proceed to provide your final answer.
```
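
For reference, here is a minimal inference sketch that applies this system message through the `transformers` chat-template API. The repo id below is a placeholder (this card doesn't state the upload path, so substitute the actual one); the sampling settings are picked up from the bundled `generation_config.json`:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "M-o-r-p-h-e-u-s/Qwen2.5-0.5B-R1-Lobotomy"  # placeholder id; replace with this repo's actual path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

system_message = (
    "You are a helpful assistant that thinks before it answers.\n"
    "You always begin your answer with `<think>`, followed by your reasoning, "
    "and once you end your thinking you mark it with `</think>`.\n"
    "You then proceed to provide your final answer."
)
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": "Produce a python function that calculates the fibonacci sequence."},
]

# The bundled ChatML template wraps each turn in <|im_start|>/<|im_end|> markers.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
# do_sample, temperature, top_p and top_k default to the values in generation_config.json.
output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```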

An example of this approach is:<br>
```
Produce a python function that calculates the fibonacci sequence.
```
```
<think>Okay, I need to create a Python function called fibonacci that returns an array of Fibonacci numbers starting from 0 up to n. Let's think about how to approach this.

First, understanding the problem. The Fibonacci sequence is a series where each number is the sum of the two preceding ones, usually starting with 0 and 1. So for example, the first few terms would be 0, 1, 1, 2, 3, etc.

The Fibonacci sequence starts with 0 and 1. So if we start generating the first two terms, they should be 0 and 1. Then the next term would be 0 + 1 = 1, then 1 + 0 = 1, and so on.

But wait, what about larger values? Like when we generate the 6th term or higher? Well, according to the definition, each subsequent term depends on the previous two. For example, the third term is 0 + 1 = 1, the fourth is 1 + 1 = 2, etc. So even though the sequence grows quickly, there are only a finite number of terms (like the nth term is determined by the (n-1)th and (n-2)th terms).

So the task is to write a function that generates the Fibonacci sequence iteratively until the desired index is reached. Because the recursion depth in Python can get really high, perhaps using a loop with memoization could help here. But maybe a generator is more efficient for large n?

Let me outline the steps:

Handle edge cases: If n is negative, return an empty list.
Start with initial conditions: n=0, fib[0] = 0, fib[1] = 1.
Generate the sequence iteratively:
a. Compute the current term as fib[n].
b. Create a new tuple containing both fib[n-1] and fib[n], but prepend the current term to the tuple to avoid duplicate entries.
Return the resulting list after processing all elements up to n.
Wait, let me check if that makes sense. The Fibonacci sequence is known to have O(n) time complexity because each step uses O(1) space, and generating the entire sequence takes linear time relative to n.

Now, implementing this logic. Using a helper function like fibonacci_iterative to compute the sequence efficiently. Also, handling the base case for n=0. Let's put it all together.</think>

To solve this problem, I'll create a function that computes the Fibonacci sequence iteratively and handles edge cases. The iterative version will work efficiently even for large n without recursion depth issues. Here's a possible implementation:


def fibonacci(n):
    """
    Returns the nth Fibonacci number iteratively.

    Args:
        n (int): The position in the Fibonacci sequence (must be non-negative)

    Returns:
        int: The nth Fibonacci number
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    # Initialize the first two Fibonacci numbers
    fib_sequence = [0, 1]

    # Generate the sequence iteratively
    for i in range(2, n+1):
        next_term = fib_sequence[i-1] + fib_sequence[i-2]
        fib_sequence.append(next_term)

    return fib_sequence[-1]

# Example usage:
print(fibonacci(7)) # Output: 8
This solution uses a loop to iterate through the sequence while maintaining the last two computed terms. This allows us to efficiently generate the entire sequence in O(n) time, making it suitable for large n without performance issues. The initial conditions ensure that the sequence correctly starts at 0 and 1.
```
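
Note that the quoted answer also illustrates how shaky the output can be: the function the model wrote actually returns 13 for `fibonacci(7)` (not 8, as it claims), and it returns 1 instead of 0 for `n = 0`, because the `[0, 1]` seed list is never trimmed. For comparison, a corrected sketch of the same iterative idea:
```
def fibonacci(n):
    """Return the nth Fibonacci number (0-indexed), computed iteratively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1  # fib(0), fib(1)
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(0))  # 0
print(fibonacci(7))  # 13
```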

Keep in mind that it can also think without giving you an answer outside the "thinking" tags, or include the answer inside that section.

---

## Intended uses & limitations

Feel free to use this model however you want, while adhering to Qwen's original license.

---

## Training procedure
I trained this model on the 2 datasets listed above (note that I only used the specified files, not the entire datasets). The training lasted for 3 epochs (arbitrarily set).<br>
<br>
Due to vRAM limitations I took 3 actions:
- Freeze the model and only train the final 5 layers.
- Limit the context to 4096 tokens.
- Filter the datasets to 12k and 10k maximum characters respectively (a sketch of this kind of filter follows below).
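
A minimal sketch of that length filter (assumptions on my part: the limit applies to the raw JSONL line, and the output names match the `r1_distill`/`r1_distill_dolphin` dataset entries in the configs below):
```
def filter_jsonl(src_path, dst_path, max_chars):
    """Copy src_path to dst_path, keeping only lines of at most max_chars characters."""
    kept = dropped = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if len(line) <= max_chars:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    print(f"{dst_path}: kept {kept}, dropped {dropped}")

# The two limits mentioned above: 12k and 10k characters.
filter_jsonl("magicoder-oss-distilled-r1-0528.jsonl", "data/r1_distill.jsonl", 12_000)
filter_jsonl("dolphin-r1-reasoning-deepseek.jsonl", "data/r1_distill_dolphin.jsonl", 10_000)
```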

I reduced the learning rate to `5e-06` as I didn't want to completely obliterate it.<br>
<br>
The model does seem to have learned, though the loss dropped rapidly at first and then slowed down dramatically.<br>
It began at about ~1.3 and ended at about ~0.9.<br>
The complete step-by-step training log can be found in `trainer_log.jsonl`.<br>
The training went on for a little over 2 days on my poor 3060 12GB.<br>
During the training, the model was fed about 1.1 billion tokens (1,126,151,688 per `train_results.json`).<br>
Finally, I have no idea how the 3 epochs at 4096 context length affected its ability to handle longer sequences.

---

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 256
- total_train_batch_size: 256 (1 per device × 256 accumulation steps)
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 20
- num_epochs: 3.0
<!-- end of the list -->
For the complete training configuration, please see `training_args.yaml` and/or `llamaboard_config.yaml`.

---

### Framework versions

- Transformers 4.52.4
- Pytorch 2.7.1+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1

---

Have fun with this scoundrel of a model, and please do get in touch if you have anything you want to relay: fun chat examples, advice, or anything else!<br>
Cya!
added_tokens.json ADDED
@@ -0,0 +1,24 @@
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
all_results.json ADDED
@@ -0,0 +1,9 @@
{
  "epoch": 3.0,
  "num_input_tokens_seen": 1126151688,
  "total_flos": 2.4182853777648783e+18,
  "train_loss": 0.9592450147342836,
  "train_runtime": 175626.0126,
  "train_samples_per_second": 3.612,
  "train_steps_per_second": 0.014
}
chat_template.jinja ADDED
@@ -0,0 +1,54 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
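
To see what this ChatML template renders, a small sketch using the chat-template API (with `tokenize=False` it just returns the formatted string; the base model's tokenizer ships the same template family):
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant that thinks before it answers."},
    {"role": "user", "content": "Hi!"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# <|im_start|>system
# You are a helpful assistant that thinks before it answers.<|im_end|>
# <|im_start|>user
# Hi!<|im_end|>
# <|im_start|>assistant
```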
config.json ADDED
@@ -0,0 +1,28 @@
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
generation_config.json ADDED
@@ -0,0 +1,14 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "transformers_version": "4.52.4"
}
llamaboard_config.yaml ADDED
@@ -0,0 +1,86 @@
top.booster: flashattn2
top.checkpoint_path: null
top.finetuning_type: freeze
top.model_name: Qwen2.5-0.5B-Instruct
top.quantization_bit: none
top.quantization_method: bnb
top.rope_scaling: yarn
top.template: qwen
train.additional_target: ''
train.apollo_rank: 32
train.apollo_scale: 64
train.apollo_target: all
train.apollo_update_interval: 100
train.badam_mode: layer
train.badam_switch_interval: 50
train.badam_switch_mode: ascending
train.badam_update_ratio: 0.05
train.batch_size: 1
train.compute_type: bf16
train.create_new_adapter: false
train.cutoff_len: 4096
train.dataset:
- r1_distill
- r1_distill_dolphin
train.dataset_dir: data
train.ds_offload: true
train.ds_stage: none
train.enable_thinking: false
train.extra_args: '{"optim": "adamw_torch", "preprocessing_num_workers": 1}'
train.freeze_extra_modules: ''
train.freeze_language_model: false
train.freeze_multi_modal_projector: true
train.freeze_trainable_layers: 5
train.freeze_trainable_modules: all
train.freeze_vision_tower: true
train.galore_rank: 16
train.galore_scale: 2
train.galore_target: all
train.galore_update_interval: 200
train.gradient_accumulation_steps: 256
train.image_max_pixels: 768*768
train.image_min_pixels: 32*32
train.learning_rate: 5e-6
train.logging_steps: 1
train.lora_alpha: 16
train.lora_dropout: 0
train.lora_rank: 8
train.lora_target: ''
train.loraplus_lr_ratio: 0
train.lr_scheduler_type: constant_with_warmup
train.mask_history: false
train.max_grad_norm: '1.0'
train.max_samples: '1000000'
train.neat_packing: false
train.neftune_alpha: 0
train.num_train_epochs: '3.0'
train.packing: false
train.ppo_score_norm: false
train.ppo_whiten_rewards: false
train.pref_beta: 0.1
train.pref_ftx: 0
train.pref_loss: sigmoid
train.report_to: none
train.resize_vocab: false
train.reward_model: []
train.save_steps: 2000
train.swanlab_api_key: ''
train.swanlab_link: ''
train.swanlab_mode: cloud
train.swanlab_project: llamafactory
train.swanlab_run_name: ''
train.swanlab_workspace: ''
train.train_on_prompt: false
train.training_stage: Supervised Fine-Tuning
train.use_apollo: false
train.use_badam: false
train.use_dora: false
train.use_galore: false
train.use_llama_pro: false
train.use_pissa: false
train.use_rslora: false
train.use_swanlab: false
train.val_size: 0
train.video_max_pixels: 256*256
train.video_min_pixels: 16*16
train.warmup_steps: 20
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6bfd0668f310d2c15763f2ab9650d226c233f2a66ba0f9589d1928a5f8f20cb9
size 1137221688
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
train_results.json ADDED
@@ -0,0 +1,9 @@
{
  "epoch": 3.0,
  "num_input_tokens_seen": 1126151688,
  "total_flos": 2.4182853777648783e+18,
  "train_loss": 0.9592450147342836,
  "train_runtime": 175626.0126,
  "train_samples_per_second": 3.612,
  "train_steps_per_second": 0.014
}
trainer_log.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d9f03bb8815bf1807fce87727c414832f674c6fcaed7f0df67889c7ccff06b3f
size 6161
training_args.yaml ADDED
@@ -0,0 +1,33 @@
bf16: true
cutoff_len: 4096
dataset: r1_distill,r1_distill_dolphin
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: false
finetuning_type: freeze
flash_attn: fa2
freeze_trainable_layers: 5
freeze_trainable_modules: all
gradient_accumulation_steps: 256
include_num_input_tokens_seen: true
learning_rate: 5.0e-06
logging_steps: 1
lr_scheduler_type: constant_with_warmup
max_grad_norm: 1.0
max_samples: 1000000
model_name_or_path: Qwen/Qwen2.5-0.5B-Instruct
num_train_epochs: 3.0
optim: adamw_torch
output_dir: saves\Qwen2.5-0.5B-Instruct\freeze\train_2025-06-15-17-37-11
packing: false
per_device_train_batch_size: 1
plot_loss: true
preprocessing_num_workers: 1
report_to: none
rope_scaling: yarn
save_steps: 2000
stage: sft
template: qwen
trust_remote_code: true
warmup_steps: 20
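
This YAML is the LLaMA-Factory training configuration for the run described in the README. Assuming a standard LLaMA-Factory install, with the two filtered dataset files registered in `data/dataset_info.json` under the `r1_distill` and `r1_distill_dolphin` names, the run should be reproducible with `llamafactory-cli train training_args.yaml`.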
training_loss.png ADDED
vocab.json ADDED
The diff for this file is too large to render. See raw diff