Safetensors · English · qwen2

M-o-r-p-h-e-u-s committed · Commit ec5dd4c · verified · 1 Parent(s): 71999f2

Initial Upload
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,138 @@
# Qwen 2.5 0.5B - R1 Lobotomy

This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) on the [OpenBuddy/R1-0528-Distill](https://huggingface.co/datasets/OpenBuddy/R1-0528-Distill) (magicoder-oss-distilled-r1-0528.jsonl) and [cognitivecomputations/dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) (dolphin-r1-reasoning-deepseek.jsonl) datasets.

---

## Model description
This is my first attempt at fine-tuning a model.<br>
I started with the instruct version (non-reasoning), and my goal was to have it reason its way to an answer.<br>
The training was performed using LLaMA-Factory.<br>
<br>
The resulting model is weird. I've played with it a little, and it can ramble on, producing its own question/answer pairs, often without "thinking" at all, or "thinking" out loud.<br>
<br>
Prompting it via the system message to properly reason and use the correct tags seems to fix it a bit.<br>
This is the system message I used:<br>
```
You are a helpful assistant that thinks before it answers.
You always begin your answer with `<think>`, followed by your reasoning, and once you end your thinking you mark it with `</think>`.
You then proceed to provide your final answer.
```
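
For reference, here is a minimal inference sketch that applies this system message through the `transformers` chat-template API. The repo id below is a placeholder (this card doesn't state the upload path, so substitute the actual one); the sampling settings are picked up from the bundled `generation_config.json`:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "M-o-r-p-h-e-u-s/Qwen2.5-0.5B-R1-Lobotomy"  # placeholder id; replace with this repo's actual path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

system_message = (
    "You are a helpful assistant that thinks before it answers.\n"
    "You always begin your answer with `<think>`, followed by your reasoning, "
    "and once you end your thinking you mark it with `</think>`.\n"
    "You then proceed to provide your final answer."
)
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": "Produce a python function that calculates the fibonacci sequence."},
]

# The bundled ChatML template wraps each turn in <|im_start|>/<|im_end|> markers.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
# do_sample, temperature, top_p and top_k default to the values in generation_config.json.
output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```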

An example of this approach is:<br>
```
Produce a python function that calculates the fibonacci sequence.
```
```
<think>Okay, I need to create a Python function called fibonacci that returns an array of Fibonacci numbers starting from 0 up to n. Let's think about how to approach this.

First, understanding the problem. The Fibonacci sequence is a series where each number is the sum of the two preceding ones, usually starting with 0 and 1. So for example, the first few terms would be 0, 1, 1, 2, 3, etc.

The Fibonacci sequence starts with 0 and 1. So if we start generating the first two terms, they should be 0 and 1. Then the next term would be 0 + 1 = 1, then 1 + 0 = 1, and so on.

But wait, what about larger values? Like when we generate the 6th term or higher? Well, according to the definition, each subsequent term depends on the previous two. For example, the third term is 0 + 1 = 1, the fourth is 1 + 1 = 2, etc. So even though the sequence grows quickly, there are only a finite number of terms (like the nth term is determined by the (n-1)th and (n-2)th terms).

So the task is to write a function that generates the Fibonacci sequence iteratively until the desired index is reached. Because the recursion depth in Python can get really high, perhaps using a loop with memoization could help here. But maybe a generator is more efficient for large n?

Let me outline the steps:

Handle edge cases: If n is negative, return an empty list.
Start with initial conditions: n=0, fib[0] = 0, fib[1] = 1.
Generate the sequence iteratively:
a. Compute the current term as fib[n].
b. Create a new tuple containing both fib[n-1] and fib[n], but prepend the current term to the tuple to avoid duplicate entries.
Return the resulting list after processing all elements up to n.
Wait, let me check if that makes sense. The Fibonacci sequence is known to have O(n) time complexity because each step uses O(1) space, and generating the entire sequence takes linear time relative to n.

Now, implementing this logic. Using a helper function like fibonacci_iterative to compute the sequence efficiently. Also, handling the base case for n=0. Let's put it all together.</think>

To solve this problem, I'll create a function that computes the Fibonacci sequence iteratively and handles edge cases. The iterative version will work efficiently even for large n without recursion depth issues. Here's a possible implementation:


def fibonacci(n):
    """
    Returns the nth Fibonacci number iteratively.

    Args:
        n (int): The position in the Fibonacci sequence (must be non-negative)

    Returns:
        int: The nth Fibonacci number
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    # Initialize the first two Fibonacci numbers
    fib_sequence = [0, 1]

    # Generate the sequence iteratively
    for i in range(2, n+1):
        next_term = fib_sequence[i-1] + fib_sequence[i-2]
        fib_sequence.append(next_term)

    return fib_sequence[-1]

# Example usage:
print(fibonacci(7)) # Output: 8
This solution uses a loop to iterate through the sequence while maintaining the last two computed terms. This allows us to efficiently generate the entire sequence in O(n) time, making it suitable for large n without performance issues. The initial conditions ensure that the sequence correctly starts at 0 and 1.
```
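
Note that the quoted answer also illustrates how shaky the output can be: the function the model wrote actually returns 13 for `fibonacci(7)` (not 8, as it claims), and it returns 1 instead of 0 for `n = 0`, because the `[0, 1]` seed list is never trimmed. For comparison, a corrected sketch of the same iterative idea:
```
def fibonacci(n):
    """Return the nth Fibonacci number (0-indexed), computed iteratively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1  # fib(0), fib(1)
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(0))  # 0
print(fibonacci(7))  # 13
```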

Keep in mind that it can also think without giving you an answer outside the "thinking" tags, or include the answer inside that section.

---

## Intended uses & limitations

Feel free to use this model however you want, while adhering to Qwen's original license.

---

## Training procedure
I trained this model on the 2 datasets listed above (note that I only used the specified files, not the entire datasets). The training lasted for 3 epochs (arbitrarily set).<br>
<br>
Due to vRAM limitations I took 3 actions:
- Freeze the model and only train the final 5 layers.
- Limit the context to 4096 tokens.
- Filter the datasets to 12k and 10k maximum characters respectively (a sketch of this kind of filter follows below).
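
A minimal sketch of that length filter (assumptions on my part: the limit applies to the raw JSONL line, and the output names match the `r1_distill`/`r1_distill_dolphin` dataset entries in the configs below):
```
def filter_jsonl(src_path, dst_path, max_chars):
    """Copy src_path to dst_path, keeping only lines of at most max_chars characters."""
    kept = dropped = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if len(line) <= max_chars:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    print(f"{dst_path}: kept {kept}, dropped {dropped}")

# The two limits mentioned above: 12k and 10k characters.
filter_jsonl("magicoder-oss-distilled-r1-0528.jsonl", "data/r1_distill.jsonl", 12_000)
filter_jsonl("dolphin-r1-reasoning-deepseek.jsonl", "data/r1_distill_dolphin.jsonl", 10_000)
```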

I reduced the learning rate to `5e-06` as I didn't want to completely obliterate it.<br>
<br>
The model does seem to have learned, though the loss dropped rapidly at first and then slowed down dramatically.<br>
It began at about ~1.3 and ended at about ~0.9.<br>
The complete step-by-step training log can be found in `trainer_log.jsonl`.<br>
The training went on for a little over 2 days on my poor 3060 12GB.<br>
During the training, the model was fed about 1.1 billion tokens (1,126,151,688 per `train_results.json`).<br>
Finally, I have no idea how the 3 epochs at 4096 context length affected its ability to handle longer sequences.

---

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 256
- total_train_batch_size: 256 (1 per device × 256 accumulation steps)
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 20
- num_epochs: 3.0
<!-- end of the list -->
For the complete training configuration, please see `training_args.yaml` and/or `llamaboard_config.yaml`.

---

### Framework versions

- Transformers 4.52.4
- Pytorch 2.7.1+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1

---

Have fun with this scoundrel of a model, and please do get in touch if you have anything you want to relay: fun chat examples, advice, or anything else!<br>
Cya!
added_tokens.json ADDED
@@ -0,0 +1,24 @@
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
all_results.json ADDED
@@ -0,0 +1,9 @@
{
  "epoch": 3.0,
  "num_input_tokens_seen": 1126151688,
  "total_flos": 2.4182853777648783e+18,
  "train_loss": 0.9592450147342836,
  "train_runtime": 175626.0126,
  "train_samples_per_second": 3.612,
  "train_steps_per_second": 0.014
}
chat_template.jinja ADDED
@@ -0,0 +1,54 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
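
To see what this ChatML template renders, a small sketch using the chat-template API (with `tokenize=False` it just returns the formatted string; the base model's tokenizer ships the same template family):
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant that thinks before it answers."},
    {"role": "user", "content": "Hi!"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# <|im_start|>system
# You are a helpful assistant that thinks before it answers.<|im_end|>
# <|im_start|>user
# Hi!<|im_end|>
# <|im_start|>assistant
```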
config.json ADDED
@@ -0,0 +1,28 @@
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
generation_config.json ADDED
@@ -0,0 +1,14 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "transformers_version": "4.52.4"
}
llamaboard_config.yaml ADDED
@@ -0,0 +1,86 @@
top.booster: flashattn2
top.checkpoint_path: null
top.finetuning_type: freeze
top.model_name: Qwen2.5-0.5B-Instruct
top.quantization_bit: none
top.quantization_method: bnb
top.rope_scaling: yarn
top.template: qwen
train.additional_target: ''
train.apollo_rank: 32
train.apollo_scale: 64
train.apollo_target: all
train.apollo_update_interval: 100
train.badam_mode: layer
train.badam_switch_interval: 50
train.badam_switch_mode: ascending
train.badam_update_ratio: 0.05
train.batch_size: 1
train.compute_type: bf16
train.create_new_adapter: false
train.cutoff_len: 4096
train.dataset:
- r1_distill
- r1_distill_dolphin
train.dataset_dir: data
train.ds_offload: true
train.ds_stage: none
train.enable_thinking: false
train.extra_args: '{"optim": "adamw_torch", "preprocessing_num_workers": 1}'
train.freeze_extra_modules: ''
train.freeze_language_model: false
train.freeze_multi_modal_projector: true
train.freeze_trainable_layers: 5
train.freeze_trainable_modules: all
train.freeze_vision_tower: true
train.galore_rank: 16
train.galore_scale: 2
train.galore_target: all
train.galore_update_interval: 200
train.gradient_accumulation_steps: 256
train.image_max_pixels: 768*768
train.image_min_pixels: 32*32
train.learning_rate: 5e-6
train.logging_steps: 1
train.lora_alpha: 16
train.lora_dropout: 0
train.lora_rank: 8
train.lora_target: ''
train.loraplus_lr_ratio: 0
train.lr_scheduler_type: constant_with_warmup
train.mask_history: false
train.max_grad_norm: '1.0'
train.max_samples: '1000000'
train.neat_packing: false
train.neftune_alpha: 0
train.num_train_epochs: '3.0'
train.packing: false
train.ppo_score_norm: false
train.ppo_whiten_rewards: false
train.pref_beta: 0.1
train.pref_ftx: 0
train.pref_loss: sigmoid
train.report_to: none
train.resize_vocab: false
train.reward_model: []
train.save_steps: 2000
train.swanlab_api_key: ''
train.swanlab_link: ''
train.swanlab_mode: cloud
train.swanlab_project: llamafactory
train.swanlab_run_name: ''
train.swanlab_workspace: ''
train.train_on_prompt: false
train.training_stage: Supervised Fine-Tuning
train.use_apollo: false
train.use_badam: false
train.use_dora: false
train.use_galore: false
train.use_llama_pro: false
train.use_pissa: false
train.use_rslora: false
train.use_swanlab: false
train.val_size: 0
train.video_max_pixels: 256*256
train.video_min_pixels: 16*16
train.warmup_steps: 20
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6bfd0668f310d2c15763f2ab9650d226c233f2a66ba0f9589d1928a5f8f20cb9
size 1137221688
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
train_results.json ADDED
@@ -0,0 +1,9 @@
{
  "epoch": 3.0,
  "num_input_tokens_seen": 1126151688,
  "total_flos": 2.4182853777648783e+18,
  "train_loss": 0.9592450147342836,
  "train_runtime": 175626.0126,
  "train_samples_per_second": 3.612,
  "train_steps_per_second": 0.014
}
trainer_log.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d9f03bb8815bf1807fce87727c414832f674c6fcaed7f0df67889c7ccff06b3f
size 6161
training_args.yaml ADDED
@@ -0,0 +1,33 @@
bf16: true
cutoff_len: 4096
dataset: r1_distill,r1_distill_dolphin
dataset_dir: data
ddp_timeout: 180000000
do_train: true
enable_thinking: false
finetuning_type: freeze
flash_attn: fa2
freeze_trainable_layers: 5
freeze_trainable_modules: all
gradient_accumulation_steps: 256
include_num_input_tokens_seen: true
learning_rate: 5.0e-06
logging_steps: 1
lr_scheduler_type: constant_with_warmup
max_grad_norm: 1.0
max_samples: 1000000
model_name_or_path: Qwen/Qwen2.5-0.5B-Instruct
num_train_epochs: 3.0
optim: adamw_torch
output_dir: saves\Qwen2.5-0.5B-Instruct\freeze\train_2025-06-15-17-37-11
packing: false
per_device_train_batch_size: 1
plot_loss: true
preprocessing_num_workers: 1
report_to: none
rope_scaling: yarn
save_steps: 2000
stage: sft
template: qwen
trust_remote_code: true
warmup_steps: 20
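
This YAML is the LLaMA-Factory training configuration for the run described in the README. Assuming a standard LLaMA-Factory install, with the two filtered dataset files registered in `data/dataset_info.json` under the `r1_distill` and `r1_distill_dolphin` names, the run should be reproducible with `llamafactory-cli train training_args.yaml`.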
training_loss.png ADDED
vocab.json ADDED
The diff for this file is too large to render. See raw diff