# Qwen 2.5 0.5B - R1 Lobotomy
This model is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct on the OpenBuddy/R1-0528-Distill (magicoder-oss-distilled-r1-0528.jsonl) and the cognitivecomputations/dolphin-r1 (dolphin-r1-reasoning-deepseek.jsonl) datasets.
## Model description
This is my first attempt at finetuning a model.
I started with the instruct version (non-reasoning) and my goal was to have it reason its way to an answer.
The training was performed using LLaMA-Factory.
The resulting model is weird. I've played with it a little, and it can ramble on, producing its own question/answer pairs, often without "thinking" at all, or "thinking" out loud.
Prompting it via the system message to properly reason and use the correct tags seems to fix it a bit.
This is the system message I used:
```
You are a helpful assistant that thinks before it answers.
You always begin your answer with `<think>`, followed by your reasoning, and once you end your thinking you mark it with `</think>`.
You then proceed to provide your final answer.
```
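For convenience, here is a minimal sketch of how you might pass this system message through the 🤗 Transformers chat template (the model id below is a placeholder; substitute this repo's id):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/this-model"  # placeholder: substitute this repo's id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

system_message = (
    "You are a helpful assistant that thinks before it answers. "
    "You always begin your answer with `<think>`, followed by your reasoning, "
    "and once you end your thinking you mark it with `</think>`. "
    "You then proceed to provide your final answer."
)

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": "Produce a python function that calculates the fibonacci sequence."},
]

# Build the prompt with the chat template, then generate and decode only the new tokens
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```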
An example of this approach is:
Produce a python function that calculates the fibonacci sequence.
<think>Okay, I need to create a Python function called fibonacci that returns an array of Fibonacci numbers starting from 0 up to n. Let's think about how to approach this.
First, understanding the problem. The Fibonacci sequence is a series where each number is the sum of the two preceding ones, usually starting with 0 and 1. So for example, the first few terms would be 0, 1, 1, 2, 3, etc.
The Fibonacci sequence starts with 0 and 1. So if we start generating the first two terms, they should be 0 and 1. Then the next term would be 0 + 1 = 1, then 1 + 0 = 1, and so on.
But wait, what about larger values? Like when we generate the 6th term or higher? Well, according to the definition, each subsequent term depends on the previous two. For example, the third term is 0 + 1 = 1, the fourth is 1 + 1 = 2, etc. So even though the sequence grows quickly, there are only a finite number of terms (like the nth term is determined by the (n-1)th and (n-2)th terms).
So the task is to write a function that generates the Fibonacci sequence iteratively until the desired index is reached. Because the recursion depth in Python can get really high, perhaps using a loop with memoization could help here. But maybe a generator is more efficient for large n?
Let me outline the steps:
1. Handle edge cases: If n is negative, return an empty list.
2. Start with initial conditions: n=0, fib[0] = 0, fib[1] = 1.
3. Generate the sequence iteratively:
   a. Compute the current term as fib[n].
   b. Create a new tuple containing both fib[n-1] and fib[n], but prepend the current term to the tuple to avoid duplicate entries.
4. Return the resulting list after processing all elements up to n.
Wait, let me check if that makes sense. The Fibonacci sequence is known to have O(n) time complexity because each step uses O(1) space, and generating the entire sequence takes linear time relative to n.
Now, implementing this logic. Using a helper function like fibonacci_iterative to compute the sequence efficiently. Also, handling the base case for n=0. Let's put it all together.</think>
To solve this problem, I'll create a function that computes the Fibonacci sequence iteratively and handles edge cases. The iterative version will work efficiently even for large n without recursion depth issues. Here's a possible implementation:
```python
def fibonacci(n):
    """
    Returns the nth Fibonacci number iteratively.

    Args:
        n (int): The position in the Fibonacci sequence (must be non-negative)

    Returns:
        int: The nth Fibonacci number
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    # Initialize the first two Fibonacci numbers
    fib_sequence = [0, 1]

    # Generate the sequence iteratively
    for i in range(2, n + 1):
        next_term = fib_sequence[i - 1] + fib_sequence[i - 2]
        fib_sequence.append(next_term)

    return fib_sequence[-1]

# Example usage:
print(fibonacci(7))  # Output: 8
```
This solution uses a loop to iterate through the sequence while maintaining the last two computed terms. This allows us to efficiently generate the entire sequence in O(n) time, making it suitable for large n without performance issues. The initial conditions ensure that the sequence correctly starts at 0 and 1.
Keep in mind that it can also "think" without giving you an answer outside the thinking tags, or it can include the answer inside that section.
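If you want to separate the reasoning from the final answer programmatically despite that, a simple regex split is a reasonable starting point (a minimal sketch; it assumes a single well-formed tag pair, which, as noted, this model doesn't always produce):

```python
import re

def split_think(text: str):
    """Split a response into (reasoning, answer) on the first <think>...</think> block.

    Returns (None, text) when no complete think block is present,
    which this model will sometimes do.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer

# Example usage:
reasoning, answer = split_think("<think>2 + 2 is 4</think>The answer is 4.")
print(answer)  # The answer is 4.
```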
## Intended uses & limitations
Feel free to use this model however you want, while adhering to Qwen's original license.
## Training procedure
I trained this model on the two datasets listed above (note that I only used the specified files, not the entire datasets). Training ran for 3 epochs (an arbitrary choice).
Due to VRAM limitations, I took three actions (the first is sketched in code after this list):
- Froze the model and trained only the final 5 layers.
- Limited the context to 4096 tokens.
- Filtered the two datasets to a maximum of 12k and 10k characters per sample, respectively.
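The freezing was handled by LLaMA-Factory's freeze finetuning type; in plain Transformers/PyTorch terms the idea looks roughly like this (a sketch only, and the exact set of modules LLaMA-Factory trains may differ):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Freeze every parameter, then re-enable gradients for the last 5 decoder layers only
for param in model.parameters():
    param.requires_grad = False
for layer in model.model.layers[-5:]:
    for param in layer.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```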
I reduced the learning rate to 5e-06, as I didn't want to completely obliterate the model.
The model does seem to have learned, though progress slowed dramatically after an initial rapid drop: the loss started at roughly 1.3 and ended at roughly 0.9.
The complete step-by-step training log can be found in trainer_log.jsonl.
The training went on for a little over 2 days on my poor RTX 3060 (12 GB).
During the training, the model was fed about 1.1 billion tokens.
Finally, I have no idea how the 3 epochs at 4096 context length affected its ability to handle longer sequences.
Loss progression across 3 epochs:
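If you want to reproduce the curve from trainer_log.jsonl, something like the sketch below should work. It assumes each log line is a JSON object carrying current_steps and loss fields, which is LLaMA-Factory's usual log format; verify against the actual file:

```python
import json

import matplotlib.pyplot as plt

steps, losses = [], []
with open("trainer_log.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if "loss" in record:  # skip records without a training loss
            steps.append(record.get("current_steps", len(steps)))
            losses.append(record["loss"])

plt.plot(steps, losses)
plt.xlabel("step")
plt.ylabel("training loss")
plt.title("Loss progression across 3 epochs")
plt.show()
```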
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 256
- total_train_batch_size: 256
- optimizer: AdamW (ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 20
- num_epochs: 3.0
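As a sanity check on the batch arithmetic: a per-device batch of 1 combined with 256 gradient-accumulation steps gives 1 × 256 = 256 sequences per optimizer update, which is the total_train_batch_size reported above.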
For the complete training configuration, please see training_args.yaml and/or llamaboard_config.yaml.
### Framework versions
- Transformers 4.52.4
- Pytorch 2.7.1+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1
Have fun with this scoundrel of a model, and please do get in touch if you have anything you want to relay: fun chat examples, advice, or anything else!
Cya!