---
base_model:
- saishshinde15/Clyrai_Base_Reasoning
tags:
- text-generation-inference
- transformers
- qwen2
- trl
- reasoning
- deepseekR1
- advanced-finetuning
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Clyrai Vortex Reasoning  

- **Developed by:** clyrai 
- **License:** apache-2.0  
- **Fine-tuned from:** [saishshinde15/Clyrai_Base_Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning)  
- **Category:** Experimental, Research  

## **Introduction**  

Clyrai Vortex Reasoning is an **experimental model** that advances the structured reasoning capabilities pioneered by [Clyrai_Base_Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning). While the Base Reasoning model used **Group Relative Policy Optimization (GRPO)** to strengthen step-by-step logical thought in the style of **DeepSeek-R1**, this model takes a different approach: **it eliminates GRPO and instead relies on high-quality Supervised Fine-Tuning (SFT) alone**.  

The core objective was to investigate whether **deep reasoning and self-questioning behavior could emerge purely through SFT on high-quality datasets**. The results were highly promising: the model successfully **questions itself internally**, improves reasoning depth, and consistently generates structured, logical responses.  

---

## **Key Features**  

### **1️⃣ Advanced Reasoning Without GRPO**  
This model **does not rely on GRPO** yet **achieves similar self-reflective thought processes**, proving that structured reasoning can be induced through **high-quality SFT alone**.  

### **2️⃣ Self-Questioning and Iterative Thinking**  
The model **actively asks itself intermediate questions before answering**, mimicking the deep **reflection-based thought process** of models like DeepSeek-R1. This leads to **more reliable** and **well-structured** responses.  

### **3️⃣ High-Quality SFT on a Curated Dataset**  
To compensate for the lack of reinforcement learning, we used an **extensive dataset** tailored for deep reasoning. This dataset includes:  
- **Mathematical proofs & logical puzzles**  
- **Complex multi-step problem-solving tasks**  
- **Philosophical and ethical reasoning**  
- **Scientific hypothesis evaluation**  

### **4️⃣ Implicit Use of `<think>` and `<answer>` Tokens**  
The model internally uses **special reasoning markers** (`<think>` and `<answer>`) to structure its responses, though these may not always be visible in the final output. This ensures a **consistent and methodical approach** to answering questions.  

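For illustration, the snippet below shows one way such markers could be separated in post-processing. It is a minimal sketch that assumes the raw output contains literal `<think>...</think>` and `<answer>...</answer>` spans; because the model often keeps these markers internal, the fallback path simply returns the full text as the answer.

```python
import re

def split_reasoning(raw_output: str):
    """Best-effort split of a response into (reasoning, answer).

    Assumes the model emitted literal <think>...</think> and <answer>...</answer>
    spans; if it did not, the reasoning is empty and the whole output is the answer.
    """
    think = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", raw_output, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    final = answer.group(1).strip() if answer else raw_output.strip()
    return reasoning, final
```
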
### **5️⃣ Part of the Clyrai Vortex Family**  
This model belongs to the **Clyrai Vortex series**, a collection of fine-tuned models pushing the boundaries of **SFT-based reasoning without reinforcement learning**.  

---

## **Breakthrough Insights**  

| Feature                         | Base Reasoning (GRPO) | Vortex Reasoning (SFT-Only) |
|---------------------------------|-----------------------|-----------------------------|
| Structured Thought Process      | ✅ Yes (GRPO)         | ✅ Yes (SFT)                |
| Self-Reflection & Questioning   | ✅ Strong             | ✅ Equally Strong           |
| GRPO-Free Optimization          | ❌ No                 | ✅ Achieved via SFT         |
| Step-by-Step Problem Solving    | ✅ Yes                | ✅ Yes                      |
| Use of `<think>` and `<answer>` | ✅ Explicit           | ✅ Implicit (Internal Use)  |

**Key Takeaway:** This experiment confirms that **reinforcement learning is not the only pathway to advanced reasoning capabilities**—with the right dataset and SFT strategies, models can **self-reflect and logically deduce answers** in a structured manner.  

---

## **How to Use**  

### **Running with Transformers**  

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model & tokenizer
model_name = "saishshinde15/Clyrai_Vortex_Reasoning"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Prepare input prompt
messages = [
    {"role": "system", "content": "You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."},
    {"role": "user", "content": "If x + 3 = 10, what is x?"}
]

# Apply chat template and tokenize
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a response and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

print(response)
```
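
The example above uses the model's default generation settings. If you want more varied phrasing in the reasoning steps, sampling parameters can be passed explicitly; the values below are illustrative rather than tuned for this model.

```python
# Optional: sample instead of using the default decoding strategy
# (temperature/top_p values are illustrative, not tuned for this model).
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```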