---
base_model:
- saishshinde15/Clyrai_Base_Reasoning
tags:
- text-generation-inference
- transformers
- qwen2
- trl
- reasoning
- deepseekR1
- advanced-finetuning
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# Clyrai Vortex Reasoning
- **Developed by:** clyrai
- **License:** apache-2.0
- **Fine-tuned from:** [saishshinde15/Clyrai_Base_Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning)
- **Category:** Experimental, Research
## **Introduction**
Clyrai Vortex Reasoning is an **experimental model** that advances the structured reasoning capabilities pioneered by [Clyrai_Base_Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning). While the Base Reasoning model used **Group Relative Policy Optimization (GRPO)** to enhance step-by-step logical thought processes, similar to **DeepSeek-R1**, this model takes a different approach: **it eliminates GRPO and instead relies on high-quality Supervised Fine-Tuning (SFT) techniques**.
The core objective was to investigate whether **deep reasoning and self-questioning behavior could emerge purely through SFT on high-quality datasets**. The results were highly promising: the model successfully **questions itself internally**, improves reasoning depth, and consistently generates structured, logical responses.
---
## **Key Features**
### **1️⃣ Advanced Reasoning Without GRPO**
This model **does not rely on GRPO** yet **achieves similar self-reflective thought processes**, proving that structured reasoning can be induced through **high-quality SFT alone**.
### **2️⃣ Self-Questioning and Iterative Thinking**
The model **actively asks itself intermediate questions before answering**, mimicking the deep **reflection-based thought process** of models like DeepSeek-R1. This leads to **more reliable** and **well-structured** responses.
### **3️⃣ High-Quality SFT on a Curated Dataset**
To compensate for the lack of reinforcement learning, we used an **extensive dataset** tailored for deep reasoning. This dataset includes:
- **Mathematical proofs & logical puzzles**
- **Complex multi-step problem-solving tasks**
- **Philosophical and ethical reasoning**
- **Scientific hypothesis evaluation**
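For illustration, a single SFT record in this style might look like the sketch below. This is a hypothetical example, not a sample from the actual training set, and the chat-message schema is an assumption:

```python
# Hypothetical SFT record (illustrative only; the real dataset is not
# published). The assistant turn wraps its reasoning in <think> tags
# and the final result in <answer> tags, matching the markers
# described in the next section.
record = {
    "messages": [
        {
            "role": "user",
            "content": "A train covers 120 km in 1.5 hours. What is its average speed?",
        },
        {
            "role": "assistant",
            "content": (
                "<think>Average speed = distance / time = 120 / 1.5 = 80 km/h. "
                "Check: 80 km/h x 1.5 h = 120 km. Consistent.</think>"
                "<answer>80 km/h</answer>"
            ),
        },
    ]
}
```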
### **4️⃣ Implicit Use of `<think>` and `<answer>` Tokens**
The model internally uses **special reasoning markers** (`<think>` and `<answer>`) to structure its responses, though these may not always be visible in the final output. This ensures a **consistent and methodical approach** to answering questions.
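If you want to separate the reasoning trace from the final answer in post-processing, a minimal sketch like the following can help. It assumes the markers appear verbatim as `<think>…</think>` and `<answer>…</answer>` pairs, which, as noted above, is not guaranteed for every generation:

```python
import re

def split_reasoning(text: str):
    """Split a generation into (reasoning, answer) parts.

    Assumes the <think>/<answer> markers described above; both are
    treated as optional, since the model may omit them in the
    visible output.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else None
    final = answer.group(1).strip() if answer else text.strip()
    return reasoning, final
```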
### **5️⃣ Part of the Clyrai Vortex Family**
This model belongs to the **Clyrai Vortex series**, a collection of fine-tuned models pushing the boundaries of **SFT-based reasoning without reinforcement learning**.
---
## **Breakthrough Insights**
| Feature | Base Reasoning (GRPO) | Vortex Reasoning (SFT-Only) |
|----------------------------------|------------------------|----------------------------|
| Structured Thought Process | ✅ Yes (GRPO) | ✅ Yes (SFT) |
| Self-Reflection & Questioning | ✅ Strong | ✅ Equally Strong |
| GRPO-Free Optimization | ❌ No | ✅ Achieved via SFT |
| Step-by-Step Problem Solving | ✅ Yes | ✅ Yes |
| Use of `<think>` and `<answer>` | ✅ Explicit | ✅ Implicit (Internal Use) |
**Key Takeaway:** This experiment confirms that **reinforcement learning is not the only pathway to advanced reasoning capabilities**—with the right dataset and SFT strategies, models can **self-reflect and logically deduce answers** in a structured manner.
---
## **How to Use**
### **Running with Transformers**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model & tokenizer (fall back to CPU if no GPU is available)
model_name = "saishshinde15/Clyrai_Vortex_Reasoning"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Prepare the input prompt
messages = [
    {"role": "system", "content": "You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."},
    {"role": "user", "content": "If x + 3 = 10, what is x?"},
]

# Apply the chat template and tokenize
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate and decode the response
outputs = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
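Greedy decoding as above is the simplest option; for longer reasoning chains, sampling often produces more natural traces. The values below are illustrative starting points, not tuned defaults published with this model:

```python
# Sampling-based generation (illustrative hyperparameters, not
# official recommendations for this model)
outputs = model.generate(
    input_ids,
    max_new_tokens=1024,  # reasoning traces can run long
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```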