loaiabdalslam committed
Commit c8159f2 · verified · 1 Parent(s): fae2932

Update README.md

Files changed (1)
  1. README.md +240 -9
README.md CHANGED
@@ -1,22 +1,253 @@
  ---
- base_model: unsloth/qwen3-14b-unsloth-bnb-4bit
  tags:
- - text-generation-inference
- - transformers
  - unsloth
  - qwen3
- - trl
- license: apache-2.0
- language:
- - en
  ---
  # Uploaded model

- - **Developed by:** beetlware
  - **License:** apache-2.0
  - **Finetuned from model:** unsloth/qwen3-14b-unsloth-bnb-4bit

  This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
+ language: ar
+ license: apache-2.0
  tags:
  - unsloth
  - qwen3
+ - qwen2
+ - 14b
+ - arabic
+ - logical-reasoning
+ - conversational
+ - instruction-following
+ - text-generation
+ - merged_16bit
+ base_model: unsloth/Qwen3-14B
+ datasets:
+ - beetlware/arabic-reasoning-dataset-logic
  ---

+ # Bee1reason-arabic-Qwen-14B: A Qwen3 14B Model Fine-tuned for Arabic Logical Reasoning
+
+ ## Model Overview
+
+ **Bee1reason-arabic-Qwen-14B** is a large language model (LLM) fine-tuned from the `unsloth/Qwen3-14B` base model. It has been specifically tailored to strengthen logical and deductive reasoning in Arabic while preserving its general conversational abilities. Fine-tuning used LoRA (Low-Rank Adaptation) with the [Unsloth](https://github.com/unslothai/unsloth) library for high training efficiency, and the LoRA weights were then merged into the base model to produce this standalone 16-bit (float16) precision model.
+
+ **Key Features:**
+ * **Built on `unsloth/Qwen3-14B`:** Leverages the power and performance of the Qwen3 14-billion-parameter base model.
+ * **Fine-tuned for Arabic logical reasoning:** Trained on a dataset of Arabic logical reasoning tasks.
+ * **Conversational format:** The model expects user and assistant roles. It was trained on data that may include "thinking steps" (often within `<think>...</think>` tags) before the final answer, which helps on tasks requiring explanation or complex inference.
+ * **Unsloth efficiency:** Fine-tuning used the Unsloth library, enabling faster training and lower GPU memory consumption.
+ * **Merged 16-bit model:** The final weights are a full float16-precision model, ready for direct use without applying LoRA adapters to a separate base model.
+
+ ## Training Data
+
+ The model was fine-tuned primarily on a custom Arabic logical reasoning dataset, `beetlware/arabic-reasoning-dataset-logic`, available on the Hugging Face Hub. The dataset covers various types of reasoning (deduction, induction, abduction); each task comprises the question text, a proposed answer, and a detailed solution with thinking steps.
+
+ This data was converted into a conversational format for training (a conversion sketch follows this list), typically with:
+ 1. **User role:** the problem/question text.
+ 2. **Assistant role:** the detailed solution, including thinking steps (often within `<think>...</think>` tags) followed by the final answer.
+
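+ A minimal sketch of that conversion using the standard `datasets` API; the column names (`question`, `solution`) are illustrative assumptions, so check the actual schema on the Hub:
+
+ ```python
+ from datasets import load_dataset
+
+ raw = load_dataset("beetlware/arabic-reasoning-dataset-logic", split="train")
+
+ def to_conversation(example):
+     # Field names here are hypothetical; adjust to the real dataset schema.
+     return {
+         "conversations": [
+             {"role": "user", "content": example["question"]},
+             # The solution text already contains the <think>...</think>
+             # steps followed by the final answer.
+             {"role": "assistant", "content": example["solution"]},
+         ]
+     }
+
+ train_data = raw.map(to_conversation, remove_columns=raw.column_names)
+ ```
+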
+ ## Fine-tuning Details
+
+ * **Base Model:** `unsloth/Qwen3-14B`
+ * **Fine-tuning Technique:** LoRA (Low-Rank Adaptation)
+     * `r` (rank): 32
+     * `lora_alpha`: 32
+     * `target_modules`: `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]`
+     * `lora_dropout`: 0
+     * `bias`: "none"
+ * **Libraries Used:** Unsloth (for efficient model loading and PEFT application) and Hugging Face TRL (`SFTTrainer`)
+ * **Max Sequence Length (`max_seq_length`):** 2048 tokens
+ * **Training Parameters (example from the notebook):**
+     * `per_device_train_batch_size`: 2
+     * `gradient_accumulation_steps`: 4 (an effective batch size of 8)
+     * `warmup_steps`: 5
+     * `max_steps`: 30 (in the notebook; adjust for a full run)
+     * `learning_rate`: 2e-4 (recommended to reduce to 2e-5 for longer training runs)
+     * `optim`: "adamw_8bit"
+ * **Final Save:** LoRA weights were merged with the base model and saved in `merged_16bit` (float16) precision.
+
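+ For reference, the snippet below is a minimal reconstruction of this setup, assuming the standard Unsloth and TRL APIs as used in Unsloth's Qwen3 notebooks and the `train_data` conversations from the sketch above; it is not the original training notebook.
+
+ ```python
+ from unsloth import FastLanguageModel
+ from trl import SFTTrainer, SFTConfig
+
+ # Load the base model (4-bit loading keeps memory low during training).
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="unsloth/Qwen3-14B",
+     max_seq_length=2048,
+     load_in_4bit=True,
+ )
+
+ # Attach LoRA adapters with the hyperparameters listed above.
+ model = FastLanguageModel.get_peft_model(
+     model,
+     r=32,
+     lora_alpha=32,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+     lora_dropout=0,
+     bias="none",
+ )
+
+ # Render the conversations to plain text with the Qwen3 chat template.
+ train_data = train_data.map(lambda ex: {
+     "text": tokenizer.apply_chat_template(ex["conversations"], tokenize=False)
+ })
+
+ trainer = SFTTrainer(
+     model=model,
+     tokenizer=tokenizer,
+     train_dataset=train_data,
+     args=SFTConfig(
+         dataset_text_field="text",
+         per_device_train_batch_size=2,
+         gradient_accumulation_steps=4,
+         warmup_steps=5,
+         max_steps=30,
+         learning_rate=2e-4,
+         optim="adamw_8bit",
+         output_dir="outputs",
+     ),
+ )
+ trainer.train()
+
+ # Merge the LoRA weights into the base model and save in float16.
+ model.save_pretrained_merged("Bee1reason-arabic-Qwen-14B", tokenizer,
+                              save_method="merged_16bit")
+ ```
+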
+ ## How to Use (with Transformers)
+
+ Since this is a merged 16-bit model, you can load and use it directly with the `transformers` library:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+ import torch
+
+ model_id = "beetlware/Bee1reason-arabic-Qwen-14B"
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Load the model
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # or torch.float16 if bfloat16 is not supported
+     device_map="auto",           # distributes the model across available devices (GPU/CPU)
+ )
+
+ # Put the model in evaluation mode for inference
+ model.eval()
+ ```
+
+ ### Inference with Thinking Steps
+
+ Qwen3 models use special `<think>...</think>` tags for intermediate reasoning. Whether this merged model emits them depends on its training data; you can encourage thinking by asking for step-by-step reasoning in the prompt, and Unsloth-trained Qwen3 models often also honor the `enable_thinking` argument of `tokenizer.apply_chat_template`.
+
+ ```python
+ user_prompt_with_thinking_request = "استخدم التفكير المنطقي خطوة بخطوة: إذا كان لدي 4 تفاحات والشجرة فيها 20 تفاحة، فكم تفاحة لدي إجمالاً؟"  # "Use step-by-step logical thinking: If I have 4 apples and the tree has 20 apples, how many apples do I have in total?"
+
+ messages_with_thinking = [
+     {"role": "user", "content": user_prompt_with_thinking_request}
+ ]
+
+ # Qwen3 ships its own chat template; tokenizer.apply_chat_template is the
+ # correct way to format the conversation.
+ chat_prompt_with_thinking = tokenizer.apply_chat_template(
+     messages_with_thinking,
+     tokenize=False,
+     add_generation_prompt=True  # important: appends the assistant generation prompt
+ )
+
+ inputs_with_thinking = tokenizer(chat_prompt_with_thinking, return_tensors="pt").to(model.device)
+
+ print("\n--- Inference with Thinking Request (Example) ---")
+ streamer_think = TextStreamer(tokenizer, skip_prompt=True)
+ with torch.no_grad():  # disable gradient tracking during inference
+     outputs_think = model.generate(
+         **inputs_with_thinking,
+         max_new_tokens=512,
+         temperature=0.6,  # settings recommended by the Qwen team for reasoning
+         top_p=0.95,
+         top_k=20,
+         pad_token_id=tokenizer.eos_token_id,
+         streamer=streamer_think,
+     )
+ ```
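+
+ If you capture the generated text instead of (or in addition to) streaming it, you can separate the reasoning trace from the final answer. A small sketch, assuming the `<think>...</think>` tags appear verbatim in the decoded output:
+
+ ```python
+ import re
+
+ # Decode only the newly generated tokens (everything after the prompt).
+ new_tokens = outputs_think[0][inputs_with_thinking["input_ids"].shape[1]:]
+ generated = tokenizer.decode(new_tokens)  # keep the <think> tags in the text
+
+ # Split the reasoning trace from the final answer, if the tags are present.
+ match = re.search(r"<think>(.*?)</think>", generated, flags=re.DOTALL)
+ thinking = match.group(1).strip() if match else None
+ answer = re.sub(r"<think>.*?</think>", "", generated, flags=re.DOTALL).strip()
+
+ print("Thinking steps:", thinking)
+ print("Final answer:", answer)
+ ```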
+
+ ### Normal Inference (Conversation without an Explicit Thinking Request)
+
+ ```python
+ user_prompt_normal = "ما هي عاصمة مصر؟"  # "What is the capital of Egypt?"
+ messages_normal = [
+     {"role": "user", "content": user_prompt_normal}
+ ]
+
+ chat_prompt_normal = tokenizer.apply_chat_template(
+     messages_normal,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ inputs_normal = tokenizer(chat_prompt_normal, return_tensors="pt").to(model.device)
+
+ print("\n\n--- Normal Inference (Example) ---")
+ streamer_normal = TextStreamer(tokenizer, skip_prompt=True)
+ with torch.no_grad():
+     outputs_normal = model.generate(
+         **inputs_normal,
+         max_new_tokens=100,
+         temperature=0.7,  # settings recommended for normal chat
+         top_p=0.8,
+         top_k=20,
+         pad_token_id=tokenizer.eos_token_id,
+         streamer=streamer_normal,
+     )
+ ```
+
+ ## Usage with vLLM (for High-Throughput Scaled Inference)
+
+ vLLM is a library for fast LLM inference. Because the model is saved as a merged 16-bit checkpoint, it can be served with vLLM directly.
+
+ 1. Install vLLM:
+
+ ```bash
+ pip install vllm
+ ```
+
+ (vLLM installation has specific CUDA and PyTorch version requirements; refer to the vLLM documentation for the latest prerequisites.)
+
+ 2. Run the vLLM OpenAI-compatible server:
+ You can serve the model through vLLM's OpenAI-compatible API server, which makes it easy to integrate into existing applications.
+
+ ```bash
+ python -m vllm.entrypoints.openai.api_server \
+     --model beetlware/Bee1reason-arabic-Qwen-14B \
+     --tokenizer beetlware/Bee1reason-arabic-Qwen-14B \
+     --dtype bfloat16 \
+     --max-model-len 2048
+     # Optional: --tensor-parallel-size N    (if you have multiple GPUs)
+     # Optional: --gpu-memory-utilization 0.9  (to adjust GPU memory usage)
+ ```
+
+ - Replace `--dtype bfloat16` with `float16` if needed.
+ - `--max-model-len` should match the `max_seq_length` used in training (2048 here).
+
+ 3. Send requests to the vLLM server:
+ Once the server is running (typically on http://localhost:8000), you can send requests with any OpenAI-compatible client, such as the `openai` library:
+
+ ```python
+ import openai
+
+ client = openai.OpenAI(
+     base_url="http://localhost:8000/v1",  # vLLM server address
+     api_key="dummy_key"                   # vLLM does not require a real API key by default
+ )
+
+ completion = client.chat.completions.create(
+     model="beetlware/Bee1reason-arabic-Qwen-14B",  # model name as registered in vLLM
+     messages=[
+         # "Explain the theory of general relativity in simple terms."
+         {"role": "user", "content": "اشرح نظرية النسبية العامة بكلمات بسيطة."}
+     ],
+     max_tokens=256,
+     temperature=0.7,
+     stream=True  # enable streaming
+ )
+
+ print("Streaming response from vLLM:")
+ full_response = ""
+ for chunk in completion:
+     if chunk.choices[0].delta.content is not None:
+         token = chunk.choices[0].delta.content
+         print(token, end="", flush=True)
+         full_response += token
+ print("\n--- End of stream ---")
+ ```
+
+ ## Limitations and Potential Biases
+
+ - The model's performance depends heavily on the quality and diversity of its training data, and it may exhibit biases present in that data.
+ - Despite fine-tuning for logical reasoning, the model can still make errors on very complex or unfamiliar reasoning tasks.
+ - The model may "hallucinate" or produce incorrect information, especially on topics not well covered in its training data.
+ - Capabilities in languages other than Arabic may be limited, since fine-tuning focused on Arabic.
+
+ ## Additional Information
+
+ - **Developed by:** loai abdalslam (beetleware)
+ - **Upload/Release date:** 21-5-2025
+ - **Contact / issue reporting:** [email protected]
+
+ ## Beetleware
+
+ We are a software house and digital transformation service provider, founded six years ago and headquartered in Saudi Arabia.
+
+ All rights reserved © 2025
+
+ **Our Offices**
+
+ - KSA office: (+966) 54 597 3282
+ - Egypt office: (+2) 010 67 256 306
+ - Oman office: (+968) 9522 8632
+
  # Uploaded model

+ - **Developed by:** beetlware AI Team
  - **License:** apache-2.0
  - **Finetuned from model:** unsloth/qwen3-14b-unsloth-bnb-4bit

  This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)