Commit 8bfa657 by prithivMLmods (verified) · 1 Parent(s): 5cf85fe

Update README.md

Files changed (1): README.md (+101, -2)

README.md CHANGED
---
license: apache-2.0
datasets:
- GeneralReasoning/GeneralThought-430K
base_model:
- prithivMLmods/Qwen3-4B-ft-bf16
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- text-generation-inference
- code
- math
---
# Cetus-Qwen3\_4B-GeneralThought

> Cetus-Qwen3\_4B-GeneralThought is a fine-tuned variant of the Qwen3-4B architecture, trained on the GeneralThought-430K dataset to enhance broad-spectrum reasoning, logical coherence, and structured multi-domain problem solving. This model is optimized for general-purpose tasks, including instruction following, technical question answering, and reasoning-based generation across diverse knowledge fields.
## Key Features

1. Broad Reasoning with GeneralThought-430K
   Trained on a carefully curated 430,000-sample dataset, GeneralThought-430K, spanning:

   * Mathematical and logical reasoning
   * Scientific and factual QA
   * Multistep instructions and problem decomposition
   * Abstract and applied reasoning tasks

2. Multi-Domain Task Versatility
   Handles use cases across STEM, the humanities, code reasoning, and general-knowledge workflows with consistency and structure.

3. Structured Output Control
   Outputs well-formatted answers in Markdown, LaTeX, JSON, and tabular formats, suitable for documentation, education, and technical reporting (see the JSON-output sketch after this list).

4. Instruction-Following with Multi-Step Fidelity
   Follows detailed prompts involving layered reasoning or procedural guidance with high step-to-step coherence.

5. Multilingual and Cross-Cultural Understanding
   Supports over 20 languages for global comprehension tasks and technical translation in education and public-sector applications.

6. Efficient Qwen3-4B Base
   Offers a practical balance between capability and computational efficiency, suitable for deployment on consumer-grade GPUs and in scalable services.
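To illustrate the structured-output behavior described in item 3, here is a minimal sketch that asks the model to answer strictly in JSON. The system prompt wording, the example question, and the generation settings are illustrative assumptions, not values documented for this model:

```python
from transformers import pipeline

# Load the model through the high-level chat pipeline
generator = pipeline(
    "text-generation",
    model="prithivMLmods/Cetus-Qwen3_4B-GeneralThought",
    torch_dtype="auto",
    device_map="auto",
)

# Hypothetical system prompt constraining the output format to JSON
messages = [
    {"role": "system", "content": "Answer strictly as a JSON object with keys 'answer' and 'reasoning'."},
    {"role": "user", "content": "What is the boiling point of water at sea level, in Celsius?"},
]

# Chat-style text-generation pipelines accept the message list directly and
# return the conversation with the assistant's reply appended at the end
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])
```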
## Quickstart with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Cetus-Qwen3_4B-GeneralThought"

# Load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics in simple terms."

messages = [
    {"role": "system", "content": "You are a general-purpose reasoning assistant trained on GeneralThought-430K."},
    {"role": "user", "content": prompt}
]

# Render the chat template into a single prompt string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
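For more control over decoding, the standard sampling arguments of `generate` can be supplied as a drop-in replacement for the `generate` call above; the values below are illustrative assumptions, not recommended settings from this model card:

```python
# Sampled decoding (illustrative values, not official recommendations);
# reuses `model` and `model_inputs` from the quickstart above
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
```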
## Intended Use

* General reasoning and educational Q\&A
* Technical concept explanation and summarization
* Structured content generation in Markdown, LaTeX, and JSON
* Code and logic support in instruction-rich workflows
* Multi-language academic and public knowledge tools
## Limitations

* Not optimized for purely creative or fictional content
* Smaller context window compared to frontier models
* May be sensitive to ambiguous or poorly structured prompts
* Can occasionally hallucinate in niche or adversarial scenarios
## References

1. Qwen2.5 Technical Report – [https://arxiv.org/pdf/2412.15115](https://arxiv.org/pdf/2412.15115)
2. YaRN: Efficient Context Window Extension of Large Language Models – [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
3. GeneralThought-430K Dataset – [https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K)