---
datasets:
- GeneralReasoning/GeneralThought-430K
base_model:
- prithivMLmods/Qwen3-4B-ft-bf16
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- text-generation-inference
- code
- math
---

# Cetus-Qwen3_4B-GeneralThought

> Cetus-Qwen3_4B-GeneralThought is a fine-tune of Qwen3-4B (starting from the prithivMLmods/Qwen3-4B-ft-bf16 checkpoint), trained on the GeneralThought-430K dataset to strengthen broad-spectrum reasoning, logical coherence, and structured problem solving across domains. It targets general-purpose tasks, including instruction following, technical question answering, and reasoning-based generation across diverse fields of knowledge.

## Key Features

1. Broad Reasoning with GeneralThought-430K: Trained on a curated set of roughly 430,000 samples spanning:
   * Mathematical and logical reasoning
   * Scientific and factual QA
   * Multi-step instructions and problem decomposition
   * Abstract and applied reasoning tasks
2. Multi-Domain Task Versatility: Handles tasks across STEM, the humanities, code reasoning, and general knowledge with consistent, well-structured responses.
3. Structured Output Control: Produces well-formatted answers in Markdown, LaTeX, JSON, and tables, suitable for documentation, education, and technical reporting.
4. Instruction Following with Multi-Step Fidelity: Follows detailed prompts that involve layered reasoning or procedural guidance, keeping each step consistent with the previous one.
5. Multilingual and Cross-Cultural Understanding: Supports more than 20 languages for comprehension tasks and technical translation in education and public-sector applications.
6. Efficient Qwen3-4B Base: With about 4 billion parameters, the Qwen3-4B base balances capability against compute cost, making the model practical to deploy on consumer-grade GPUs and in scalable services.
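
To put the consumer-GPU point in rough numbers, the sketch below estimates the memory needed just to hold the weights. It assumes about 4.0 billion parameters (the nominal Qwen3-4B size) and ignores the KV cache, activations, and framework overhead, so the results are a lower bound rather than a measured footprint.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3


# Assumption: ~4.0e9 parameters; the exact count of this checkpoint may differ slightly.
NUM_PARAMS = 4.0e9

# bf16 matches the dtype in the base checkpoint's name; fp32 is shown for comparison.
for dtype_name, nbytes in [("bf16", 2), ("fp32", 4)]:
    print(f"{dtype_name}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

This works out to roughly 7.5 GiB in bf16 versus about 14.9 GiB in fp32, which is why loading the checkpoint in its stored half-precision dtype matters on smaller GPUs.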

## Quickstart with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Cetus-Qwen3_4B-GeneralThought"

# Load the weights in their stored dtype and spread them across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics in simple terms."
messages = [
    {"role": "system", "content": "You are a general-purpose reasoning assistant trained on GeneralThought-430K."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and open an assistant turn
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# generate() returns prompt + completion; keep only the newly generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
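
The list comprehension after `generate()` is needed because, for decoder-only models, each returned sequence begins with the prompt tokens. The plain-Python sketch below uses made-up token IDs (not real Qwen3 vocabulary entries) to show that slicing in isolation. With a single prompt, as in the quickstart, there is no padding; for batched prompts, assuming left padding, every row shares the same padded prompt length, so the same slice still applies.

```python
# Hypothetical token IDs standing in for model_inputs.input_ids and the output of generate().
prompt_ids = [[11, 12, 13], [21, 22, 23]]
output_ids = [[11, 12, 13, 7, 8, 9], [21, 22, 23, 4, 5, 6]]

# Drop the echoed prompt from each row, keeping only the newly generated tokens.
new_tokens = [out[len(inp):] for inp, out in zip(prompt_ids, output_ids)]
print(new_tokens)  # [[7, 8, 9], [4, 5, 6]]
```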

## Intended Use

* General reasoning and educational Q&A
* Technical concept explanation and summarization
* Structured content generation in Markdown, LaTeX, and JSON
* Code and logic support in instruction-rich workflows
* Multilingual academic and public-knowledge tools
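
For structured JSON generation, replies may arrive either as bare JSON or wrapped in a Markdown code fence. The helper below is a minimal sketch under that assumption; `extract_json` is a hypothetical name, not part of Transformers or of this model's tooling. It parses the first fenced block if one exists, otherwise tries the whole reply, and lets `json.JSONDecodeError` propagate so the caller can retry or re-prompt.

```python
import json
import re

TICKS = "`" * 3  # built this way so the pattern never contains a literal code fence

# Matches a fenced block with an optional "json" language tag and captures its body.
_FENCED_JSON = re.compile(TICKS + r"(?:json)?\s*\n(.*?)\n\s*" + TICKS, re.DOTALL)


def extract_json(response: str):
    """Parse JSON from a model reply, preferring the first fenced block if present."""
    match = _FENCED_JSON.search(response)
    payload = match.group(1) if match else response.strip()
    return json.loads(payload)  # raises json.JSONDecodeError on invalid JSON


reply = "Here is the result:\n" + TICKS + "json\n{\"topic\": \"entropy\", \"steps\": 3}\n" + TICKS
print(extract_json(reply))           # {'topic': 'entropy', 'steps': 3}
print(extract_json('{"ok": true}'))  # {'ok': True}
```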

## Limitations

* Not optimized for purely creative or fictional writing
* Shorter context window than frontier models
* May be sensitive to ambiguous or poorly structured prompts
* Can occasionally hallucinate in niche or adversarial scenarios
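
Given the limited context window, long multi-turn sessions may need trimming before each call. The sketch below drops the oldest turns until the conversation fits a token budget. `trim_history`, `count_tokens`, and the budget value are hypothetical; in practice `count_tokens` could count the token IDs produced by the tokenizer's chat template, and the budget should come from the model's configuration minus room for `max_new_tokens`. The word counter used here is only a crude stand-in.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    max_tokens: int,
) -> List[Message]:
    """Drop the oldest non-system turns until the conversation fits in max_tokens.

    System messages are kept and placed first; the most recent turn is always kept,
    so the result can still exceed max_tokens if that turn alone is too long.
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while len(turns) > 1 and count_tokens(system + turns) > max_tokens:
        turns.pop(0)  # discard the oldest user/assistant message first
    return system + turns


def word_count(msgs: List[Message]) -> int:
    """Crude stand-in for a tokenizer-based count: whitespace-separated words."""
    return sum(len(m["content"].split()) for m in msgs)


history = [
    {"role": "system", "content": "You are a reasoning assistant."},
    {"role": "user", "content": "Define entropy."},
    {"role": "assistant", "content": "Entropy measures the number of microstates."},
    {"role": "user", "content": "Give an example."},
]
trimmed = trim_history(history, word_count, max_tokens=10)
print([m["content"] for m in trimmed])  # ['You are a reasoning assistant.', 'Give an example.']
```

Dropping single messages can leave a history that starts with an assistant turn; removing user/assistant pairs together is a reasonable alternative if the chat template expects strict alternation.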

## References

1. Qwen2.5 Technical Report: [https://arxiv.org/pdf/2412.15115](https://arxiv.org/pdf/2412.15115)
2. YaRN: Efficient Context Window Extension of Large Language Models: [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
3. GeneralThought-430K Dataset: [https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K)