	---
	license: apache-2.0
	datasets:
	- GeneralReasoning/GeneralThought-430K
	base_model:
	- prithivMLmods/Qwen3-4B-ft-bf16
	language:
	- en
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- moe
	- text-generation-inference
	- code
	- math
	---

	![2.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/KsOstOMOTnO7oWVdPycA3.png)

	# Cetus-Qwen3\_4B-GeneralThought

	> Cetus-Qwen3\_4B-GeneralThought is a fine-tuned variant of the Qwen3-4B architecture, trained on the GeneralThought-430K dataset to enhance broad-spectrum reasoning, logical coherence, and structured multi-domain problem solving. This model is optimized for general-purpose tasks including instruction following, technical question answering, and reasoning-based generation across diverse knowledge fields.


	> [!note]
	> GGUF (Q4_K_M): https://huggingface.co/prithivMLmods/Cetus-Qwen3_4B-GeneralThought-Q4_K_M-GGUF

	## Key Features

	1. Broad Reasoning with GeneralThought-430K
	Trained on GeneralThought-430K, a curated dataset of roughly 430,000 samples spanning:

	* Mathematical and logical reasoning
	* Scientific and factual QA
	* Multistep instructions and problem decomposition
	* Abstract and applied reasoning tasks

	2. Multi-Domain Task Versatility
	Handles tasks across STEM, humanities, code reasoning, and general knowledge workflows with consistent, structured answers.

	3. Structured Output Control
	Outputs well-formatted answers in Markdown, LaTeX, JSON, and tabular formats, suitable for documentation, education, and technical reporting.

	4. Instruction-Following with Multi-Step Fidelity
	Capable of following detailed prompts involving layered reasoning or procedural guidance with high step-to-step coherence.

	5. Multilingual and Cross-Cultural Understanding
	Supports over 20 languages for global comprehension tasks and technical translation in education and public sector applications.

	6. Efficient Qwen3-4B Base
	Balances capability against compute cost at roughly 4B parameters, which makes it practical to deploy on consumer-grade GPUs and in scalable services.
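
As a rough illustration of the consumer-GPU point, the sketch below estimates the memory needed for the weights alone (no KV cache, activations, or framework overhead). The nominal parameter count of 4e9 and the average of about 4.8 bits per weight for Q4_K_M are assumptions chosen for illustration, not measurements of this checkpoint.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Assumed values, for illustration only.
NUM_PARAMS = 4e9      # nominal "4B" parameter count (assumption)
BF16_BITS = 16        # bfloat16 stores 16 bits per weight
Q4_K_M_BITS = 4.8     # rough average for Q4_K_M quantization (assumption)

if __name__ == "__main__":
    print(f"bf16:   {weight_memory_gib(NUM_PARAMS, BF16_BITS):.1f} GiB")
    print(f"Q4_K_M: {weight_memory_gib(NUM_PARAMS, Q4_K_M_BITS):.1f} GiB")
```

Actual usage will be higher once the KV cache and activations are included, so treat these figures as a lower bound.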

	## Quickstart with Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "prithivMLmods/Cetus-Qwen3_4B-GeneralThought"

	# Load the weights in their native dtype and place them on the available device(s).
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	prompt = "Explain the concept of entropy in thermodynamics in simple terms."

	messages = [
	    {"role": "system", "content": "You are a general-purpose reasoning assistant trained on GeneralThought-430K."},
	    {"role": "user", "content": prompt},
	]

	# Render the conversation with the model's chat template, ending at the assistant turn.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)

	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)
	# Drop the prompt tokens so that only the newly generated tokens are decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
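
Base Qwen3 models can emit a reasoning trace wrapped in `<think>...</think>` before the final answer. Assuming this fine-tune keeps that behavior (this card does not confirm it), the decoded `response` from the quickstart may contain both parts. The helper below, `split_thinking`, is a hypothetical name introduced here as a minimal sketch for separating them on the decoded text.

```python
from typing import Tuple


def split_thinking(response: str, close_tag: str = "</think>") -> Tuple[str, str]:
    """Split decoded text into (reasoning, answer).

    Assumes the reasoning, if present, ends at the last `close_tag`.
    If the tag is absent, the whole response is treated as the answer.
    """
    head, sep, tail = response.rpartition(close_tag)
    if not sep:
        return "", response.strip()
    # Remove the opening tag if the model emitted one.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "<think>\nEntropy relates to disorder.\n</think>\n\nEntropy measures how spread out energy is."
    reasoning, answer = split_thinking(sample)
    print(answer)
```

Splitting on the decoded string keeps the sketch independent of tokenizer internals. If the trace should not be generated at all, check the chat template shipped with the model for a switch that disables it.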

	## Intended Use

	* General reasoning and educational Q\&A
	* Technical concept explanation and summarization
	* Structured content generation in Markdown, LaTeX, and JSON
	* Code and logic support in instruction-rich workflows
	* Multi-language academic and public knowledge tools
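
When JSON output is requested, the model may still wrap it in a Markdown fence or surround it with prose. The sketch below shows one way to recover the JSON value from a response string. `extract_json` is a hypothetical helper written for this example, not part of `transformers`, and it assumes the JSON appears either in a fenced block or as the entire response.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable
_FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(response: str) -> Optional[Any]:
    """Return the first parseable JSON value in a model response, or None."""
    # Try fenced blocks first, then fall back to the whole response.
    candidates = _FENCED_BLOCK.findall(response) + [response]
    for candidate in candidates:
        try:
            return json.loads(candidate.strip())
        except json.JSONDecodeError:
            continue
    return None


if __name__ == "__main__":
    reply = "Here is the result:\n" + FENCE + 'json\n{"topic": "entropy", "level": "intro"}\n' + FENCE
    print(extract_json(reply))
```

Validating the parsed value against the expected keys before using it is advisable, since the model can produce well-formed JSON with the wrong shape.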

	## Limitations

	* Not optimized for purely creative or fictional content
	* Smaller context window compared to frontier models
	* May be sensitive to ambiguous or poorly structured prompts
	* Can occasionally hallucinate in niche or adversarial scenarios
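
One way to work within a limited context window is to drop the oldest turns of a conversation before applying the chat template. The sketch below is a minimal, hypothetical helper (`trim_history` is not a library function). It takes any token-counting callable, for example `lambda s: len(tokenizer(s).input_ids)` with the tokenizer from the quickstart, and it ignores the few extra tokens the chat template adds per turn.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[Message]:
    """Keep system messages plus the most recent turns that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + kept[::-1]  # restore chronological order


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "be brief"},
        {"role": "user", "content": "one two three"},
        {"role": "assistant", "content": "four five"},
        {"role": "user", "content": "six"},
    ]
    # Whitespace word count stands in for a real tokenizer here.
    print(trim_history(history, max_tokens=5, count_tokens=lambda s: len(s.split())))
```

Leave headroom in `max_tokens` for the generated tokens, since `max_new_tokens` also counts against the context window.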

	## References

	1. Qwen2.5 Technical Report – [https://arxiv.org/pdf/2412.15115](https://arxiv.org/pdf/2412.15115)
	2. YaRN: Context Window Extension – [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
	3. GeneralThought-430K Dataset – [https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K)