Mini-Hydra
A specialized reasoning-focused MoE model based on Qwen3-30B-A3B
Model Details
Model Description
Mini-Hydra is a Mixture-of-Experts (MoE) language model designed for efficient reasoning and faster conclusion generation. Built upon the Qwen3-30B-A3B architecture, this model aims to bridge the performance gap between sparse MoE models and their dense counterparts while maintaining computational efficiency.
- Developed by: Daemontatox
- Model type: Mixture-of-Experts (MoE) Language Model
- Architecture: Qwen3-30B-A3B based
- Activated Parameters: ~3 billion per forward pass
- Total Parameters: ~30 billion (only a subset is activated per token via MoE routing)
- Language(s): English (primary), with multilingual capabilities inherited from base model
- License: Apache 2.0
- Finetuned from model: Qwen3-30B-A3B
Model Sources
- Repository: https://huggingface.co/Daemontatox/Mini-Hydra
- Base Model: Qwen3-30B-A3B
- Training Datasets: Tesslate/Gradient-Reasoning, Daemontatox/curated_thoughts_convs, Daemontatox/natural_reasoning, Daemontatox/numina_math_cconvs (see Training Data below)
Uses
Direct Use
Mini-Hydra is designed for applications requiring:
- Efficient reasoning: Optimized for logical problem-solving with reduced computational overhead
- Mathematical reasoning: Enhanced performance on mathematical problems and proofs
- Conversational AI: Natural dialogue with reasoning capabilities
- Code generation: Programming assistance with logical reasoning steps
- Educational applications: Tutoring and explanation generation
Downstream Use
The model can be further fine-tuned for specific domains such as:
- Domain-specific reasoning (legal, medical, scientific)
- Specialized mathematical problem solving
- Custom conversational agents
- Educational content generation
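For domain adaptation of a model this size, parameter-efficient fine-tuning is usually the practical route. The sketch below applies a LoRA adapter with the peft library; the target modules, hyperparameters, and the assumption that LoRA composes cleanly with this MoE checkpoint are illustrative, not a documented recipe for Mini-Hydra.

```python
# Hypothetical LoRA fine-tuning setup (illustrative assumptions, not a tested recipe).
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
import torch

model_name = "Daemontatox/Mini-Hydra"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# LoRA keeps the base weights frozen and trains small adapter matrices,
# which is typically the only practical option for a ~30B model on limited hardware.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights should be trainable
```

Actual fine-tuning would then proceed with the usual Trainer or a custom training loop over a domain-specific dataset.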
Out-of-Scope Use
This model is not intended for:
- Production systems requiring 100% accuracy without human oversight
- Generating harmful, biased, or inappropriate content
- Real-time applications requiring sub-second response times
- Applications where model hallucination could cause harm
Bias, Risks, and Limitations
Known Limitations
- Training Constraints: Due to resource limitations, the model received less training than originally planned, which may impact performance in some scenarios.
- Reasoning Scope: While optimized for reasoning, the model may still struggle with very complex multi-step logical problems.
- Language Bias: Primary training on English may lead to reduced performance in other languages.
- Knowledge Cutoff: The model's knowledge is limited to the training data cutoff date.
Potential Risks
- Hallucination: Like all language models, Mini-Hydra may generate plausible-sounding but incorrect information
- Bias: May reflect biases present in training data
- Overconfidence: May present uncertain information with high confidence
Recommendations
- Always verify critical information from reliable sources
- Use appropriate safety measures and human oversight for important applications
- Consider the model's limitations when deploying in production environments
Training Details
Training Data
The model was trained on a carefully curated combination of reasoning-focused datasets:
- Tesslate/Gradient-Reasoning: Advanced reasoning problems with step-by-step solutions
- Daemontatox/curated_thoughts_convs: Curated conversational data emphasizing thoughtful responses
- Daemontatox/natural_reasoning: Natural language reasoning examples and explanations
- Daemontatox/numina_math_cconvs: Mathematical conversation and problem-solving data
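For reference, the listed datasets can be pulled from the Hugging Face Hub with the datasets library. The split name and column layout are assumptions here, so the sketch below only prints whatever the first example actually contains.

```python
# Inspect one of the listed training datasets (split and field names are assumptions).
from datasets import load_dataset

ds = load_dataset("Daemontatox/natural_reasoning", split="train")
print(ds)      # dataset size and column names
print(ds[0])   # first example, with whatever fields it carries
```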
Training Procedure
- Base Model: Qwen3-30B-A3B
- Training Objective: Optimized for efficient reasoning and faster conclusion generation
- Architecture: Mixture-of-Experts with 3B activated parameters
- Training Constraint: Limited by resource availability, resulting in abbreviated training cycle
Training Infrastructure
- Hardware: 2x A100 GPUs
- Training Time: ~72 hours
- Compute Resources: Resource-constrained environment
Evaluation
Testing Data, Factors & Metrics
The model's performance should be evaluated on:
- Reasoning Benchmarks: GSM8K, MATH, LogiQA
- General Language Tasks: MMLU, HellaSwag, ARC
- Efficiency Metrics: Inference speed, memory usage
- Reasoning Quality: Step-by-step problem solving accuracy
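The efficiency metrics above can be approximated with a simple throughput probe. The following sketch measures tokens per second and peak GPU memory for a single greedy generation; the numbers are hardware-dependent and only indicative.

```python
# Rough throughput / memory probe (single prompt, CUDA assumed).
import time
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Daemontatox/Mini-Hydra"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

prompt = "Explain why the sum of two even numbers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

torch.cuda.reset_peak_memory_stats()
start = time.time()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.time() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```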
Results
[Note: Specific benchmark results would be added here once available]
Pending formal evaluation, the model is expected to demonstrate:
- Improved reasoning efficiency compared to dense models of similar size
- Competitive performance despite resource-constrained training
- Faster inference times due to MoE architecture
Technical Specifications
Model Architecture
- Base: Qwen3-30B-A3B MoE architecture
- Experts: Multiple expert networks with routing mechanism
- Activated Parameters: 3 billion per forward pass
- Total Parameters: ~30 billion
- Context Length: Inherited from the base model (natively 32K tokens)
- Vocabulary Size: Inherited from the base model
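To make the activated-vs-total parameter distinction concrete, here is a minimal, generic top-k MoE layer in PyTorch. It illustrates the general routing mechanism only; it is not the actual Qwen3 implementation, and the dimensions are arbitrary.

```python
# Minimal generic top-k MoE layer (illustration only, not the Qwen3 implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        logits = self.router(x)                    # score every expert for each token
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out                                 # only top_k of num_experts experts run per token

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because each token only passes through its top-k experts, the parameters touched per forward pass are a small fraction of the total parameter count, which is the source of the "3B activated / ~30B total" split described above.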
Compute Infrastructure
- Training: Resource-constrained environment
- Inference: Optimized for efficiency with 3B activated parameters
- Memory Requirements: Weight memory still scales with the ~30 billion total parameters (all experts must be resident); the savings relative to a dense model of the same total size are in per-token compute rather than memory
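If weight memory is the binding constraint, 4-bit quantized loading is one common mitigation (it requires installing bitsandbytes). This is a generic sketch; its effect on Mini-Hydra's reasoning quality has not been measured here.

```python
# Optional 4-bit quantized loading (generic sketch; quality impact not verified for this model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "Daemontatox/Mini-Hydra",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Daemontatox/Mini-Hydra")
```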
How to Use
Installation
```bash
pip install transformers torch accelerate
```
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "Daemontatox/Mini-Hydra"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Example inference
def generate_response(prompt, max_new_tokens=512):
    # Tokenize and move inputs to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens (everything after the prompt)
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example usage
prompt = "Solve this step by step: If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is the average speed for the entire journey?"
response = generate_response(prompt)
print(response)
```
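Because the Qwen3 family ships a chat template, prompting through apply_chat_template is usually preferable to raw string prompts. The sketch below assumes Mini-Hydra inherits that template from the base model and reuses the model and tokenizer loaded above.

```python
# Chat-template prompting (assumes the base model's chat template is inherited).
messages = [
    {"role": "system", "content": "You are a careful step-by-step reasoner."},
    {"role": "user", "content": "If x + 5 = 12, what is x? Show your reasoning."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

# Print only the assistant's reply, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```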
Advanced Usage with Custom Parameters
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_name = "Daemontatox/Mini-Hydra"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Custom generation configuration for reasoning tasks
generation_config = GenerationConfig(
    temperature=0.1,           # lower temperature for more focused reasoning
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.1,
    max_new_tokens=1024,       # budget for the generated solution
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

def reasoning_generate(prompt, system_prompt="Think step by step and provide a clear reasoning process."):
    full_prompt = f"{system_prompt}\n\nProblem: {prompt}\n\nSolution:"
    inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, generation_config=generation_config)
    # Return only the generated solution, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example reasoning problem
math_problem = """
A rectangular garden has a length that is 3 times its width.
If the perimeter is 32 meters, what are the dimensions of the garden?
"""
solution = reasoning_generate(math_problem)
print(solution)
```
Batch Processing
```python
def batch_reasoning(prompts, batch_size=4):
    # Decoder-only models need a pad token and left padding for batched generation
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"

    results = []
    for i in range(0, len(prompts), batch_size):
        batch_prompts = prompts[i:i + batch_size]
        batch_inputs = tokenizer(
            batch_prompts,
            return_tensors="pt",
            padding=True,
            truncation=True,
        ).to(model.device)
        with torch.no_grad():
            batch_outputs = model.generate(
                **batch_inputs,
                max_new_tokens=512,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
            )
        # Strip the (left-padded) prompts and decode only the generated continuations
        generated = batch_outputs[:, batch_inputs["input_ids"].shape[-1]:]
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results

# Example batch processing
problems = [
    "What is 15% of 240?",
    "If x + 5 = 12, what is x?",
    "A circle has radius 7. What is its area?",
    "Solve: 2x + 3 = 15",
]
solutions = batch_reasoning(problems)
for problem, solution in zip(problems, solutions):
    print(f"Problem: {problem}")
    print(f"Solution: {solution}")
    print("-" * 50)
```
Example Outputs
Mathematical Reasoning
Input: "Solve this step by step: If a train travels 120 miles in 2 hours, and then 180 miles in 3 hours, what is the average speed for the entire journey?"
Output: "Let me solve this step by step:
Step 1: Calculate total distance
Total distance = 120 miles + 180 miles = 300 miles
Step 2: Calculate total time
Total time = 2 hours + 3 hours = 5 hours
Step 3: Calculate average speed
Average speed = Total distance ÷ Total time
Average speed = 300 miles ÷ 5 hours = 60 miles per hour
Therefore, the average speed for the entire journey is 60 mph."
Logical Reasoning
Input: "If all roses are flowers, and some flowers are red, can we conclude that some roses are red?"
Output: "Let me analyze this logical statement:
Given premises:
1. All roses are flowers
2. Some flowers are red
The question asks if we can conclude that some roses are red.
Analysis:
- From premise 1: roses ⊆ flowers (roses are a subset of flowers)
- From premise 2: ∃ flowers that are red
However, we cannot definitively conclude that some roses are red because:
- The red flowers mentioned in premise 2 might be flowers other than roses
- We have no information directly connecting roses to the property of being red
Conclusion: No, we cannot logically conclude that some roses are red based solely on the given premises."
Code Generation with Reasoning
Input: "Write a Python function to find the factorial of a number, and explain your approach."
Output: "Here's a Python function to calculate factorial with explanation:
def factorial(n):
"""
Calculate the factorial of a non-negative integer n.
Args:
n (int): Non-negative integer
Returns:
int: n! (factorial of n)
Raises:
ValueError: If n is negative
"""
if n < 0:
raise ValueError("Factorial is not defined for negative numbers")
elif n == 0 or n == 1:
return 1
else:
result = 1
for i in range(2, n + 1):
result *= i
return result
Reasoning behind the approach:
- Input validation: Check for negative numbers since factorial is undefined for them
- Base cases: 0! = 1 and 1! = 1 by mathematical definition
- Iterative calculation: For n > 1, multiply all integers from 2 to n
- This iterative approach is more memory-efficient than recursion for large numbers
Example usage:
print(factorial(5)) # Output: 120
print(factorial(0)) # Output: 1
Model Card Authors
Primary Author: Daemontatox
Model Card Contact
For questions, issues, or collaboration opportunities, please contact through the Hugging Face model repository.
Citation
```bibtex
@misc{mini-hydra-2024,
  title={Mini-Hydra: Efficient Reasoning with Mixture-of-Experts},
  author={Daemontatox},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Daemontatox/Mini-Hydra}},
  note={Based on Qwen3-30B-A3B architecture}
}
```
This model card follows the guidelines established by the Hugging Face Model Card framework and includes technical details, usage examples, and important limitations to ensure responsible use of the model.