# Model Card for BioGenesis-ToT

## Model Details

### Model Description
BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology. The model was trained on the moremilk/ToT-Biology dataset, a reasoning-rich collection of biology questions that emphasizes why and how processes occur, rather than simply what happens.
The model demonstrates strong capabilities in:
- Structured biological explanation generation
- Logical and causal reasoning
- Tree-of-Thought (ToT) reasoning in scientific contexts
- Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology)
## Uses

### Intended Use
- Educational and scientific explanation generation
- Biological reasoning and tutoring applications
- Model interpretability research
- Generating training data for reasoning-focused LLMs
### Limitations
- Not a replacement for expert biological judgment
- May occasionally over-generalize or simplify complex phenomena
- Optimized for reasoning within biological contexts; not trained for creative writing or coding
## Evaluation

Results on the emre/TARA_Turkish_LLM_Benchmark:
| Category | BioGenesis-ToT | Qwen3-1.7B |
|---|---|---|
| Scientific Explanation and Hypothesis Evaluation (RAG) | 66.36 | 61.82 |
| Ethical Dilemma Assessment | 55.45 | 47.27 |
| Complex Scenario Analysis and Drawing Conclusions | 61.82 | 59.09 |
| Constrained Creative Writing | 18.18 | 9.09 |
| Logical Inference (Text-Based) | 49.09 | 68.18 |
| Mathematical Reasoning | 42.73 | 37.27 |
| Planning and Optimization Problems (Text-Based) | 52.73 | 25.45 |
| Python Code Analysis and Debugging | 51.82 | 50.00 |
| Generating SQL Query (From Schema/Meta) | 39.09 | 36.36 |
| Cause-Effect Relationship in Historical Events (RAG) | 77.27 | 73.64 |
| **Overall** | **51.45** | **46.82** |
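For reference, the Overall row is consistent with the unweighted mean of the ten category scores; a quick check in Python (values copied from the table above):

```python
# Verify that "Overall" equals the unweighted mean of the ten category scores.
biogenesis = [66.36, 55.45, 61.82, 18.18, 49.09, 42.73, 52.73, 51.82, 39.09, 77.27]
qwen_base  = [61.82, 47.27, 59.09,  9.09, 68.18, 37.27, 25.45, 50.00, 36.36, 73.64]

print(round(sum(biogenesis) / 10, 2))  # 51.45
print(round(sum(qwen_base) / 10, 2))   # 46.82
```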
## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from peft import PeftModel

# Load the base model and attach the BioGenesis-ToT LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-1.7B",
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

question = """
Describe the composition of the plasma membrane and explain how its structure relates to its function of selective permeability.
"""

messages = [
    {"role": "user", "content": question}
]

# Build the chat prompt; enable_thinking=True turns on Qwen3's reasoning mode
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Stream the generated answer token by token
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=2200,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
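If you plan to serve the model, the LoRA adapter can optionally be merged into the base weights with PEFT's `merge_and_unload()`, which removes the adapter overhead at inference time. A minimal sketch, reusing the `model` and `tokenizer` objects loaded above (the output directory name is just an example):

```python
# Optional: fold the LoRA adapter into the base weights for deployment.
# The result is a plain transformers model with no PEFT wrapper.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("BioGenesis-ToT-merged")  # example output path
tokenizer.save_pretrained("BioGenesis-ToT-merged")
```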
Or, using the `pipeline` API:
```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-1.7B")
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

question = """
Describe the composition of the plasma membrane and explain how its structure relates to its function of selective permeability.
"""

# The pipeline applies the chat template automatically when given a message list
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": question}
]

pipe(messages)
```
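With thinking mode enabled, Qwen3 models emit their reasoning inside `<think>...</think>` tags before the final answer. A minimal post-processing sketch, assuming the generated string follows that convention (the helper and the exact pipeline output layout are assumptions; the chat format can vary across transformers versions):

```python
# Hypothetical helper: separate Qwen3's reasoning trace from the final answer.
def split_thinking(generated_text: str) -> tuple[str, str]:
    marker = "</think>"
    if marker in generated_text:
        reasoning, answer = generated_text.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", generated_text.strip()  # no thinking block was emitted

out = pipe(messages)
# For chat-style inputs the pipeline returns the whole conversation; the last
# message should be the assistant's reply (exact format may vary by version).
reply = out[0]["generated_text"][-1]["content"]
reasoning, answer = split_thinking(reply)
print(answer)
```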
## Dataset: moremilk/ToT-Biology

The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology. It's designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems.
It spans a wide range of biological subdomains:
- Foundational biology: Cell biology, genetics, evolution, and ecology
- Advanced topics: Systems biology, synthetic biology, computational biophysics
- Applied domains: Medicine, agriculture, bioengineering, and environmental science
Dataset features include:
- Logical reasoning styles: deductive, inductive, abductive, causal, and analogical
- Problem-solving techniques: decomposition, elimination, systems thinking, trade-off analysis
- Real-world problem contexts: experiment design, pathway mapping, and data interpretation
- Practical relevance: bridging theoretical reasoning and applied biological insight
- Educational focus: for both AI training and human learning in scientific reasoning
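To browse the data directly, the dataset can be pulled from the Hugging Face Hub with the `datasets` library. A minimal sketch (the `train` split name is an assumption; check the dataset card for the actual schema):

```python
from datasets import load_dataset

# Load ToT-Biology from the Hub and inspect its structure.
ds = load_dataset("moremilk/ToT-Biology", split="train")  # "train" assumed

print(ds.column_names)  # actual field names
print(ds[0])            # first reasoning example
```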
## Objective
This fine-tuning project aims to build an interpretable reasoning model capable of:
- Explaining biological mechanisms clearly and coherently
- Demonstrating transparent, step-by-step thought processes
- Applying logical reasoning techniques to biological and interdisciplinary problems
- Supporting educational and research use cases where reasoning transparency matters
## Citation

BibTeX:
```bibtex
@misc{biogenesis_tot_2025,
  title     = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning},
  author    = {Rustam Shiriyev},
  year      = {2025},
  publisher = {Hugging Face},
  note      = {Base model: Qwen3-1.7B; dataset: moremilk/ToT-Biology; license: MIT}
}
```
## Framework versions

- PEFT 0.15.2