Model Card for BioGenesis-ToT

Model Details

Model Description

BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology. The model was trained on the moremilk/ToT-Biology dataset, a reasoning-rich collection of biology questions that emphasizes why and how processes occur rather than simply what happens.

The model demonstrates strong capabilities in:

  • Structured biological explanation generation
  • Logical and causal reasoning
  • Tree-of-Thought (ToT) reasoning in scientific contexts
  • Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology)

Uses

🚀 Intended Use

  • Educational and scientific explanation generation
  • Biological reasoning and tutoring applications
  • Model interpretability research
  • Training datasets for reasoning-focused LLMs

⚠️ Limitations

  • Not a replacement for expert biological judgment
  • May occasionally over-generalize or simplify complex phenomena
  • Optimized for reasoning quality within biological contexts; not trained for creative writing or coding

Evaluation

Evaluation on emre/TARA_Turkish_LLM_Benchmark

| Category | BioGenesis-ToT | Qwen3-1.7B |
|----------|----------------|------------|
| Scientific Explanation and Hypothesis Evaluation (RAG) | 66.36 | 61.82 |
| Ethical Dilemma Assessment | 55.45 | 47.27 |
| Complex Scenario Analysis and Drawing Conclusions | 61.82 | 59.09 |
| Constrained Creative Writing | 18.18 | 9.09 |
| Logical Inference (Text-Based) | 49.09 | 68.18 |
| Mathematical Reasoning | 42.73 | 37.27 |
| Planning and Optimization Problems (Text-Based) | 52.73 | 25.45 |
| Python Code Analysis and Debugging | 51.82 | 50.00 |
| Generating SQL Query (From Schema/Meta) | 39.09 | 36.36 |
| Cause-Effect Relationship in Historical Events (RAG) | 77.27 | 73.64 |
| **Overall** | **51.45** | **46.82** |
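For reference, below is a minimal sketch of how per-category generations for a comparison like this could be collected. The benchmark's split and column names ("question", "category") are assumptions here, not the dataset's documented schema, and scoring is left to whatever judging procedure the benchmark prescribes.

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Split and column names ("question", "category") are assumptions; check the
# benchmark's dataset card for the actual schema and scoring procedure.
bench = load_dataset("emre/TARA_Turkish_LLM_Benchmark", split="train")

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-1.7B", device_map={"": 0})
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

def generate_answer(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, top_p=0.95, top_k=20)
    # Strip the prompt tokens so only the completion remains.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Collect (category, answer) pairs; per-category scoring happens downstream.
predictions = [(row["category"], generate_answer(row["question"])) for row in bench]
```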

How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from peft import PeftModel

# Load the tokenizer and base model, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-1.7B",
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

question = """
Describe the composition of the plasma membrane and explain how its structure relates to its function of selective permeability.
"""

# Build the chat prompt; enable_thinking=True activates Qwen3's reasoning mode.
messages = [
    {"role": "user", "content": question}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Generate and stream the answer token by token.
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=2200,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
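To serve the model without loading the adapter separately at inference time, the LoRA weights can be folded into the base model using PEFT's merge_and_unload; the save path below is only an example.

```python
# Fold the LoRA adapter into the base weights and save a standalone checkpoint.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("BioGenesis-ToT-merged")  # example output path
tokenizer.save_pretrained("BioGenesis-ToT-merged")
```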

Or, using the text-generation pipeline:

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-1.7B")
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

question = """
Describe the composition of the plasma membrane and explain how its structure relates to its function of selective permeability.
"""

# The text-generation pipeline accepts chat-style message lists directly.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": question}
]
pipe(messages)
```
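The pipeline forwards sampling keyword arguments to model.generate, so the settings from the streaming example above can be reused directly:

```python
# Same sampling configuration as in the streaming example above.
pipe(messages, max_new_tokens=2200, temperature=0.6, top_p=0.95, top_k=20)
```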

🧪 Dataset: moremilk/ToT-Biology

The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology. It's designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems.

It spans a wide range of biological subdomains:

  • Foundational biology: Cell biology, genetics, evolution, and ecology
  • Advanced topics: Systems biology, synthetic biology, computational biophysics
  • Applied domains: Medicine, agriculture, bioengineering, and environmental science

Dataset features include:

  • 🧩 Logical reasoning styles: deductive, inductive, abductive, causal, and analogical
  • 🧠 Problem-solving techniques: decomposition, elimination, systems thinking, trade-off analysis
  • 🔬 Real-world problem contexts: experiment design, pathway mapping, and data interpretation
  • 🌍 Practical relevance: bridging theoretical reasoning and applied biological insight
  • 🎓 Educational focus: for both AI training and human learning in scientific reasoning
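
To explore the dataset directly, a minimal loading sketch follows; the split name and the "question"/"answer" field names are assumptions, so consult the dataset card for the actual schema.

```python
from datasets import load_dataset

# Split and field names are assumptions; see the dataset card for the schema.
ds = load_dataset("moremilk/ToT-Biology", split="train")
print(ds)                   # column names and row count
sample = ds[0]
print(sample["question"])   # hypothetical field name
print(sample["answer"])     # hypothetical field name
```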

🧭 Objective

This fine-tuning project aims to build an interpretable reasoning model capable of:

  • Explaining biological mechanisms clearly and coherently
  • Demonstrating transparent, step-by-step thought processes
  • Applying logical reasoning techniques to biological and interdisciplinary problems
  • Supporting educational and research use cases where reasoning transparency matters

Citation

BibTeX:

```bibtex
@misc{shiriyev2025biogenesis,
  title        = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning},
  author       = {Rustam Shiriyev},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/khazarai/BioGenesis-ToT}},
  note         = {Base model: Qwen3-1.7B; dataset: moremilk/ToT-Biology; license: MIT}
}
```

Framework versions

  • PEFT 0.15.2