Model Card for nexa-mistral-7b-psi

Model Details

Model Description:
nexa-mistral-7b-psi is a fine-tuned variant of the open-weight Mistral-7B-v0.1 model, optimized for scientific research generation tasks such as hypothesis generation, abstract writing, and methodology completion. Fine-tuning was performed with the PEFT (Parameter-Efficient Fine-Tuning) library, using LoRA adapters on a 4-bit quantized base model via the bitsandbytes backend.

This model is part of the Nexa Scientific Intelligence (Psi) series, developed for scalable, automated scientific reasoning and domain-specific text generation.
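The LoRA + 4-bit setup described above could be reproduced roughly as follows; the LoRA rank, alpha, dropout, and target modules are not documented in this card, so the values shown are illustrative assumptions.

# Illustrative sketch of the 4-bit LoRA fine-tuning setup described above.
# LoRA rank, alpha, dropout, and target_modules are assumed values, not documented ones.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 GPUs do not support bfloat16
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,               # assumed rank
    lora_alpha=32,      # assumed scaling factor
    lora_dropout=0.05,  # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()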


Developed by: Allan (Independent Scientific Intelligence Architect)
Funded by: Self-funded
Shared by: Allan (https://huggingface.co/allan-wandeer)
Model type: Decoder-only transformer (causal language model)
Language(s): English (scientific domain-specific vocabulary)
License: Apache 2.0 (inherits from base model)
Fine-tuned from: mistralai/Mistral-7B-v0.1
Repository: https://huggingface.co/allan-wandeer/nexa-mistral-7b-psi
Demo: Coming soon via Hugging Face Spaces or a Lambda inference endpoint.


Uses

Direct Use

  • Scientific hypothesis generation
  • Abstract and method section synthesis
  • Domain-specific research writing
  • Semantic completion of structured research prompts

Downstream Use

  • Fine-tuning or distillation into smaller expert models
  • Foundation for test-time reasoning agents
  • Seed model for bootstrapping larger synthetic scientific corpora

Out-of-Scope Use

  • General conversation or chat use cases
  • Non-English scientific domains
  • Legal, financial, or clinical advice generation

Bias, Risks, and Limitations

While the model performs well on structured scientific input, it inherits biases from its base model (Mistral-7B) and from the fine-tuning dataset. It may also hallucinate plausible but incorrect facts, especially in areas with sparse training data, so outputs should be reviewed by domain experts before use in high-stakes settings.


Recommendations

Users should:

  • Validate critical outputs against trusted scientific literature
  • Avoid deploying in clinical or regulatory environments without further evaluation
  • Consider additional domain fine-tuning for niche fields

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM

# Model repository ID on the Hugging Face Hub.
model_name = "allan-wandia/nexa-mistral-7b-sci"

# Load the tokenizer and model; device_map="auto" places weights on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

# Build a prompt, move it to the model's device, and generate a completion.
prompt = "Generate a novel hypothesis in quantum materials research:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=250)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
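For more varied outputs, sampling parameters can be passed to generate; the values below are illustrative suggestions, not defaults recommended by the author.

outputs = model.generate(
    **inputs,
    max_new_tokens=250,
    do_sample=True,    # enable sampling for more varied hypotheses
    temperature=0.8,   # illustrative value, not an author-tuned default
    top_p=0.95,        # illustrative value
)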

Training Details

Training Data

  • Size: 100 million tokens sampled from a 500M+ token corpus
  • Source: Curated scientific literature, abstracts, methodologies, and domain-labeled corpora (Bio, Physics, QST, Astro)
  • Labeling: Token-level labels auto-generated via Nexa DataVault tokenizer infrastructure

Preprocessing

  • Tokenization with sequences truncated to 1024 tokens (see the sketch below)
  • Labeling and batching performed on the CPU; inference dispatched to the GPU asynchronously
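A minimal sketch of the tokenization step, assuming a plain list of text samples; the Nexa DataVault labeling pipeline is not public, so labels here simply mirror the input IDs as in standard causal-LM training.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Mistral defines no pad token by default

def preprocess(texts):
    # Truncate (and pad) each sample to the 1024-token training context.
    batch = tokenizer(
        texts,
        truncation=True,
        max_length=1024,
        padding="max_length",
        return_tensors="pt",
    )
    # Standard causal-LM objective: labels mirror the input IDs.
    batch["labels"] = batch["input_ids"].clone()
    return batch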

Training Hyperparameters

  • Base model: mistralai/Mistral-7B-v0.1
  • Sequence length: 1024
  • Batch size: 1 (with gradient accumulation)
  • Gradient Accumulation Steps: 64
  • Effective Batch Size: 64
  • Learning rate: 2e-5
  • Epochs: 2
  • LoRA: Enabled (PEFT)
  • Quantization: 4-bit via bitsandbytes
  • Optimizer: 8-bit AdamW
  • Framework: Transformers + PEFT + Accelerate
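The hyperparameters listed above map roughly onto a Hugging Face Trainer configuration like the one below; the output path, logging, and checkpointing settings are assumptions not stated in this card.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nexa-mistral-7b-psi",  # assumed output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,    # effective batch size of 64
    learning_rate=2e-5,
    num_train_epochs=2,
    optim="adamw_bnb_8bit",            # 8-bit AdamW via bitsandbytes
    fp16=True,                         # T4 GPUs do not support bf16
    logging_steps=10,                  # assumed
    save_strategy="epoch",             # assumed
)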

Evaluation

Testing Data

  • Synthetic scientific prompts across domains (Physics, Biology, Materials Science)

Evaluation Factors

  • Semantic coherence (BLEU)
  • Hypothesis novelty (entropy score)
  • Internal scientific consistency (domain-specific rubric)
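The coherence and novelty factors could be approximated along these lines; sacrebleu for BLEU and a simple token-distribution entropy are stand-ins, since the internal rubric and novelty scorer are not published.

import math
from collections import Counter

import sacrebleu

def bleu_score(candidate, references):
    # Corpus-level BLEU of one candidate against one or more references.
    return sacrebleu.corpus_bleu([candidate], [[r] for r in references]).score

def token_entropy(text):
    # Shannon entropy of the whitespace-token distribution,
    # used here as a rough proxy for lexical novelty/variation.
    tokens = text.split()
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())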

Metrics

  • BLEU (coherence): 10/10
  • Entropy novelty: 6/10
  • Scientific consistency: 9/10
  • Model similarity coefficient: 87%

Results

The model performs robustly on hypothesis generation and scientific prose tasks. While baseline coherence is high, novelty depends on prompt diversity. It is well suited as a distillation source or inference agent for generating synthetic scientific corpora.


Environmental Impact

  • Hardware type: 2× NVIDIA T4 GPUs
  • Hours used: ~7.5
  • Cloud provider: Kaggle (Google Cloud)
  • Compute region: US
  • Carbon emitted: Estimate pending (likely < 1 kg CO2)

Technical Specifications

Model Architecture

  • Transformer decoder (Mistral-7B architecture)
  • LoRA adapters applied to attention and FFN layers
  • Quantized with bitsandbytes to 4-bit for memory efficiency
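If the repository is distributed as a LoRA adapter rather than merged weights, it could be attached to the base model along these lines; merging is optional and shown only for standalone inference.

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the LoRA adapter weights on top of it.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1",
                                            device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "allan-wandeer/nexa-mistral-7b-psi")

# Optionally fold the adapter into the base weights for standalone inference.
model = model.merge_and_unload()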

Compute Infrastructure

  • CPU: Intel i5 8th Gen vPro (batch preprocessing)
  • GPU: 2× NVIDIA T4 (CUDA 12.1)

Software Stack

  • PEFT 0.12.0
  • Transformers 4.41.1
  • Accelerate
  • TRL
  • Torch 2.x

Citation

BibTeX:

@misc{nexa-mistral-7b-sci,
  title = {Nexa Mistral 7B Sci},
  author = {Allan Wandia},
  year = {2025},
  howpublished = {\url{https://huggingface.co/allan-Wandia/nexa-mistral-7b-sci}},
  note = {Fine-tuned model for scientific generation tasks}
}

Model Card Contact

For questions, contact Allan via Hugging Face. 📫 Email: [email protected]


Model Card Authors

  • Allan Wandia (Independent ML Engineer and Systems Architect)

Glossary

  • LoRA: Low-Rank Adaptation
  • PEFT: Parameter-Efficient Fine-Tuning
  • BLEU: Bilingual Evaluation Understudy Score
  • Entropy Score: Metric used to estimate novelty/variation
  • Safetensors: Secure, fast format for storing model weights

Links

GitHub repo and notebook: https://github.com/DarkStarStrix/Nexa_Auto
