- Model Card for Model ID
- Model Details
- Uses
- Bias, Risks, and Limitations
- How to Get Started with the Model
- Training Details
- Evaluation
- Model Examination [optional]
- Environmental Impact
- Technical Specifications [optional]
- Citation [optional]
- Glossary [optional]
- More Information [optional]
- Model Card Authors [optional]
- Model Card Contact
Model Card for Model ID
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: me
- Model type: Mistral
- Language(s) (NLP): en
- License: Apache
Uses
General web-text completion with extremely low resource use.
Out-of-Scope Use
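This is not an instruct model: it has not been instruction-tuned, so chat and instruction-following tasks are out of scope.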
Bias, Risks, and Limitations
Trained on web text. The data was filtered, but there is no guarantee that it is free of toxic content.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("crumb/nano-mistral")
tokenizer = AutoTokenizer.from_pretrained("crumb/nano-mistral")

# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(["Once upon a time,"], return_tensors="pt").to(model.device)

# generate() takes the encoded tensors as keyword arguments (input_ids, attention_mask)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, top_k=20, do_sample=True)

# Decode and print every sequence in the batch
for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
Training Details
Training Data
Training Procedure
| Parameter | Value |
|---|---|
| Context Length | 2048 |
| Batch Size | 128 |
| Learning Rate | 6e-4 |
| Scheduler | One-Cycle |
| Adam eps | 1e-8 |
| Adam beta1 | 0.9 |
| Adam beta2 | 0.95 |
| Weight Decay | 0.1 |
| Max Grad Norm | 1.0 |
| Optimizer | adamw_torch |
| Tokens | 3,401,640,960 |
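As a rough consistency check on the table, the sketch below works out how many tokens each optimizer step covers and roughly how many steps the run took. It assumes that Batch Size counts sequences per optimizer step and that every sequence fills the full 2048-token context; the card states neither, and the token total is not an exact multiple of 2048 × 128, so the step count is approximate.

```python
# Values copied from the training-procedure table above.
CONTEXT_LENGTH = 2048         # tokens per sequence
BATCH_SIZE = 128              # assumption: sequences per optimizer step
TOTAL_TOKENS = 3_401_640_960  # "Tokens" row

# Assumption: every sequence is packed to the full context length.
tokens_per_step = CONTEXT_LENGTH * BATCH_SIZE
approx_steps = TOTAL_TOKENS / tokens_per_step
approx_sequences = TOTAL_TOKENS / CONTEXT_LENGTH

print(f"tokens per step: {tokens_per_step:,}")
print(f"approx. optimizer steps: {approx_steps:,.1f}")
print(f"approx. training sequences: {approx_sequences:,.1f}")
```

Under these assumptions each step sees 262,144 tokens, and the run lasts roughly 12,976 steps, or about 1.66 million 2048-token sequences.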
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: bf16 non-mixed precision
Speeds, Sizes, Times [optional]
- train_runtime: 62,541.94 s (about 17.4 hours)
- train_samples_per_second: 26.557
[More Information Needed]
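The runtime figures above can be cross-checked against the token count in the training table and the 34.74 hours listed under Environmental Impact. This sketch assumes train_runtime is wall-clock seconds (the unit the Hugging Face Trainer reports; the custom trainer's unit is not stated) and that both A6000s were busy for the whole run.

```python
# Figures from this card.
TRAIN_RUNTIME_S = 62_541.9424   # train_runtime, assumed to be seconds
SAMPLES_PER_S = 26.557          # train_samples_per_second
TOTAL_TOKENS = 3_401_640_960    # "Tokens" row of the training table
CONTEXT_LENGTH = 2048
NUM_GPUS = 2                    # assumption: both GPUs of the 2xA6000 machine were used

wall_clock_h = TRAIN_RUNTIME_S / 3600
gpu_hours = wall_clock_h * NUM_GPUS
tokens_per_s = TOTAL_TOKENS / TRAIN_RUNTIME_S

# Samples implied by runtime * rate vs. samples implied by tokens / context length.
implied_samples = TRAIN_RUNTIME_S * SAMPLES_PER_S
samples_from_tokens = TOTAL_TOKENS / CONTEXT_LENGTH

print(f"wall clock: {wall_clock_h:.2f} h, GPU-hours: {gpu_hours:.2f}")
print(f"throughput: {tokens_per_s:,.0f} tokens/s")
print(f"samples: {implied_samples:,.0f} (runtime x rate) vs {samples_from_tokens:,.0f} (tokens / context)")
```

The result is about 17.4 wall-clock hours, or roughly 34.75 GPU-hours on two cards, which agrees with the 34.74 hours reported to within 0.01 h. Throughput comes out near 54,000 tokens/s, and the two sample counts agree to about 0.002%.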
Evaluation
Testing Data, Factors & Metrics
Testing Data
A held-out split of crumb/askmistral-pile-2-15.
Factors
[More Information Needed]
Metrics
Open LLM Leaderboard evaluation datasets and settings.
Results
Open LLM Leaderboard mean score ± stderr: 29.30 ± 0.42
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 25 | acc | 0.1843 | ± | 0.0113 |
| | | none | 25 | acc_norm | 0.2167 | ± | 0.0120 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.4719 | ± | 0.0156 |
| winogrande | 1 | none | 5 | acc | 0.517 | ± | 0.014 |
| hellaswag | 1 | none | 10 | acc | 0.2803 | ± | 0.0045 |
| | | none | 10 | acc_norm | 0.2886 | ± | 0.0045 |
| gsm8k | 3 | strict-match | 5 | exact_match | 0.0008 | ± | 0.0008 |
| | | flexible-extract | 5 | exact_match | 0.0099 | ± | 0.0027 |
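The mean score above can be reproduced from these numbers under two assumptions the card does not state. First, the average is taken over six Open LLM Leaderboard (v1) metrics: ARC acc_norm, HellaSwag acc_norm, the overall MMLU value from the next subsection, TruthfulQA mc2, Winogrande acc, and GSM8K exact match with the flexible-extract filter. Second, the standard errors are combined as for a mean of independent estimates, sqrt(sum(se_i^2)) / n. Using the GSM8K strict-match score instead would give a mean of about 29.15.

```python
import math

# (value, stderr) pairs taken from the results in this section.
# Assumption: these six metrics are the ones averaged into the leaderboard score.
scores = {
    "arc_challenge acc_norm (25-shot)": (0.2167, 0.0120),
    "hellaswag acc_norm (10-shot)": (0.2886, 0.0045),
    "mmlu acc (5-shot)": (0.253980701754386, 0.004428598058450528),
    "truthfulqa_mc2 acc (0-shot)": (0.4719, 0.0156),
    "winogrande acc (5-shot)": (0.517, 0.014),
    "gsm8k exact_match, flexible-extract (5-shot)": (0.0099, 0.0027),
}

def mean_with_stderr(pairs):
    """Unweighted mean; stderr combined as for independent estimates."""
    pairs = list(pairs)
    n = len(pairs)
    mean = sum(value for value, _ in pairs) / n
    stderr = math.sqrt(sum(se ** 2 for _, se in pairs)) / n
    return mean, stderr

mean, stderr = mean_with_stderr(scores.values())
print(f"{100 * mean:.2f} ± {100 * stderr:.2f}")  # prints 29.30 ± 0.42
```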
MMLU
Overall accuracy ± stderr: 0.25398 ± 0.00443
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| world_religions | 0 | none | 5 | acc | 0.2222 | ± | 0.0319 |
| virology | 0 | none | 5 | acc | 0.2711 | ± | 0.0346 |
| us_foreign_policy | 0 | none | 5 | acc | 0.3300 | ± | 0.0473 |
| sociology | 0 | none | 5 | acc | 0.2388 | ± | 0.0301 |
| security_studies | 0 | none | 5 | acc | 0.2367 | ± | 0.0272 |
| public_relations | 0 | none | 5 | acc | 0.2273 | ± | 0.0401 |
| professional_psychology | 0 | none | 5 | acc | 0.2484 | ± | 0.0175 |
| professional_medicine | 0 | none | 5 | acc | 0.4596 | ± | 0.0303 |
| professional_law | 0 | none | 5 | acc | 0.2464 | ± | 0.0110 |
| professional_accounting | 0 | none | 5 | acc | 0.2021 | ± | 0.0240 |
| prehistory | 0 | none | 5 | acc | 0.2130 | ± | 0.0228 |
| philosophy | 0 | none | 5 | acc | 0.2219 | ± | 0.0236 |
| nutrition | 0 | none | 5 | acc | 0.2157 | ± | 0.0236 |
| moral_scenarios | 0 | none | 5 | acc | 0.2380 | ± | 0.0142 |
| moral_disputes | 0 | none | 5 | acc | 0.2486 | ± | 0.0233 |
| miscellaneous | 0 | none | 5 | acc | 0.2516 | ± | 0.0155 |
| medical_genetics | 0 | none | 5 | acc | 0.3000 | ± | 0.0461 |
| marketing | 0 | none | 5 | acc | 0.2265 | ± | 0.0274 |
| management | 0 | none | 5 | acc | 0.1748 | ± | 0.0376 |
| machine_learning | 0 | none | 5 | acc | 0.3125 | ± | 0.0440 |
| logical_fallacies | 0 | none | 5 | acc | 0.2393 | ± | 0.0335 |
| jurisprudence | 0 | none | 5 | acc | 0.2315 | ± | 0.0408 |
| international_law | 0 | none | 5 | acc | 0.3140 | ± | 0.0424 |
| human_sexuality | 0 | none | 5 | acc | 0.2519 | ± | 0.0381 |
| human_aging | 0 | none | 5 | acc | 0.3049 | ± | 0.0309 |
| high_school_world_history | 0 | none | 5 | acc | 0.2658 | ± | 0.0288 |
| high_school_us_history | 0 | none | 5 | acc | 0.2451 | ± | 0.0302 |
| high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± | 0.0340 |
| high_school_psychology | 0 | none | 5 | acc | 0.1963 | ± | 0.0170 |
| high_school_physics | 0 | none | 5 | acc | 0.3046 | ± | 0.0376 |
| high_school_microeconomics | 0 | none | 5 | acc | 0.2773 | ± | 0.0291 |
| high_school_mathematics | 0 | none | 5 | acc | 0.2667 | ± | 0.0270 |
| high_school_macroeconomics | 0 | none | 5 | acc | 0.2667 | ± | 0.0224 |
| high_school_government_and_politics | 0 | none | 5 | acc | 0.2591 | ± | 0.0316 |
| high_school_geography | 0 | none | 5 | acc | 0.2424 | ± | 0.0305 |
| high_school_european_history | 0 | none | 5 | acc | 0.2242 | ± | 0.0326 |
| high_school_computer_science | 0 | none | 5 | acc | 0.2800 | ± | 0.0451 |
| high_school_chemistry | 0 | none | 5 | acc | 0.2857 | ± | 0.0318 |
| high_school_biology | 0 | none | 5 | acc | 0.3129 | ± | 0.0264 |
| global_facts | 0 | none | 5 | acc | 0.1500 | ± | 0.0359 |
| formal_logic | 0 | none | 5 | acc | 0.1905 | ± | 0.0351 |
| elementary_mathematics | 0 | none | 5 | acc | 0.2513 | ± | 0.0223 |
| electrical_engineering | 0 | none | 5 | acc | 0.2759 | ± | 0.0372 |
| econometrics | 0 | none | 5 | acc | 0.2456 | ± | 0.0405 |
| conceptual_physics | 0 | none | 5 | acc | 0.2638 | ± | 0.0288 |
| computer_security | 0 | none | 5 | acc | 0.1800 | ± | 0.0386 |
| college_physics | 0 | none | 5 | acc | 0.2549 | ± | 0.0434 |
| college_medicine | 0 | none | 5 | acc | 0.2023 | ± | 0.0306 |
| college_mathematics | 0 | none | 5 | acc | 0.2900 | ± | 0.0456 |
| college_computer_science | 0 | none | 5 | acc | 0.2700 | ± | 0.0446 |
| college_chemistry | 0 | none | 5 | acc | 0.2500 | ± | 0.0435 |
| college_biology | 0 | none | 5 | acc | 0.2222 | ± | 0.0348 |
| clinical_knowledge | 0 | none | 5 | acc | 0.2377 | ± | 0.0262 |
| business_ethics | 0 | none | 5 | acc | 0.2100 | ± | 0.0409 |
| astronomy | 0 | none | 5 | acc | 0.1776 | ± | 0.0311 |
| anatomy | 0 | none | 5 | acc | 0.2593 | ± | 0.0379 |
| abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
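The overall MMLU figure above matches an unweighted (macro) average of the 57 per-subject accuracies in this table, with the per-subject standard errors combined as sqrt(sum(se_i^2)) / 57. The card does not say how the subject scores were aggregated, so this is an observation about the numbers rather than a documented procedure. The sketch below recomputes both values from the table.

```python
import math

# Per-subject (acc, stderr) from the MMLU table above, in table order
# (world_religions first, abstract_algebra last).
subjects = [
    (0.2222, 0.0319), (0.2711, 0.0346), (0.3300, 0.0473), (0.2388, 0.0301),
    (0.2367, 0.0272), (0.2273, 0.0401), (0.2484, 0.0175), (0.4596, 0.0303),
    (0.2464, 0.0110), (0.2021, 0.0240), (0.2130, 0.0228), (0.2219, 0.0236),
    (0.2157, 0.0236), (0.2380, 0.0142), (0.2486, 0.0233), (0.2516, 0.0155),
    (0.3000, 0.0461), (0.2265, 0.0274), (0.1748, 0.0376), (0.3125, 0.0440),
    (0.2393, 0.0335), (0.2315, 0.0408), (0.3140, 0.0424), (0.2519, 0.0381),
    (0.3049, 0.0309), (0.2658, 0.0288), (0.2451, 0.0302), (0.4722, 0.0340),
    (0.1963, 0.0170), (0.3046, 0.0376), (0.2773, 0.0291), (0.2667, 0.0270),
    (0.2667, 0.0224), (0.2591, 0.0316), (0.2424, 0.0305), (0.2242, 0.0326),
    (0.2800, 0.0451), (0.2857, 0.0318), (0.3129, 0.0264), (0.1500, 0.0359),
    (0.1905, 0.0351), (0.2513, 0.0223), (0.2759, 0.0372), (0.2456, 0.0405),
    (0.2638, 0.0288), (0.1800, 0.0386), (0.2549, 0.0434), (0.2023, 0.0306),
    (0.2900, 0.0456), (0.2700, 0.0446), (0.2500, 0.0435), (0.2222, 0.0348),
    (0.2377, 0.0262), (0.2100, 0.0409), (0.1776, 0.0311), (0.2593, 0.0379),
    (0.2200, 0.0416),
]

n = len(subjects)
# Every subject counts equally, regardless of how many questions it has.
macro_acc = sum(acc for acc, _ in subjects) / n
# Standard error of a mean of independent estimates.
pooled_se = math.sqrt(sum(se ** 2 for _, se in subjects)) / n

print(f"subjects: {n}, accuracy: {macro_acc:.4f} ± {pooled_se:.4f}")
```

A question-weighted average would give a different number, because subjects such as professional_law have many more questions than others. The unweighted version is the one that reproduces 0.25398.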
Summary
Model Examination [optional]
It's OK.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: A6000
- Hours used: 34.74
- Cloud Provider: N/A
- Compute Region: Iowa
- Carbon Emitted: 4.5 kg CO2eq
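The Machine Learning Impact calculator estimates emissions as hardware power × hours used × the carbon intensity of the local grid. The sketch below works that formula backwards from the figures listed. Two inputs are assumptions because the card states neither: a per-GPU draw of 300 W (the rated board power of an RTX A6000), and that 34.74 is total GPU-hours across both cards. The implied grid intensity is therefore only indicative.

```python
# Figures from the Environmental Impact list above.
HOURS_USED = 34.74          # assumption: total GPU-hours across both A6000s
CARBON_EMITTED_KG = 4.5     # kg CO2eq

# Assumption: each A6000 draws its 300 W rated board power throughout training.
GPU_POWER_KW = 0.300

energy_kwh = GPU_POWER_KW * HOURS_USED
implied_intensity = CARBON_EMITTED_KG / energy_kwh  # kg CO2eq per kWh

print(f"energy: {energy_kwh:.2f} kWh")
print(f"implied grid intensity: {implied_intensity:.3f} kg CO2eq/kWh")
```

Under these assumptions the run used about 10.4 kWh, which implies a grid intensity of roughly 0.43 kg CO2eq/kWh.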
Technical Specifications [optional]
Model Architecture and Objective
Mistral architecture, trained with a causal language modelling objective.
Compute Infrastructure
See Hardware and Software below.
Hardware
Lambda Vector workstation with 2× A6000 GPUs
Software
Hugging Face Transformers, PyTorch, and a custom trainer
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]