Model Card for crumb/nano-mistral

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

Developed by: crumb
Model type: Mistral
Language(s) (NLP): en
License: apache

Uses

General web-text completion with extremely low resource use.

Out-of-Scope Use

This is not an instruction-tuned model, so it is not suited to chat or instruction following.

Bias, Risks, and Limitations

The model was trained on web text. The data was filtered, but there is no guarantee that all toxic content was removed.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("crumb/nano-mistral")
tokenizer = AutoTokenizer.from_pretrained("crumb/nano-mistral")

# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(["Once upon a time,"], return_tensors="pt").to(model.device)

# generate() expects input_ids/attention_mask as keyword arguments, so unpack the encoding
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.7,
    top_k=20,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding id during generation
)
texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for text in texts:
    print(text)

Training Details

Training Data

crumb/askmistral-pile-2-15

Training Procedure

Parameter	Value
Context Length	2048
Batch Size	128
Learning Rate	6e-4
Scheduler	One-Cycle
Adam eps	1e-8
Adam beta1	0.9
Adam beta2	0.95
Weight Decay	0.1
Max Grad Norm	1.0
Optimizer	adamw_torch
Tokens	3,401,640,960
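A minimal PyTorch sketch of how the optimizer rows of this table could map onto `torch.optim.AdamW` ("adamw_torch"), `torch.optim.lr_scheduler.OneCycleLR` and `torch.nn.utils.clip_grad_norm_`. The model was trained with a custom trainer, so this is an assumption about the setup rather than the author's code: the toy `nn.Linear` model, the placeholder loss and `TOTAL_STEPS` are hypothetical, and OneCycleLR's default warmup and annealing (`pct_start=0.3`, cosine) may differ from what was actually used.

```python
import torch
from torch import nn

# Values copied from the Training Procedure table.
LEARNING_RATE = 6e-4
BETAS = (0.9, 0.95)
EPS = 1e-8
WEIGHT_DECAY = 0.1
MAX_GRAD_NORM = 1.0

# Hypothetical stand-ins: a tiny model and a made-up step count.
model = nn.Linear(16, 16)
TOTAL_STEPS = 1_000

optimizer = torch.optim.AdamW(
    model.parameters(), lr=LEARNING_RATE, betas=BETAS, eps=EPS, weight_decay=WEIGHT_DECAY
)
# cycle_momentum=False keeps Adam beta1 fixed at the table's 0.9.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=LEARNING_RATE, total_steps=TOTAL_STEPS, cycle_momentum=False
)


def train_step(batch: torch.Tensor) -> float:
    """One optimizer step with gradient clipping and a per-step LR update."""
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # placeholder loss for the toy model
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
    optimizer.step()
    scheduler.step()
    return loss.item()


for _ in range(3):
    train_step(torch.randn(8, 16))
```

Disabling momentum cycling is a deliberate choice in this sketch: by default OneCycleLR cycles Adam's beta1 between 0.85 and 0.95, which would override the fixed beta1 listed above.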

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: bf16 non-mixed precision
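One reading of "bf16 non-mixed precision" is that the weights themselves are stored and updated in bfloat16, with no fp32 master weights, no `torch.autocast` and no gradient scaler. The sketch below illustrates that reading on a hypothetical toy layer; it is an assumption about the regime, not the author's training code.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Cast every parameter to bfloat16: no fp32 master copy, no autocast, no GradScaler.
model = nn.Linear(16, 4).to(torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)

x = torch.randn(8, 16, dtype=torch.bfloat16)
loss = model(x).pow(2).mean()  # toy loss, computed in bfloat16
loss.backward()
optimizer.step()

# Weights, gradients and AdamW moment buffers all stay in bfloat16.
param_dtypes = {p.dtype for p in model.parameters()}
grad_dtypes = {p.grad.dtype for p in model.parameters()}
state_dtypes = {optimizer.state[p]["exp_avg"].dtype for p in model.parameters()}
print(param_dtypes, grad_dtypes, state_dtypes)
```

The trade-off is memory against precision: keeping weights and optimizer state in bf16 roughly halves their memory compared with fp32 master weights, but very small updates can be rounded away.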

Speeds, Sizes, Times [optional]

train_runtime: 62541.9424 s (about 17.4 hours)

train_samples_per_second: 26.557

[More Information Needed]
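The reported figures can be cross-checked with some arithmetic. This sketch assumes that `train_runtime` is in seconds, that the batch size of 128 counts full 2048-token sequences per optimizer step, and that both GPUs of the 2xA6000 machine ran for the whole job.

```python
# Figures reported in this card.
CONTEXT_LENGTH = 2048          # tokens per sequence
BATCH_SIZE = 128               # sequences per optimizer step (assumed global batch)
TOTAL_TOKENS = 3_401_640_960
TRAIN_RUNTIME_S = 62_541.9424  # train_runtime, assumed to be in seconds
SAMPLES_PER_S = 26.557         # train_samples_per_second
NUM_GPUS = 2                   # 2x A6000

tokens_per_step = CONTEXT_LENGTH * BATCH_SIZE
approx_steps = TOTAL_TOKENS / tokens_per_step
tokens_per_s = TOTAL_TOKENS / TRAIN_RUNTIME_S
tokens_per_s_from_samples = SAMPLES_PER_S * CONTEXT_LENGTH
wall_clock_h = TRAIN_RUNTIME_S / 3600
gpu_hours = wall_clock_h * NUM_GPUS

print(f"tokens per step : {tokens_per_step:,}")
print(f"approx. steps   : {approx_steps:,.0f}")
print(f"tokens / second : {tokens_per_s:,.0f} (from samples/s: {tokens_per_s_from_samples:,.0f})")
print(f"wall clock      : {wall_clock_h:.2f} h, {gpu_hours:.2f} GPU-hours")
```

Under these assumptions the run took roughly 12,976 optimizer steps of 262,144 tokens each, the two throughput estimates agree closely (about 54,400 tokens/s), and 2 GPUs × about 17.37 h comes to about 34.7 GPU-hours, in line with the 34.74 hours listed under Environmental Impact.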

Evaluation

Testing Data, Factors & Metrics

Testing Data

A held-out split of crumb/askmistral-pile-2-15.

Factors

[More Information Needed]

Metrics

Open LLM Leaderboard evaluation tasks and settings.

Results

Open LLM Leaderboard mean score ± stderr: 29.30 ± 0.42

Tasks	Version	Filter	n-shot	Metric	Value		Stderr
arc_challenge	1	none	25	acc	0.1843	±	0.0113
		none	25	acc_norm	0.2167	±	0.0120
truthfulqa_mc2	2	none	0	acc	0.4719	±	0.0156
winogrande	1	none	5	acc	0.517	±	0.014
hellaswag	1	none	10	acc	0.2803	±	0.0045
		none	10	acc_norm	0.2886	±	0.0045
gsm8k	3	strict-match	5	exact_match	0.0008	±	0.0008
		flexible-extract	5	exact_match	0.0099	±	0.0027

MMLU

Mean accuracy ± stderr: 0.2540 ± 0.0044

Tasks	Filter	n-shot	Metric	Value		Stderr
world_religions	none	5	acc	0.2222	±	0.0319
virology	none	5	acc	0.2711	±	0.0346
us_foreign_policy	none	5	acc	0.3300	±	0.0473
sociology	none	5	acc	0.2388	±	0.0301
security_studies	none	5	acc	0.2367	±	0.0272
public_relations	none	5	acc	0.2273	±	0.0401
professional_psychology	none	5	acc	0.2484	±	0.0175
professional_medicine	none	5	acc	0.4596	±	0.0303
professional_law	none	5	acc	0.2464	±	0.0110
professional_accounting	none	5	acc	0.2021	±	0.0240
prehistory	none	5	acc	0.2130	±	0.0228
philosophy	none	5	acc	0.2219	±	0.0236
nutrition	none	5	acc	0.2157	±	0.0236
moral_scenarios	none	5	acc	0.2380	±	0.0142
moral_disputes	none	5	acc	0.2486	±	0.0233
miscellaneous	none	5	acc	0.2516	±	0.0155
medical_genetics	none	5	acc	0.3000	±	0.0461
marketing	none	5	acc	0.2265	±	0.0274
management	none	5	acc	0.1748	±	0.0376
machine_learning	none	5	acc	0.3125	±	0.0440
logical_fallacies	none	5	acc	0.2393	±	0.0335
jurisprudence	none	5	acc	0.2315	±	0.0408
international_law	none	5	acc	0.3140	±	0.0424
human_sexuality	none	5	acc	0.2519	±	0.0381
human_aging	none	5	acc	0.3049	±	0.0309
high_school_world_history	none	5	acc	0.2658	±	0.0288
high_school_us_history	none	5	acc	0.2451	±	0.0302
high_school_statistics	none	5	acc	0.4722	±	0.0340
high_school_psychology	none	5	acc	0.1963	±	0.0170
high_school_physics	none	5	acc	0.3046	±	0.0376
high_school_microeconomics	none	5	acc	0.2773	±	0.0291
high_school_mathematics	none	5	acc	0.2667	±	0.0270
high_school_macroeconomics	none	5	acc	0.2667	±	0.0224
high_school_government_and_politics	none	5	acc	0.2591	±	0.0316
high_school_geography	none	5	acc	0.2424	±	0.0305
high_school_european_history	none	5	acc	0.2242	±	0.0326
high_school_computer_science	none	5	acc	0.2800	±	0.0451
high_school_chemistry	none	5	acc	0.2857	±	0.0318
high_school_biology	none	5	acc	0.3129	±	0.0264
global_facts	none	5	acc	0.1500	±	0.0359
formal_logic	none	5	acc	0.1905	±	0.0351
elementary_mathematics	none	5	acc	0.2513	±	0.0223
electrical_engineering	none	5	acc	0.2759	±	0.0372
econometrics	none	5	acc	0.2456	±	0.0405
conceptual_physics	none	5	acc	0.2638	±	0.0288
computer_security	none	5	acc	0.1800	±	0.0386
college_physics	none	5	acc	0.2549	±	0.0434
college_medicine	none	5	acc	0.2023	±	0.0306
college_mathematics	none	5	acc	0.2900	±	0.0456
college_computer_science	none	5	acc	0.2700	±	0.0446
college_chemistry	none	5	acc	0.2500	±	0.0435
college_biology	none	5	acc	0.2222	±	0.0348
clinical_knowledge	none	5	acc	0.2377	±	0.0262
business_ethics	none	5	acc	0.2100	±	0.0409
astronomy	none	5	acc	0.1776	±	0.0311
anatomy	none	5	acc	0.2593	±	0.0379
abstract_algebra	none	5	acc	0.2200	±	0.0416
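The headline leaderboard figure can be reconstructed from the per-task tables. This sketch assumes the usual Open LLM Leaderboard convention of averaging six scores (ARC acc_norm, HellaSwag acc_norm, MMLU acc, TruthfulQA mc2, Winogrande acc, GSM8K exact match) and combines standard errors in quadrature as if the tasks were independent. The card does not say which GSM8K filter was used, so flexible-extract is an assumption.

```python
import math

# (score, stderr) pairs copied from the tables above.
SCORES = {
    "arc_challenge (acc_norm)": (0.2167, 0.0120),
    "hellaswag (acc_norm)": (0.2886, 0.0045),
    "mmlu (acc)": (0.253980701754386, 0.004428598058450528),
    "truthfulqa_mc2 (acc)": (0.4719, 0.0156),
    "winogrande (acc)": (0.517, 0.014),
    "gsm8k (flexible-extract)": (0.0099, 0.0027),  # assumed filter
}


def mean_with_stderr(scores):
    """Unweighted mean; stderr combined in quadrature, assuming independent tasks."""
    values = [v for v, _ in scores.values()]
    errors = [e for _, e in scores.values()]
    n = len(values)
    mean = sum(values) / n
    stderr = math.sqrt(sum(e * e for e in errors)) / n
    return mean, stderr


mean, stderr = mean_with_stderr(SCORES)
print(f"mean = {100 * mean:.2f}, stderr = {100 * stderr:.2f}")
```

With these choices the sketch matches the reported 29.30 ± 0.42. Substituting the strict-match GSM8K score (0.0008) gives about 29.15 instead, which suggests the reported mean used flexible-extract.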

Summary

Model Examination [optional]

It's OK.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: NVIDIA RTX A6000
Hours used: 34.74
Cloud Provider: n/a
Compute Region: Iowa
Carbon Emitted: 4.5 kg CO2eq
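The Machine Learning Impact calculator estimates emissions roughly as power draw × run time × carbon intensity of the local grid. This sketch works backwards from the reported 4.5 kg CO2eq. It treats the 34.74 hours as total GPU-hours and assumes each RTX A6000 draws its 300 W board power for the whole run; the resulting grid intensity is derived from those assumptions, not a value stated in this card.

```python
# Reported in this card.
GPU_HOURS = 34.74          # "Hours used", treated as total GPU-hours
REPORTED_KG_CO2EQ = 4.5

# Assumption: 300 W per GPU (RTX A6000 board power), no PUE/overhead term.
GPU_POWER_KW = 0.300


def estimate_emissions(gpu_hours: float, power_kw: float, kg_per_kwh: float) -> float:
    """Emissions (kg CO2eq) = power (kW) x time (h) x grid intensity (kg/kWh)."""
    return gpu_hours * power_kw * kg_per_kwh


energy_kwh = GPU_HOURS * GPU_POWER_KW
implied_intensity = REPORTED_KG_CO2EQ / energy_kwh  # kg CO2eq per kWh

print(f"energy ~ {energy_kwh:.1f} kWh, implied intensity ~ {implied_intensity:.3f} kg CO2eq/kWh")
```

Under that assumption the run used about 10.4 kWh, which implies a grid intensity of roughly 0.43 kg CO2eq/kWh.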

Technical Specifications [optional]

Model Architecture and Objective

Mistral architecture, trained with a causal language modelling (next-token prediction) objective.
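Causal language modelling trains the model to predict each token from the tokens before it. A minimal PyTorch sketch of that objective, assuming logits of shape (batch, seq_len, vocab_size): shift the labels one position left and apply cross-entropy. This is a simplified illustration of the standard objective, not the author's custom trainer; Transformers' causal-LM classes perform an equivalent shift internally when `labels=input_ids` is passed.

```python
import math

import torch
import torch.nn.functional as F


def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Next-token cross-entropy: the prediction at position t is scored against token t + 1.

    logits: (batch, seq_len, vocab_size); input_ids: (batch, seq_len).
    """
    shift_logits = logits[:, :-1, :]  # predictions for positions 0..T-2
    shift_labels = input_ids[:, 1:]   # targets are the following tokens
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )


# Toy check: all-zero logits give every token probability 1/V, so the loss is log(V).
VOCAB_SIZE = 32
input_ids = torch.randint(0, VOCAB_SIZE, (2, 10))
logits = torch.zeros(2, 10, VOCAB_SIZE)
loss = causal_lm_loss(logits, input_ids)
expected = math.log(VOCAB_SIZE)
print(loss.item(), expected)
```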

Compute Infrastructure

See Hardware and Software below.

Hardware

Lambda Vector workstation with 2x NVIDIA RTX A6000 GPUs

Software

Hugging Face Transformers, PyTorch, and a custom trainer

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Model size: 0.2B params
Tensor type: BF16 (Safetensors)

References

Lacoste, A., Luccioni, A., Schmidt, V., & Dandres, T. (2019). Quantifying the Carbon Emissions of Machine Learning. arXiv:1910.09700.