AI & ML interests

Unusable models, compute-optimally 🔥 (Creating very small language models for very small research on very small hardware.)

Evaluations and training details for every Gerbil model, as well as for the mixture-of-tasks "Blender" pretraining method inspired by UL2, can be found here: https://github.com/aicrumb/notebook-hosting/blob/main/GerbilLabEvaluations.md
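For orientation, the checkpoints are ordinary causal language models on the Hub. Below is a minimal loading sketch, assuming the GerbilLab/GerbilBlender-B-star-77m repo works with the transformers Auto classes (the prompt prefix follows the Blender format described below):

```python
# Minimal sketch: load a Gerbil Blender checkpoint and generate.
# Assumes the repo is compatible with AutoModelForCausalLM/AutoTokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GerbilLab/GerbilBlender-B-star-77m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The <causal> prefix selects the causal language modelling task.
prompt = "<causal> this is an example text"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```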

Special tokens for "Blender" models' pretraining include:

```python
'<fitm_start>', '<multiple_tok_mask>', '<fitm_result>', '<causal>', '<mlm_start>', '<single_tok_mask>', '<mlm_end>'

# Example: fill-in-the-middle
'<fitm_start> this is an <multiple_tok_mask> for fill-in-the-middle <fitm_result> example text <|endoftext|>'

# Example: causal language modelling
'<causal> this is an example text for causal language modelling <|endoftext|>'

# Example: masked language modelling
'<mlm_start> this is an <single_tok_mask> text for masked language modelling <mlm_end> example <|endoftext|>'
```
