
GerbilLab/IPythia-70m (Text Generation, 0.1B parameters)
Unusable models, compute optimally 🔥 (Creating very small language models for very small research on very small hardware.)
Evaluations and more information about the training of every Gerbil model, as well as the mixture-of-tasks "Blender" pretraining method inspired by UL2, can be found here: https://github.com/aicrumb/notebook-hosting/blob/main/GerbilLabEvaluations.md
Special tokens for "Blender" models' pretraining include:

```
'<fitm_start>', '<multiple_tok_mask>', '<fitm_result>', '<causal>', '<mlm_start>', '<single_tok_mask>', '<mlm_end>'
```
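These tokens mark which objective a given training example uses; the formatted examples below show each one. If you want to set up a similar Blender-style mixture on another base model, the tokens can be registered through the standard Hugging Face transformers API. The following is only a hedged sketch: the Pythia base checkpoint is an illustrative choice, and the published Gerbil checkpoints may already ship these tokens in their tokenizers.

```python
# Hedged sketch: registering the Blender special tokens on a base model/tokenizer
# for a reproduction of this kind of mixture-of-tasks pretraining.
# The base checkpoint below is only an example, not the actual Gerbil setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

BLENDER_TOKENS = [
    "<fitm_start>", "<multiple_tok_mask>", "<fitm_result>",
    "<causal>", "<mlm_start>", "<single_tok_mask>", "<mlm_end>",
]

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")  # example base model
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

# Add the tokens as additional special tokens and grow the embedding matrix to match.
tokenizer.add_special_tokens({"additional_special_tokens": BLENDER_TOKENS})
model.resize_token_embeddings(len(tokenizer))
```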
```
# Example fill in the middle
'<fitm_start> this is an <multiple_tok_mask> for fill-in-the-middle <fitm_result> example text <|endoftext|>'

# Example causal language modelling
'<causal> this is an example text for causal language modelling <|endoftext|>'

# Example masked language modelling
'<mlm_start> this is an <single_tok_mask> text for masked language modelling <mlm_end> example <|endoftext|>'
```
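To make the three formats concrete, here is a minimal Python sketch that wraps raw text into each objective. It is illustrative only: it masks whitespace-separated words rather than real tokenizer tokens, and the span-selection logic and helper names (`make_causal`, `make_fitm`, `make_mlm`) are assumptions, not the actual Blender preprocessing code.

```python
import random

# Special tokens used in the "Blender" mixture-of-tasks pretraining format.
FITM_START, MULTI_MASK, FITM_RESULT = "<fitm_start>", "<multiple_tok_mask>", "<fitm_result>"
CAUSAL = "<causal>"
MLM_START, SINGLE_MASK, MLM_END = "<mlm_start>", "<single_tok_mask>", "<mlm_end>"
EOT = "<|endoftext|>"

def make_causal(text: str) -> str:
    """Plain left-to-right language modelling."""
    return f"{CAUSAL} {text} {EOT}"

def make_fitm(text: str, rng: random.Random) -> str:
    """Fill-in-the-middle: hide a contiguous span and append it after <fitm_result>."""
    words = text.split()
    start = rng.randrange(1, max(2, len(words) - 1))
    end = min(len(words), start + rng.randint(1, 4))
    visible = words[:start] + [MULTI_MASK] + words[end:]
    hidden = words[start:end]
    return f"{FITM_START} {' '.join(visible)} {FITM_RESULT} {' '.join(hidden)} {EOT}"

def make_mlm(text: str, rng: random.Random) -> str:
    """Masked LM: replace one word with <single_tok_mask> and append the answer after <mlm_end>."""
    words = text.split()
    idx = rng.randrange(len(words))
    masked = words.copy()
    target = masked[idx]
    masked[idx] = SINGLE_MASK
    return f"{MLM_START} {' '.join(masked)} {MLM_END} {target} {EOT}"

if __name__ == "__main__":
    rng = random.Random(0)
    sample = "this is an example text for the Blender pretraining mixture"
    print(make_causal(sample))
    print(make_fitm(sample, rng))
    print(make_mlm(sample, rng))
```

In a UL2-style setup the same prefixes would normally steer the model at inference time as well, for example prepending '<causal>' before ordinary free-form generation.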