
GerbilLab/IPythia-70m (Text Generation, 0.1B parameters)
Unusable models, compute optimally 🔥 (Creating very small language models for very small research on very small hardware.)
Evaluations and more information about the training of every Gerbil model, as well as the mixture-of-tasks "Blender" pretraining method inspired by UL2, can be found here: https://github.com/aicrumb/notebook-hosting/blob/main/GerbilLabEvaluations.md
Special tokens for "Blender" models' pretraining include:

```
'<fitm_start>', '<multiple_tok_mask>', '<fitm_result>', '<causal>', '<mlm_start>', '<single_tok_mask>', '<mlm_end>'
```
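These tokens mark which objective a given training example uses; the formatted examples below show each one. If you want to set up a similar Blender-style mixture on another base model, the tokens can be registered through the standard Hugging Face transformers API. The following is only a hedged sketch: the Pythia base checkpoint is an illustrative choice, and the published Gerbil checkpoints may already ship these tokens in their tokenizers.

```python
# Hedged sketch: registering the Blender special tokens on a base model/tokenizer
# for a reproduction of this kind of mixture-of-tasks pretraining.
# The base checkpoint below is only an example, not the actual Gerbil setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

BLENDER_TOKENS = [
    "<fitm_start>", "<multiple_tok_mask>", "<fitm_result>",
    "<causal>", "<mlm_start>", "<single_tok_mask>", "<mlm_end>",
]

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")  # example base model
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

# Add the tokens as additional special tokens and grow the embedding matrix to match.
tokenizer.add_special_tokens({"additional_special_tokens": BLENDER_TOKENS})
model.resize_token_embeddings(len(tokenizer))
```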
```
# Example fill in the middle
'<fitm_start> this is an <multiple_tok_mask> for fill-in-the-middle <fitm_result> example text <|endoftext|>'

# Example causal language modelling
'<causal> this is an example text for causal language modelling <|endoftext|>'

# Example masked language modelling
'<mlm_start> this is an <single_tok_mask> text for masked language modelling <mlm_end> example <|endoftext|>'
```
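To make the three formats concrete, here is a minimal Python sketch that wraps raw text into each objective. It is illustrative only: it masks whitespace-separated words rather than real tokenizer tokens, and the span-selection logic and helper names (`make_causal`, `make_fitm`, `make_mlm`) are assumptions, not the actual Blender preprocessing code.

```python
import random

# Special tokens used in the "Blender" mixture-of-tasks pretraining format.
FITM_START, MULTI_MASK, FITM_RESULT = "<fitm_start>", "<multiple_tok_mask>", "<fitm_result>"
CAUSAL = "<causal>"
MLM_START, SINGLE_MASK, MLM_END = "<mlm_start>", "<single_tok_mask>", "<mlm_end>"
EOT = "<|endoftext|>"

def make_causal(text: str) -> str:
    """Plain left-to-right language modelling."""
    return f"{CAUSAL} {text} {EOT}"

def make_fitm(text: str, rng: random.Random) -> str:
    """Fill-in-the-middle: hide a contiguous span and append it after <fitm_result>."""
    words = text.split()
    start = rng.randrange(1, max(2, len(words) - 1))
    end = min(len(words), start + rng.randint(1, 4))
    visible = words[:start] + [MULTI_MASK] + words[end:]
    hidden = words[start:end]
    return f"{FITM_START} {' '.join(visible)} {FITM_RESULT} {' '.join(hidden)} {EOT}"

def make_mlm(text: str, rng: random.Random) -> str:
    """Masked LM: replace one word with <single_tok_mask> and append the answer after <mlm_end>."""
    words = text.split()
    idx = rng.randrange(len(words))
    masked = words.copy()
    target = masked[idx]
    masked[idx] = SINGLE_MASK
    return f"{MLM_START} {' '.join(masked)} {MLM_END} {target} {EOT}"

if __name__ == "__main__":
    rng = random.Random(0)
    sample = "this is an example text for the Blender pretraining mixture"
    print(make_causal(sample))
    print(make_fitm(sample, rng))
    print(make_mlm(sample, rng))
```

In a UL2-style setup the same prefixes would normally steer the model at inference time as well, for example prepending '<causal>' before ordinary free-form generation.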