This is a 500M-parameter, Llama-architecture model created to write stories. It was pretrained for 4-5 hours on a small dataset using a single T4 GPU, reaching a training loss of about 2.9. This model should not be used as a finished project: it should first be further pretrained on larger datasets and then post-trained on conversational datasets.

License

This model is licensed under the MIT License.

Model details

Format: Safetensors
Model size: 503M params
Tensor type: F32
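To see where a parameter count in the 500M range comes from, here is a rough sketch of the standard Llama-style parameter arithmetic. The config values below (vocab size, hidden size, layer count, FFN width) are illustrative assumptions, not the model's published hyperparameters; the card only states the total (~503M) and the F32 tensor type.

```python
def llama_param_count(vocab_size, hidden, n_layers, ffn_hidden, tied_embeddings=False):
    """Approximate parameter count for a Llama-style decoder (biases omitted)."""
    embed = vocab_size * hidden                 # token embedding matrix
    attn = 4 * hidden * hidden                  # Q, K, V, O projections (full MHA assumed, no GQA)
    mlp = 3 * hidden * ffn_hidden               # gate, up, down projections (SwiGLU)
    norms = 2 * hidden                          # two RMSNorm weight vectors per block
    per_layer = attn + mlp + norms
    lm_head = 0 if tied_embeddings else vocab_size * hidden
    return embed + n_layers * per_layer + hidden + lm_head  # trailing `hidden` = final RMSNorm

# Assumed ~500M-class config: 32k vocab, 1024 hidden, 24 layers, 4096 FFN width.
total = llama_param_count(32000, 1024, 24, 4096)
print(f"~{total / 1e6:.0f}M parameters, ~{total * 4 / 1e9:.1f} GB in F32")
```

At F32 precision each parameter costs 4 bytes, which is why a model of this size occupies roughly 2 GB on disk.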
