---
title: SmolLM2-135M
emoji: π
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "5.13.1"
app_file: app.py
pinned: false
---
<!-- training logs -->
Training restarted from step 5000.
![Training Log](training_log_smollm2.png)
<!-- add image to README.md -->
<!-- create a virtual environment with uv -->
```
uv venv
source .venv/bin/activate
```
<!-- Train smollm2 model -->
Use the dataset from https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/tree/main/cosmopedia-v2:
```
from datasets import load_dataset

dataset = load_dataset("HuggingFaceTB/smollm-corpus", "cosmopedia-v2")
```
Use the tokenizer from https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer:
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")
```
Use the config from https://huggingface.co/HuggingFaceTB/SmolLM2-135M/blob/main/config_smollm2_135M.yaml
(also available at https://github.com/huggingface/smollm/blob/main/pre-training/smollm2/config_smollm2_135M.yaml).
Create the model from the above parameters:
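A minimal sketch of building the model with `transformers`. The values are read off the architecture printout at the end of this README; the context length and weight tying are assumptions to confirm against the YAML config:
```
from transformers import LlamaConfig, LlamaForCausalLM

# Values mirror the architecture printout at the bottom of this README.
config = LlamaConfig(
    vocab_size=49152,
    hidden_size=576,
    intermediate_size=1536,
    num_hidden_layers=30,
    num_attention_heads=9,         # head_dim = 576 / 9 = 64
    num_key_value_heads=3,         # grouped-query attention: k/v projections are 576 -> 192
    rms_norm_eps=1e-5,
    max_position_embeddings=2048,  # assumption -- check the YAML config
    tie_word_embeddings=True,      # assumption -- tying keeps the total near 135M params
)
model = LlamaForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # roughly 135M with tied embeddings
```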
Train it using PyTorch Lightning:
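A minimal PyTorch Lightning wrapper for the causal-LM objective. The class name, learning rate and trainer flags are illustrative assumptions, not the project's exact training setup:
```
import torch
import pytorch_lightning as pl

class SmolLM2LitModule(pl.LightningModule):
    """Illustrative wrapper that trains the LlamaForCausalLM built above."""

    def __init__(self, model, lr=3e-4):
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # Passing labels=input_ids makes the model compute the shifted
        # next-token cross-entropy loss internally.
        out = self.model(input_ids=batch["input_ids"], labels=batch["input_ids"])
        self.log("train_loss", out.loss, prog_bar=True)
        return out.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)

# trainer = pl.Trainer(max_steps=5000, precision="bf16-mixed", accelerator="auto")
# trainer.fit(SmolLM2LitModule(model), train_dataloader)
```
Restarting from step 5000, as noted in the training log section above, maps to passing `ckpt_path=` to `trainer.fit`.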
<!-- Model architecture -->
```
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(49152, 576)
    (layers): ModuleList(
      (0-29): 30 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=576, out_features=576, bias=False)
          (k_proj): Linear(in_features=576, out_features=192, bias=False)
          (v_proj): Linear(in_features=576, out_features=192, bias=False)
          (o_proj): Linear(in_features=576, out_features=576, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=576, out_features=1536, bias=False)
          (up_proj): Linear(in_features=576, out_features=1536, bias=False)
          (down_proj): Linear(in_features=1536, out_features=576, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((576,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((576,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((576,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=576, out_features=49152, bias=False)
)
```