---
title: SmolLM2-135M
emoji: πŸš€
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "5.13.1"
app_file: app.py
pinned: false
---

<!-- training logs -->
Training was restarted from step 5000; the training log is shown below.

![Training Log](training_log_smollm2.png)
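The restart presumably resumes from a saved Lightning checkpoint; a rough sketch (the checkpoint path and training objects are placeholders for the setup described further down):
```
import pytorch_lightning as pl

# lit_model and train_loader refer to the Lightning setup sketched at the end of
# this README; the checkpoint path is a placeholder for wherever step 5000 was saved
trainer = pl.Trainer(max_steps=10_000, accelerator="auto")  # total step count is illustrative
trainer.fit(lit_model, train_loader, ckpt_path="checkpoints/step-5000.ckpt")
```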
<!-- add image to README.md -->
<!-- use uv to create a virtual environment -->
```
# create a virtual environment in .venv using uv
uv venv
# activate it
source .venv/bin/activate
```
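Dependencies presumably also need to be installed into the environment; the list below is inferred from the rest of this README (versions unpinned):
```
uv pip install torch pytorch-lightning transformers datasets gradio
```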
<!-- Train smollm2 model -->
Use the dataset from https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus/tree/main/cosmopedia-v2
```
from datasets import load_dataset

dataset = load_dataset("HuggingFaceTB/smollm-corpus", "cosmopedia-v2")
```
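cosmopedia-v2 is a very large corpus, so streaming it instead of downloading it all up front may be preferable; a sketch, assuming a `train` split and a `text` column:
```
from datasets import load_dataset

# streaming=True yields examples lazily instead of caching the full corpus on disk
dataset = load_dataset(
    "HuggingFaceTB/smollm-corpus",
    "cosmopedia-v2",
    split="train",
    streaming=True,
)
print(next(iter(dataset))["text"][:200])
```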

Use the tokenizer from https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")
```
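Before training, the documents need to be tokenized into fixed-length sequences; a minimal sketch using the tokenizer and dataset loaded above (2048 is an assumed sequence length; the real value should come from the YAML config below):
```
def tokenize_fn(batch):
    # truncate/pad each document to a fixed length for causal-LM training
    return tokenizer(
        batch["text"],
        truncation=True,
        max_length=2048,   # assumption; use the sequence length from the config
        padding="max_length",
    )

tokenized = dataset.map(tokenize_fn, batched=True)
```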
Use the config from https://huggingface.co/HuggingFaceTB/SmolLM2-135M/blob/main/config_smollm2_135M.yaml
(also available at https://github.com/huggingface/smollm/blob/main/pre-training/smollm2/config_smollm2_135M.yaml)
and create the model from those parameters.
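A sketch of building the model with `transformers`, with the hyperparameters filled in from the architecture printout at the bottom of this README (anything not visible there, such as the sequence length and weight tying, is an assumption to check against the YAML):
```
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=49152,
    hidden_size=576,
    intermediate_size=1536,
    num_hidden_layers=30,
    num_attention_heads=9,         # 576 / head_dim 64
    num_key_value_heads=3,         # grouped-query attention: k/v projections are 576 -> 192
    hidden_act="silu",
    max_position_embeddings=2048,  # assumed training sequence length
    rms_norm_eps=1e-5,
    tie_word_embeddings=True,      # assumption; check config_smollm2_135M.yaml
)
model = LlamaForCausalLM(config)
print(model)  # should match the architecture shown below
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```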

Train it with PyTorch Lightning.
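A minimal Lightning training sketch (learning rate, batch size, and step count here are illustrative placeholders, not the values from the YAML config):
```
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader

class LitSmolLM2(pl.LightningModule):
    def __init__(self, model, lr=3e-4):  # lr is an illustrative placeholder
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # passing input_ids as labels: LlamaForCausalLM shifts them internally
        # to compute the next-token prediction loss
        out = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["input_ids"],
        )
        self.log("train_loss", out.loss, prog_bar=True)
        return out.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)

# `tokenized` is the mapped dataset from above; columns other than
# input_ids / attention_mask may need to be dropped before batching
train_loader = DataLoader(tokenized.with_format("torch"), batch_size=8)
trainer = pl.Trainer(max_steps=5000, precision="bf16-mixed", accelerator="auto")
trainer.fit(LitSmolLM2(model), train_loader)
```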

<!-- Model architecture -->

```
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(49152, 576)
    (layers): ModuleList(
      (0-29): 30 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=576, out_features=576, bias=False)
          (k_proj): Linear(in_features=576, out_features=192, bias=False)
          (v_proj): Linear(in_features=576, out_features=192, bias=False)
          (o_proj): Linear(in_features=576, out_features=576, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=576, out_features=1536, bias=False)
          (up_proj): Linear(in_features=576, out_features=1536, bias=False)
          (down_proj): Linear(in_features=1536, out_features=576, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((576,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((576,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((576,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=576, out_features=49152, bias=False)
)
```