Compared with jekunz/smollm-135m-cpt-fineweb-faroese, the loss is higher and more unstable at the beginning of training but slightly lower by the end, so this model may or may not be a bit better.

Training:

Epochs: 1
Learning rate: 8e-4
LR scheduler: Cosine
Warmup ratio: 0.05
Batch size per GPU: 1
GPUs: 4 × A100 (40GB)
Gradient accumulation steps: 64
Effective batch size: 256 (1 × 4 × 64)
Max. context length: 8192 tokens
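The effective batch size is the per-GPU batch size multiplied by the number of GPUs and the gradient accumulation steps. Below is a minimal Python sketch of that arithmetic, plus the learning-rate curve a cosine scheduler with a 0.05 warmup ratio typically produces. The card does not say which training framework was used, so the schedule shape (linear warmup from 0 to the peak rate, then cosine decay to 0, as in common implementations) and all function names here are assumptions for illustration. The token count is only an upper bound that assumes every sequence fills the full 8192-token context.

```python
import math

# Hyperparameters as listed in the training section of this card.
PER_DEVICE_BATCH_SIZE = 1
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 64
MAX_CONTEXT_LENGTH = 8192
PEAK_LR = 8e-4
WARMUP_RATIO = 0.05


def effective_batch_size(per_device: int, num_devices: int, accum_steps: int) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device * num_devices * accum_steps


def max_tokens_per_step(batch_size: int, context_length: int) -> int:
    """Upper bound on tokens per optimizer step (assumes full-length sequences)."""
    return batch_size * context_length


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = PEAK_LR,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Assumed schedule: linear warmup to peak_lr, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


bs = effective_batch_size(PER_DEVICE_BATCH_SIZE, NUM_GPUS, GRAD_ACCUM_STEPS)
print(bs)                                          # 256
print(max_tokens_per_step(bs, MAX_CONTEXT_LENGTH))  # 2097152

# total_steps=1000 is a hypothetical value; the real count depends on the dataset size.
for step in (0, 25, 50, 500, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps=1000))
```

With a 0.05 warmup ratio, the first 5% of optimizer steps ramp the learning rate up to 8e-4; the rest of the single epoch decays it along a half cosine.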

(renamed from smollm-135m-full-fineweb-fao-test2)

Weights format: Safetensors
Model size: 135M params
Tensor type: F32
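The parameter count and tensor type above give a rough estimate of how much space the weights take. The sketch below assumes exactly 135M parameters (the card's rounded figure) at 4 bytes each for F32; the actual safetensors file also carries a small header, so its size will differ slightly. The `weight_bytes` helper is hypothetical and only illustrates the arithmetic.

```python
# Bytes per parameter for a few common tensor types.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F16": 2}


def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Approximate raw weight size: parameter count x bytes per parameter."""
    return num_params * BYTES_PER_PARAM[dtype]


size = weight_bytes(135_000_000)
print(size)                                     # 540000000
print(size / 1e6, "MB")                         # 540.0 MB
print(weight_bytes(135_000_000, "BF16") / 1e6)  # 270.0
```

Storing the same weights in a 16-bit type would roughly halve the size.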

Model tree for jekunz/smollm-135m-cpt-fineweb-faroese-2:
Base model: HuggingFaceTB/SmolLM2-135M
Quantized: HuggingFaceTB/SmolLM2-135M-Instruct
Finetuned (65): this model

Dataset used to train jekunz/smollm-135m-cpt-fineweb-faroese-2

Collection including jekunz/smollm-135m-cpt-fineweb-faroese-2: SmolLM CPT (continued pre-training of SmolLM models on the Fineweb-2 portions of Scandinavian languages; 5 items)