Upload README.md with huggingface_hub
README.md CHANGED
@@ -23,17 +23,17 @@ This model is part of the [StepLaw-N_429M-D_39.0B](https://huggingface.co/collec
 - **Feed-forward network size (FFN)**: 9472
 - **Attention heads**: 10
 - **Layers**: 10
-- **Parameter count**:
+- **Parameter count**: 429M
 
 ### Training Parameters
 - **Learning rate (lr)**: 2.76E-03
-- **Batch size (bs)**:
+- **Batch size (bs)**: 1048576
 - **Training iterations**: 38146
 - **Training tokens (D)**: 40.0B
 
 ## Model Description
 
-StepLaw models are trained with various hyperparameter settings to enable research on scaling laws and hyperparameter optimization. This specific model was trained with learning rate 2.76E-03 and batch size
+StepLaw models are trained with various hyperparameter settings to enable research on scaling laws and hyperparameter optimization. This specific model was trained with learning rate 2.76E-03 and batch size 1048576 for 38146 iterations, using a total of 40.0B training tokens.
 
 ## Usage Example
 
@@ -48,7 +48,4 @@ model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
 inputs = tokenizer("A long time ago in a galaxy far, far away", return_tensors="pt")
 outputs = model.generate(**inputs, max_length=100)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-
-
-StepLaw is an initiative to provide thousands of models for optimal hyperparameter research.
-Visit [StepLaw Project](https://step-law.github.io/) for more information.
+```
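The filled-in values are mutually consistent if the batch size is read as tokens per step: 1,048,576 × 38,146 = 39,998,980,096 ≈ 40.0B, matching the stated training token count (D).

The second hunk shows only the tail of the README's usage example. A minimal self-contained sketch of the full flow is given below, based on the `from_pretrained` call visible in the hunk header; the exact model repository id does not appear in this diff, so the `model_name` value is a hypothetical placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: the diff only names the collection
# "StepLaw-N_429M-D_39.0B", not the exact model repository path.
model_name = "step-law/StepLaw-N_429M-D_39.0B"

# trust_remote_code=True mirrors the from_pretrained call shown in the
# hunk header of the diff above.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Encode a prompt, generate up to 100 tokens, and decode the result.
inputs = tokenizer("A long time ago in a galaxy far, far away", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```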