Update README.md
README.md CHANGED
@@ -15,11 +15,21 @@ This run was conducted purely for **experimental and benchmarking purposes** —
## 📌 Experiment Summary

* **Architecture:** LLaMA-style causal decoder
+
+  * Rotary positional embeddings (RoPE)
+  * Pre-normalization with RMSNorm
+  * SwiGLU feed-forward layers
+  * Multi-head self-attention with key-value caching support
* **Parameter Count:** \~138M
-* **
-* **Purpose:** Early-stage test run for verifying training pipeline & scaling behavior
+* **Context Length:** 2048 tokens
* **Tokenizer:** LLaMA tokenizer
-* **Framework:** PyTorch + Hugging Face Transformers
+* **Training Framework:** PyTorch + Hugging Face Transformers
+* **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
+* **Scheduler:** Cosine decay with warmup
+* **Precision:** Mixed-precision (FP16/BF16)
+* **Batching:** Gradient accumulation to simulate large batch size
+* **Dataset:** General text corpus for pipeline validation (not domain-specific)
+* **Steps Completed:** 20,000 (\~32% of planned total)

---

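For reference, the architecture bullets in the updated summary (RoPE, RMSNorm pre-normalization, SwiGLU feed-forward layers, attention with KV-cache support, a 2048-token context, and the LLaMA tokenizer) all map onto a stock Hugging Face `LlamaConfig`. The sketch below is only an illustration, not the run's actual configuration: the README does not state the hidden size, depth, or head count, so those values are assumptions chosen to land near the stated \~138M parameters.

```python
# Hypothetical sketch: a LlamaConfig sized to roughly match the ~138M-parameter run
# described above. hidden_size, num_hidden_layers, num_attention_heads, and
# intermediate_size are assumptions -- the README does not state them.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,              # LLaMA tokenizer vocabulary
    hidden_size=768,               # assumed
    num_hidden_layers=12,          # assumed
    num_attention_heads=12,        # assumed
    intermediate_size=2048,        # SwiGLU feed-forward width (assumed)
    max_position_embeddings=2048,  # context length stated in the README
    rms_norm_eps=1e-5,             # RMSNorm pre-normalization
    tie_word_embeddings=False,
)

# RoPE, RMSNorm, SwiGLU, and KV caching come with the LLaMA architecture itself.
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```

With these assumed dimensions the count comes out around 134M, in the same ballpark as the stated \~138M; the actual run presumably used slightly different sizes.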
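Likewise, the training bullets (AdamW with β1=0.9, β2=0.95 and weight decay 0.1, cosine decay with warmup, FP16/BF16 mixed precision, gradient accumulation) describe a fairly standard PyTorch + Transformers loop. The sketch below continues from the config above and is not the run's code: the learning rate, warmup steps, total steps, accumulation factor, and the `dataloader` (assumed to yield tokenized batches with labels) are placeholders.

```python
# Hypothetical sketch of the optimizer / scheduler / mixed-precision / gradient-accumulation
# setup named in the README. Learning rate, warmup steps, total steps, and the accumulation
# factor are placeholders, not values from the actual run.
import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,                # placeholder
    betas=(0.9, 0.95),      # from the README
    weight_decay=0.1,       # from the README
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,     # placeholder
    num_training_steps=62_500,  # placeholder; 20,000 steps would be ~32% of this
)

accum_steps = 8  # gradient accumulation to simulate a larger batch (factor is a placeholder)

model.cuda().train()
for step, batch in enumerate(dataloader):  # dataloader assumed to yield tokenized batches with labels
    batch = {k: v.cuda() for k, v in batch.items()}
    # BF16 autocast shown here; FP16 would additionally need a torch.cuda.amp.GradScaler.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```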