yasserrmd committed · Commit f1c3e65 · verified · 1 Parent(s): 7b4ae2f

Update README.md

Files changed (1)
  1. README.md +13 -3
README.md CHANGED

```diff
@@ -15,11 +15,21 @@ This run was conducted purely for **experimental and benchmarking purposes** —
 ## 📌 Experiment Summary
 
 * **Architecture:** LLaMA-style causal decoder
+
+  * Rotary positional embeddings (RoPE)
+  * Pre-normalization with RMSNorm
+  * SwiGLU feed-forward layers
+  * Multi-head self-attention with key-value caching support
 * **Parameter Count:** \~138M
-* **Training Steps:** 20,000
-* **Purpose:** Early-stage test run for verifying training pipeline & scaling behavior
+* **Context Length:** 2048 tokens
 * **Tokenizer:** LLaMA tokenizer
-* **Framework:** PyTorch + Hugging Face Transformers
+* **Training Framework:** PyTorch + Hugging Face Transformers
+* **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
+* **Scheduler:** Cosine decay with warmup
+* **Precision:** Mixed-precision (FP16/BF16)
+* **Batching:** Gradient accumulation to simulate large batch size
+* **Dataset:** General text corpus for pipeline validation (not domain-specific)
+* **Steps Completed:** 20,000 (\~32% of planned total)
 
 ---
 
```
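The updated architecture bullets name rotary positional embeddings (RoPE). As a minimal sketch of what RoPE does to one attention-head vector — note the base frequency `10000.0` is the common default and an assumption here, not a value recorded in this commit:

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Rotate one even-length head vector to position `pos` using RoPE.

    Pure-Python sketch; `base=10000.0` is the usual default and an
    assumption, not a value stated in the commit.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        # Each (x, y) pair is rotated by an angle that shrinks with depth i.
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

# Position 0 is the identity rotation; any rotation preserves the norm,
# which is what makes cached keys reusable under KV caching.
q = [1.0, 2.0, 3.0, 4.0]
assert rope_rotate(q, 0) == q
```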
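The training setup lists AdamW with a cosine-decay schedule and warmup. A minimal sketch of that learning-rate curve; the peak LR, warmup length, and total-step count below are illustrative assumptions (the summary only implies a planned total of roughly 62,500 steps from "20,000 ≈ 32%"):

```python
import math

def cosine_lr_with_warmup(step, total_steps, peak_lr, warmup_steps, min_lr=0.0):
    """Learning rate at `step`: linear warmup to peak_lr, then cosine decay.

    Sketch of the "cosine decay with warmup" schedule named in the summary;
    all numeric arguments are illustrative assumptions.
    """
    if step < warmup_steps:
        # Linear ramp from ~0 up to peak_lr over the warmup window.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# With hypothetical values (peak 3e-4, 1,000 warmup steps, 62,500 total),
# the LR peaks at the end of warmup and sits at half-peak mid-decay.
```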