Update README.md
README.md CHANGED
@@ -15,11 +15,21 @@ This run was conducted purely for **experimental and benchmarking purposes** —
## 📌 Experiment Summary

* **Architecture:** LLaMA-style causal decoder
+
+  * Rotary positional embeddings (RoPE)
+  * Pre-normalization with RMSNorm
+  * SwiGLU feed-forward layers
+  * Multi-head self-attention with key-value caching support
* **Parameter Count:** \~138M
-* **
-* **Purpose:** Early-stage test run for verifying training pipeline & scaling behavior
+* **Context Length:** 2048 tokens
* **Tokenizer:** LLaMA tokenizer
-* **Framework:** PyTorch + Hugging Face Transformers
+* **Training Framework:** PyTorch + Hugging Face Transformers
+* **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
+* **Scheduler:** Cosine decay with warmup
+* **Precision:** Mixed-precision (FP16/BF16)
+* **Batching:** Gradient accumulation to simulate large batch size
+* **Dataset:** General text corpus for pipeline validation (not domain-specific)
+* **Steps Completed:** 20,000 (\~32% of planned total)

---

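For reference, the architecture bullets in the updated summary (RoPE, RMSNorm pre-normalization, SwiGLU feed-forward layers, attention with KV-cache support, a 2048-token context, and the LLaMA tokenizer) all map onto a stock Hugging Face `LlamaConfig`. The sketch below is only an illustration, not the run's actual configuration: the README does not state the hidden size, depth, or head count, so those values are assumptions chosen to land near the stated \~138M parameters.

```python
# Hypothetical sketch: a LlamaConfig sized to roughly match the ~138M-parameter run
# described above. hidden_size, num_hidden_layers, num_attention_heads, and
# intermediate_size are assumptions -- the README does not state them.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,              # LLaMA tokenizer vocabulary
    hidden_size=768,               # assumed
    num_hidden_layers=12,          # assumed
    num_attention_heads=12,        # assumed
    intermediate_size=2048,        # SwiGLU feed-forward width (assumed)
    max_position_embeddings=2048,  # context length stated in the README
    rms_norm_eps=1e-5,             # RMSNorm pre-normalization
    tie_word_embeddings=False,
)

# RoPE, RMSNorm, SwiGLU, and KV caching come with the LLaMA architecture itself.
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```

With these assumed dimensions the count comes out around 134M, in the same ballpark as the stated \~138M; the actual run presumably used slightly different sizes.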
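Likewise, the training bullets (AdamW with β1=0.9, β2=0.95 and weight decay 0.1, cosine decay with warmup, FP16/BF16 mixed precision, gradient accumulation) describe a fairly standard PyTorch + Transformers loop. The sketch below continues from the config above and is not the run's code: the learning rate, warmup steps, total steps, accumulation factor, and the `dataloader` (assumed to yield tokenized batches with labels) are placeholders.

```python
# Hypothetical sketch of the optimizer / scheduler / mixed-precision / gradient-accumulation
# setup named in the README. Learning rate, warmup steps, total steps, and the accumulation
# factor are placeholders, not values from the actual run.
import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,                # placeholder
    betas=(0.9, 0.95),      # from the README
    weight_decay=0.1,       # from the README
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,     # placeholder
    num_training_steps=62_500,  # placeholder; 20,000 steps would be ~32% of this
)

accum_steps = 8  # gradient accumulation to simulate a larger batch (factor is a placeholder)

model.cuda().train()
for step, batch in enumerate(dataloader):  # dataloader assumed to yield tokenized batches with labels
    batch = {k: v.cuda() for k, v in batch.items()}
    # BF16 autocast shown here; FP16 would additionally need a torch.cuda.amp.GradScaler.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```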