yasserrmd committed · verified
Commit 4fa6479 · Parent(s): 61f42e4

Create README.md

Files changed (1): README.md (+128, -0)
---
datasets:
- HuggingFaceFW/fineweb-edu
---

# RSCaLM-138M-core

**RSCaLM** (**Research Scale Causal Language Model**), *Core Edition*, is an **experimental 138M-parameter decoder-only transformer** trained for **20,000 steps**.
Unlike the LLaMA variant, this model is implemented entirely with a **custom minimal GPT architecture** (`standalone_transformer_lm.GPT`) and **SentencePiece** tokenization, with no Hugging Face Transformers dependency.

---

## 📌 Experiment Summary

* **Architecture:** Custom GPT-style causal decoder (see the sketch below)
  * Implemented in `standalone_transformer_lm.py`
  * Learned positional embeddings (absolute)
  * Multi-head self-attention with KV caching
  * GELU feed-forward layers
  * LayerNorm
* **Parameter Count:** ~138M
* **Context Length:** 2048 tokens
* **Tokenizer:** SentencePiece (`tokenizer.model`)
* **Training Framework:** Pure PyTorch (no Transformers)
* **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
* **Scheduler:** Cosine decay with warmup
* **Precision:** Mixed FP16/BF16 training
* **Steps Completed:** 20,000 (~32% of planned total)

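As a point of orientation, the snippet below sketches a decoder block built from the components listed above (learned absolute positional embeddings, causal multi-head self-attention, GELU feed-forward layers, LayerNorm). It is an illustrative sketch only: the real implementation and `GPTConfig` fields live in `standalone_transformer_lm.py`, the dimensions here are placeholders rather than the actual 138M configuration, and KV caching is omitted for brevity.

```python
# Illustrative sketch -- NOT the code shipped in standalone_transformer_lm.py.
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-LN decoder block: LayerNorm -> causal self-attention -> LayerNorm -> GELU MLP."""
    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        T = x.size(1)
        # Causal mask: each position may attend only to itself and earlier positions.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

class TinyCausalLM(nn.Module):
    """Token + learned absolute position embeddings, a stack of blocks, and an LM head."""
    def __init__(self, vocab_size=32000, d_model=768, n_layer=12, n_head=12, ctx_len=2048):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(ctx_len, d_model)   # learned absolute positions
        self.blocks = nn.ModuleList([Block(d_model, n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.lm_head(self.ln_f(x))               # next-token logits
```
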
---

## 📉 Validation Loss Progress

| Step | Val Loss |
| ------ | -------- |
| 1,000 | 5.6011 |
| 2,000 | 4.8598 |
| 5,000 | 4.2239 |
| 10,000 | 3.9756 |
| 15,000 | 3.8608 |
| 20,000 | 3.7984 |

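If these values are mean per-token cross-entropy in nats (an assumption; the units are not stated above), they can be read as perplexity via `exp(loss)`:

```python
import math

# Assumes the validation loss is mean per-token cross-entropy in nats.
for step, loss in [(1_000, 5.6011), (10_000, 3.9756), (20_000, 3.7984)]:
    print(f"step {step:>6}: val loss {loss:.4f} -> perplexity ~{math.exp(loss):.1f}")
```
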
---

## ⚠️ Notes

* **Prototype only:** repetition loops are expected in longer generations.
* Requires **`standalone_transformer_lm.py`** and **SentencePiece** to run.
* Does **not** load with `transformers.AutoModelForCausalLM`.

---

## 🔧 Example Usage

```python
import torch
import sentencepiece as spm

from standalone_transformer_lm import GPT, GPTConfig

# Load checkpoint & config
ckpt = torch.load("ckpt_best.pt", map_location="cpu")
cfg = GPTConfig(**ckpt["config"])

# Init model & load weights
model = GPT(cfg)
model.load_state_dict(ckpt["model"])
model.eval()

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")

# Encode prompt
ids = torch.tensor([sp.encode("Dubai is", out_type=int)])

# Generate text
out = model.generate(ids, max_new_tokens=40)
print(sp.decode(out[0].tolist()))
```

---

## 🔧 Example Usage (with repetition control)

```python
import torch
import sentencepiece as spm

from standalone_transformer_lm import GPT, GPTConfig

ckpt = torch.load("ckpt_best.pt", map_location="cpu")
cfg = GPTConfig(**ckpt["config"])
model = GPT(cfg)
model.load_state_dict(ckpt["model"])
model.eval()

sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")

prompt = "when a man goes to fishing"
ids = torch.tensor([sp.encode(prompt, out_type=int)])

# Manual repetition control
out = model.generate(
    ids,
    max_new_tokens=100,
    temperature=0.7,         # Lower temp = more focused
    top_k=50,                # Top-K sampling
    top_p=0.9,               # Nucleus sampling
    repetition_penalty=1.2,  # Penalize repeats
    no_repeat_ngram_size=3,  # Block repeating trigrams
)
print(sp.decode(out[0].tolist()))
```

---

### 💡 Tips to Reduce Loops

* Increase `repetition_penalty` to 1.2–1.5 (see the sketch below)
* Use `no_repeat_ngram_size=3` or higher
* Combine `top_k` and `top_p` for better sampling variety
* Lower `temperature` for more deterministic completions

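The sketch below illustrates what these two controls typically do at the logits level: a repetition penalty down-weights tokens that already appear in the context, and no-repeat-n-gram blocking forbids any token that would complete an n-gram seen earlier. It is a generic illustration of the technique, not the implementation inside `standalone_transformer_lm.py`'s `generate`.

```python
import torch

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight tokens already present in the generated sequence (illustrative)."""
    for tok in set(generated_ids.tolist()):
        score = logits[tok]
        # Divide positive scores and multiply negative ones, so both become less likely.
        logits[tok] = score / penalty if score > 0 else score * penalty
    return logits

def block_repeated_ngrams(logits, generated_ids, n=3):
    """Forbid any token that would repeat an n-gram already in the sequence (illustrative)."""
    ids = generated_ids.tolist()
    if len(ids) < n - 1:
        return logits
    prefix = tuple(ids[-(n - 1):])              # the last n-1 generated tokens
    banned = {
        ids[i + n - 1]                          # token that completed this n-gram earlier
        for i in range(len(ids) - n + 1)
        if tuple(ids[i:i + n - 1]) == prefix
    }
    for tok in banned:
        logits[tok] = float("-inf")
    return logits

# Hypothetical use inside a sampling loop: `logits` is the 1-D score vector for the
# next token, `generated_ids` the 1-D tensor of tokens emitted so far.
# logits = apply_repetition_penalty(logits, generated_ids, penalty=1.2)
# logits = block_repeated_ngrams(logits, generated_ids, n=3)
```
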
---

## 📜 License

Apache-2.0

---