Kudod committed on
Commit fe94ac2 · verified · 1 Parent(s): d035980

Update README.md

  1. README.md +36 -0
README.md CHANGED
@@ -25,6 +25,42 @@ This is the model card of a 🤗 transformers model that has been pushed on the
  - **License:** [More Information Needed]
  - **Finetuned from model:** Qwen/Qwen2.5-1.5B.

+ ## 🏗️ Model Architecture
+
+ - **Model Type**: `qwen2`
+ - **Architecture**: `Qwen2ForCausalLM`
+ - **Number of Hidden Layers**: `28`
+ - **Hidden Size**: `1536`
+ - **Intermediate Size**: `8960`
+ - **Number of Attention Heads**: `12`
+ - **Number of Key-Value Heads**: `2`
+ - **Activation Function**: `silu`
+ - **Attention Dropout**: `0.0`
+ - **RMS Norm Epsilon**: `1e-6`
+
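The architecture values above imply a per-head dimension and a grouped-query attention ratio that the list does not state directly. The following is a minimal sketch of that arithmetic; the key-value cache estimate assumes one key and one value vector per KV head per layer stored in float32, which is an illustrative assumption and not something this commit specifies.

```python
# Values copied from the architecture list above.
hidden_size = 1536
num_attention_heads = 12
num_key_value_heads = 2
num_hidden_layers = 28

# Each attention head works on an equal slice of the hidden state.
head_dim = hidden_size // num_attention_heads

# Grouped-query attention: several query heads share one key/value head.
queries_per_kv_head = num_attention_heads // num_key_value_heads

# Assumption: the cache holds one key and one value vector per KV head,
# per layer, in float32 (4 bytes per element).
bytes_per_element = 4
kv_cache_bytes_per_token = (
    2 * num_hidden_layers * num_key_value_heads * head_dim * bytes_per_element
)

print(head_dim, queries_per_kv_head, kv_cache_bytes_per_token)
```

With only 2 key-value heads serving 12 query heads, the cache is one sixth of what full multi-head attention would need at the same head dimension.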
+ ## 📏 Positional Embeddings
+
+ - **Max Position Embeddings**: `131072`
+ - **Sliding Window Size**: `131072`
+ - **Max Window Layers**: `28`
+ - **Rotary Embedding Theta (RoPE θ)**: `1000000.0`
+ - **Use Multimodal RoPE (mRoPE)**: `false`
+ - **Use Sliding Window**: `false`
+
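To make the RoPE θ value above concrete, here is a rough sketch of how rotary inverse frequencies are commonly derived from it. The head dimension of 128 is an assumption (hidden size 1536 divided by 12 attention heads), and `rotation_angles` is a hypothetical helper written for illustration, not a function from any library.

```python
import math

rope_theta = 1_000_000.0  # from the list above
head_dim = 128            # assumption: hidden size 1536 / 12 attention heads

# One inverse frequency per pair of dimensions: theta ** (-2i / head_dim).
inv_freq = [rope_theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def rotation_angles(position: int) -> list[float]:
    """Angle in radians applied to each dimension pair at a token position."""
    return [position * f for f in inv_freq]


# The slowest-rotating pair has the longest wavelength, measured in tokens.
longest_wavelength = 2 * math.pi / inv_freq[-1]
print(len(inv_freq), inv_freq[0], round(longest_wavelength))
```

A larger θ stretches the slowest wavelengths, so under these assumptions the longest one extends well past the 131072-position limit listed above.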
+ ## 🔠 Vocabulary & Tokens
+
+ - **Vocabulary Size**: `151936`
+ - **BOS (Beginning of Sequence) Token ID**: `151643`
+ - **EOS (End of Sequence) Token ID**: `151643`
+ - **Tied Word Embeddings**: `true`
+
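Because the word embeddings are tied, the vocabulary matrix is counted once when estimating model size. The sketch below tallies parameters from the listed values; the per-layer layout (biases on the query, key, and value projections only, a three-matrix gated MLP without biases, two RMSNorm weights per layer, one final norm) is an assumption about the usual Qwen2 layer structure, not something this commit states.

```python
vocab_size = 151936
hidden_size = 1536
intermediate_size = 8960
num_hidden_layers = 28
num_attention_heads = 12
num_key_value_heads = 2

head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim

# Tied embeddings: input embedding and output head share one matrix.
embedding_params = vocab_size * hidden_size

# Assumed attention layout: q, k, v projections with bias, o without bias.
attention_params = (
    hidden_size * hidden_size + hidden_size  # q projection + bias
    + 2 * (hidden_size * kv_dim + kv_dim)    # k and v projections + biases
    + hidden_size * hidden_size              # o projection
)

# Assumed gated MLP: gate, up, and down projections, no biases.
mlp_params = 3 * hidden_size * intermediate_size

# Two RMSNorm weight vectors per layer.
norm_params = 2 * hidden_size

per_layer = attention_params + mlp_params + norm_params
total_params = num_hidden_layers * per_layer + embedding_params + hidden_size

print(f"embedding: {embedding_params:,}")
print(f"total:     {total_params:,}")
```

Under these assumptions the total comes to roughly 1.54 billion parameters, with the shared embedding matrix accounting for about 15% of it.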
+ ## ⚙️ Runtime & Training Settings
+
+ - **Initializer Range**: `0.02`
+ - **Use Cache**: `true`
+ - **Torch Dtype**: `float32`
+ - **Transformers Version**: `4.50.0`
+
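The `float32` dtype above determines how much storage the weights need. As a back-of-the-envelope sketch, assuming roughly 1.5 billion parameters (a figure taken from the base model's name, not from this commit), and using the hypothetical helper `weight_gigabytes`:

```python
# Assumption: about 1.5 billion parameters, taken from the name Qwen2.5-1.5B.
approx_params = 1_500_000_000

# Storage cost per parameter for common dtypes.
bytes_per_param = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_gigabytes(dtype: str) -> float:
    """Approximate weight storage in GB (10**9 bytes) for a given dtype."""
    return approx_params * bytes_per_param[dtype] / 1e9


for dtype in bytes_per_param:
    print(f"{dtype}: {weight_gigabytes(dtype):.1f} GB")
```

This covers weights only; activations and the key-value cache add to the total at inference time.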
  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->