Update README.md
README.md (CHANGED)
@@ -25,6 +25,42 @@ This is the model card of a 🤗 transformers model that has been pushed on the
- **License:** [More Information Needed]
- **Finetuned from model:** Qwen/Qwen2.5-1.5B

## 🏗️ Model Architecture

- **Model Type**: `qwen2`
- **Architecture**: `Qwen2ForCausalLM`
- **Number of Hidden Layers**: `28`
- **Hidden Size**: `1536`
- **Intermediate Size**: `8960`
- **Number of Attention Heads**: `12`
- **Number of Key-Value Heads**: `2`
- **Activation Function**: `silu`
- **Attention Dropout**: `0.0`
- **RMS Norm Epsilon**: `1e-6`
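
With 12 query heads but only 2 key-value heads, the model uses grouped-query attention (6 query heads share each KV head). A minimal sketch of reading these values back with 🤗 Transformers; the repo id `your-username/your-model` is a placeholder for this model's actual Hub path:

```python
from transformers import AutoConfig

# Placeholder repo id; substitute this model's actual Hub path.
config = AutoConfig.from_pretrained("your-username/your-model")

print(config.model_type)           # qwen2
print(config.num_hidden_layers)    # 28
print(config.hidden_size)          # 1536
print(config.intermediate_size)    # 8960
print(config.num_attention_heads)  # 12
print(config.num_key_value_heads)  # 2

# Grouped-query attention: 12 query heads share 2 KV heads.
head_dim = config.hidden_size // config.num_attention_heads                # 128
queries_per_kv = config.num_attention_heads // config.num_key_value_heads  # 6
```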
## 📏 Positional Embeddings

- **Max Position Embeddings**: `131072`
- **Sliding Window Size**: `131072`
- **Max Window Layers**: `28`
- **Rotary Embedding Theta (RoPE θ)**: `1000000.0`
- **Use mRoPE (Multimodal RoPE)**: `false`
- **Use Sliding Window**: `false`
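
The large RoPE base (θ = 1000000) slows the rotary frequencies enough to cover the 131072-token context. A sketch of how the per-dimension inverse frequencies follow from `rope_theta` and the head dimension (1536 / 12 = 128 here), using the standard RoPE formulation; this is illustrative, not the model's exact code path:

```python
import torch

rope_theta = 1_000_000.0
head_dim = 128  # hidden_size 1536 / 12 attention heads

# Standard RoPE: one inverse frequency per pair of head dimensions.
inv_freq = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

# Rotation angles for each position; their cos/sin rotate the query/key vectors.
positions = torch.arange(4096)  # any length up to max_position_embeddings (131072)
angles = torch.outer(positions.float(), inv_freq)  # shape (4096, 64)
cos, sin = angles.cos(), angles.sin()
```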
## 🔠 Vocabulary & Tokens

- **Vocabulary Size**: `151936`
- **BOS (Beginning of Sequence) Token ID**: `151643`
- **EOS (End of Sequence) Token ID**: `151643`
- **Tied Word Embeddings**: `true`
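
Note that BOS and EOS resolve to the same token id (151643, Qwen's `<|endoftext|>`), and the input embedding matrix is tied to the output projection. A quick check of these values; `your-username/your-model` again stands in for this model's Hub path:

```python
from transformers import AutoConfig, AutoTokenizer

repo = "your-username/your-model"  # placeholder repo id
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

print(config.vocab_size)           # 151936
print(config.bos_token_id)         # 151643
print(config.eos_token_id)         # 151643
print(config.tie_word_embeddings)  # True

# Both special ids decode to the same token.
print(tokenizer.decode([151643]))  # <|endoftext|>
```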
## ⚙️ Runtime & Training Settings

- **Initializer Range**: `0.02`
- **Use Cache**: `true`
- **Torch Dtype**: `float32`
- **Transformers Version**: `4.50.0`
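
Since the checkpoint is stored in `float32`, a full-precision load needs roughly 4 bytes per parameter (about 6 GB for 1.5B parameters), and `use_cache: true` enables the KV cache during generation. A loading sketch with the same placeholder repo id; pass `torch.bfloat16` instead to halve memory on supported hardware:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "your-username/your-model"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float32,  # matches the stored dtype; torch.bfloat16 halves memory
)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```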
### Model Sources [optional]
<!-- Provide the basic links for the model. -->