MiniGPT-MoE: Lightweight Language Model with Mixture of Experts
A lightweight implementation of a GPT-style language model in TensorFlow, featuring a Mixture of Experts (MoE) architecture for efficient computation.
Model Details
- Architecture: Transformer with Mixture of Experts (MoE)
- Total Parameters: 52.8M
- Framework: TensorFlow 2.x
- Training: Project Gutenberg books corpus with ByteLevel BPE tokenization
- Model Type: Causal Language Model
Architecture Specifications
- Embedding Dimension: 512
- Number of Layers: 8 Transformer blocks
- Attention Heads: 8
- Feed-forward Dimension: 2048
- Number of Experts: 4 per MoE layer, routed top-1 per token (see the sketch after this list)
- MoE Layers: Layers 2, 4, 6
- Vocabulary Size: 10,000
- Max Sequence Length: 256
- Positional Embeddings: Rotary Positional Embeddings (RoPE)
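In the MoE layers, the dense feed-forward block is replaced by a small set of expert FFNs, with a learned router picking one expert per token (top-1, matching `top_k_experts=1` in the config below). The following is a minimal illustrative sketch of that idea under the dimensions listed above; the class name `TopOneMoE` and its internals are hypothetical, and the actual implementation lives in `minigpt_transformer.py`.

```python
import tensorflow as tf

class TopOneMoE(tf.keras.layers.Layer):
    """Illustrative top-1 Mixture-of-Experts feed-forward block (hypothetical names)."""

    def __init__(self, embed_dim=512, ffn_dim=2048, num_experts=4, **kwargs):
        super().__init__(**kwargs)
        self.num_experts = num_experts
        self.router = tf.keras.layers.Dense(num_experts)  # per-token gating logits
        self.experts = [
            tf.keras.Sequential([
                tf.keras.layers.Dense(ffn_dim, activation="gelu"),
                tf.keras.layers.Dense(embed_dim),
            ])
            for _ in range(num_experts)
        ]

    def call(self, x):
        # x: [batch, seq, embed_dim]
        gate_probs = tf.nn.softmax(self.router(x), axis=-1)          # [B, S, E]
        top_idx = tf.argmax(gate_probs, axis=-1)                      # chosen expert per token
        top_prob = tf.reduce_max(gate_probs, axis=-1, keepdims=True)  # its gate weight

        # Dense dispatch for clarity: run every expert, keep only the selected output.
        expert_outs = tf.stack([e(x) for e in self.experts], axis=-2)  # [B, S, E, D]
        mask = tf.one_hot(top_idx, self.num_experts)[..., tf.newaxis]  # [B, S, E, 1]
        return tf.reduce_sum(expert_outs * mask, axis=-2) * top_prob
```

A production implementation would typically gather tokens per expert instead of running every expert on every token; the dense form above is only meant to show the routing logic.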
Usage
Loading the Model
from minigpt_transformer import MoEMiniGPT, MoEConfig
# Load configuration
config = MoEConfig(
    vocab_size=10000,
    max_seq_len=256,
    embed_dim=512,
    num_heads=8,
    num_layers=8,
    ffn_dim=2048,
    num_experts=4,
    top_k_experts=1,
    use_moe_layers=[2, 4, 6]
)
# Create model
model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")
# Load trained weights
model.load_weights("moe_minigpt.weights.h5")
Text Generation
# Generate text
response = model.generate_text("Hello, how are you?", max_length=50)
print(response)
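For reference, a bare-bones autoregressive decoding loop for this setup might look like the sketch below. It is only an illustration: it assumes the tokenizer is a HuggingFace `tokenizers.Tokenizer` and that calling the model on an int32 `[1, seq]` tensor returns logits of shape `[1, seq, vocab_size]`; the actual `generate_text` implementation in `minigpt_transformer.py` may differ (e.g. in sampling strategy).

```python
import tensorflow as tf

def sample_next_token(last_logits, temperature=0.8):
    # last_logits: [vocab_size] logits for the final position.
    scaled = last_logits[tf.newaxis, :] / temperature
    return int(tf.random.categorical(scaled, num_samples=1)[0, 0])

def generate(model, tokenizer, prompt, max_new_tokens=50, max_seq_len=256):
    ids = tokenizer.encode(prompt).ids
    for _ in range(max_new_tokens):
        context = tf.constant([ids[-max_seq_len:]], dtype=tf.int32)  # stay within the context window
        logits = model(context, training=False)                      # assumed shape: [1, seq, vocab_size]
        ids.append(sample_next_token(logits[0, -1]))
    return tokenizer.decode(ids)
```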
Training
# Train the model
python train_minigpt.py
Training Details
- Dataset: Project Gutenberg books corpus (Alice in Wonderland, Pride and Prejudice, Frankenstein, Sherlock Holmes, Moby Dick, A Tale of Two Cities, Metamorphosis, War and Peace, The Adventures of Tom Sawyer, Great Expectations)
- Tokenization: ByteLevel BPE with 10k vocabulary
- Batch Size: 48
- Learning Rate: 2e-4
- Optimizer: Adam
- Loss: Sparse Categorical Crossentropy with auxiliary MoE losses (an illustrative formulation follows this list)
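The exact auxiliary MoE term is not spelled out in this card; a common choice, shown below purely as an illustration, is a Switch-Transformer-style load-balancing loss added to the next-token cross-entropy. The function names and the `aux_weight` factor are hypothetical; see `train_minigpt.py` for the actual objective.

```python
import tensorflow as tf

def next_token_loss(labels, logits):
    # labels: [B, S] int token ids, logits: [B, S, vocab_size].
    return tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
    )

def load_balancing_loss(gate_probs, num_experts=4):
    # gate_probs: [B, S, num_experts] router softmax outputs for one MoE layer.
    # Pushes the router toward spreading tokens evenly across experts.
    token_fraction = tf.reduce_mean(
        tf.one_hot(tf.argmax(gate_probs, axis=-1), num_experts), axis=[0, 1]
    )                                                        # share of tokens sent to each expert
    mean_gate_prob = tf.reduce_mean(gate_probs, axis=[0, 1])  # mean router probability per expert
    return num_experts * tf.reduce_sum(token_fraction * mean_gate_prob)

# total_loss = next_token_loss(y, logits) + aux_weight * (sum of per-MoE-layer balance losses)
```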
Model Performance
- Perplexity: ~134 after 1.1 epochs of training (see the note following this list)
- Training Tokens: 2M+
- Expert Utilization: Balanced across 4 experts
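Perplexity here is the usual exponential of the mean per-token cross-entropy, so the reported ~134 corresponds to roughly ln(134) ≈ 4.9 nats of loss per token:

```python
import math

def perplexity(mean_cross_entropy_nats: float) -> float:
    # Perplexity = exp(average next-token cross-entropy in nats).
    return math.exp(mean_cross_entropy_nats)

print(perplexity(4.9))  # ≈ 134
```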
Files
- moe_minigpt.weights.h5: Trained model weights
- minigpt_transformer.py: Model architecture implementation
- train_minigpt.py: Training script
- train_tokenizer.py: Tokenizer training script
- my-10k-bpe-tokenizer/: Pre-trained tokenizer files
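The tokenizer directory is produced by train_tokenizer.py; a minimal sketch of that step with the HuggingFace tokenizers library is shown below. The corpus file paths and special tokens are placeholders, not necessarily the exact values used in the script.

```python
from tokenizers import ByteLevelBPETokenizer

# Fit a 10k-vocabulary ByteLevel BPE tokenizer on the Gutenberg text files
# and save it to my-10k-bpe-tokenizer/ (paths and special tokens are placeholders).
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus/alice_in_wonderland.txt", "corpus/moby_dick.txt"],
    vocab_size=10_000,
    min_frequency=2,
    special_tokens=["<pad>", "<unk>", "<bos>", "<eos>"],
)
tokenizer.save_model("my-10k-bpe-tokenizer")
```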
Citation
If you use this model in your research, please cite:
@misc{minigpt-moe,
  title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
  author={Devansh0711},
  year={2024},
  url={https://github.com/Devansh070/Language_model}
}
License
This model is released under the MIT License.
Acknowledgments
- Built with TensorFlow and Keras
- Uses HuggingFace tokenizers
- Inspired by modern transformer architectures with MoE
Evaluation Results
- Perplexity on the Project Gutenberg books corpus (self-reported): 134.0