---
language: en
tags:
- pytorch
- tensorflow
- text-generation
- language-model
- moe
- transformer
- causal-lm
license: mit
datasets:
- project-gutenberg
metrics:
- perplexity
model-index:
- name: MiniGPT-MoE
  results:
  - task:
      type: text-generation
    dataset:
      type: project-gutenberg
      name: Project Gutenberg Books Corpus
    metrics:
    - type: perplexity
      value: 134
pipeline_tag: text-generation
---
# MiniGPT-MoE: Lightweight Language Model with Mixture of Experts
A lightweight implementation of a GPT-style language model in TensorFlow, featuring a Mixture-of-Experts (MoE) architecture for efficient computation.
## Model Details
- Architecture: Transformer with Mixture of Experts (MoE)
- Total Parameters: 52.8M
- Framework: TensorFlow 2.x
- Training: Project Gutenberg books corpus with ByteLevel BPE tokenization
- Model Type: Causal Language Model
## Architecture Specifications
- Embedding Dimension: 512
- Number of Layers: 8 Transformer blocks
- Attention Heads: 8
- Feed-forward Dimension: 2048
- Number of Experts: 4 (in MoE layers)
- MoE Layers: Layers 2, 4, 6
- Vocabulary Size: 10,000
- Max Sequence Length: 256
- Positional Embeddings: Rotary Positional Embeddings (RoPE)
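The MoE blocks replace the dense feed-forward network with several expert FFNs and a learned router. The repository's actual implementation lives in `minigpt_transformer.py`; the snippet below is only a minimal sketch of top-1 routing under the dimensions listed above (the class name, activation choice, and dense dispatch are illustrative assumptions, not the model's real code).

```python
import tensorflow as tf

class SketchMoEFFN(tf.keras.layers.Layer):
    """Illustrative top-1 Mixture-of-Experts feed-forward block (not the repo's exact code)."""

    def __init__(self, embed_dim=512, ffn_dim=2048, num_experts=4):
        super().__init__()
        self.router = tf.keras.layers.Dense(num_experts)  # routing logits per token
        self.experts = [
            tf.keras.Sequential([
                tf.keras.layers.Dense(ffn_dim, activation="gelu"),
                tf.keras.layers.Dense(embed_dim),
            ])
            for _ in range(num_experts)
        ]

    def call(self, x):
        # x: (batch, seq_len, embed_dim)
        gate_probs = tf.nn.softmax(self.router(x), axis=-1)          # (batch, seq, num_experts)
        top1 = tf.argmax(gate_probs, axis=-1)                        # chosen expert per token
        mask = tf.one_hot(top1, depth=len(self.experts), dtype=x.dtype)
        weights = gate_probs * mask                                  # keep only the chosen expert's gate
        # Dense dispatch for clarity: every expert runs on every token,
        # then non-selected experts are masked out (real top-1 routing dispatches sparsely).
        expert_out = tf.stack([e(x) for e in self.experts], axis=-1)  # (batch, seq, embed_dim, num_experts)
        return tf.einsum("bsde,bse->bsd", expert_out, weights)
```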
## Usage

### Loading the Model
```python
from minigpt_transformer import MoEMiniGPT, MoEConfig

# Load configuration
config = MoEConfig(
    vocab_size=10000,
    max_seq_len=256,
    embed_dim=512,
    num_heads=8,
    num_layers=8,
    ffn_dim=2048,
    num_experts=4,
    top_k_experts=1,
    use_moe_layers=[2, 4, 6]
)

# Create model
model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")

# Load trained weights
model.load_weights("moe_minigpt.weights.h5")
```
### Text Generation
```python
# Generate text
response = model.generate_text("Hello, how are you?", max_length=50)
print(response)
```
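Generation is autoregressive: the model repeatedly predicts a next-token distribution and appends a token to the context. `generate_text` wraps this; the loop below is only a conceptual sketch that assumes the model maps a batch of token IDs to next-token logits and that the tokenizer follows the HuggingFace `tokenizers` `encode`/`decode` interface (both interfaces are assumptions, not the repo's API).

```python
import tensorflow as tf

def greedy_generate(model, tokenizer, prompt, max_length=50):
    """Conceptual greedy decoding loop (assumed interfaces, not the repo's exact API)."""
    ids = tokenizer.encode(prompt).ids                 # list[int], HF tokenizers-style encoding
    for _ in range(max_length):
        inputs = tf.constant([ids[-256:]])             # stay within the 256-token context window
        logits = model(inputs)                         # assumed shape: (1, seq_len, vocab_size)
        next_id = int(tf.argmax(logits[0, -1]))        # greedy: pick the most likely next token
        ids.append(next_id)
    return tokenizer.decode(ids)
```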
### Training
```bash
# Train the model
python train_minigpt.py
```
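The 10k ByteLevel BPE tokenizer is built by `train_tokenizer.py`. For reference, a minimal sketch of training such a tokenizer with the HuggingFace `tokenizers` library looks like this (the corpus paths and special tokens are assumptions, not the repo's exact script):

```python
from tokenizers import ByteLevelBPETokenizer

# Hypothetical corpus paths; the actual script uses the Project Gutenberg texts listed below.
corpus_files = ["data/alice_in_wonderland.txt", "data/pride_and_prejudice.txt"]

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=corpus_files,
    vocab_size=10000,                      # matches the model's 10k vocabulary
    min_frequency=2,
    special_tokens=["<pad>", "<unk>", "<s>", "</s>"],
)
tokenizer.save_model("my-10k-bpe-tokenizer")  # writes vocab.json and merges.txt
```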
## Training Details
- Dataset: Project Gutenberg books corpus (Alice in Wonderland, Pride and Prejudice, Frankenstein, Sherlock Holmes, Moby Dick, A Tale of Two Cities, Metamorphosis, War and Peace, The Adventures of Tom Sawyer, Great Expectations)
- Tokenization: ByteLevel BPE with 10k vocabulary
- Batch Size: 48
- Learning Rate: 2e-4
- Optimizer: Adam
- Loss: Sparse Categorical Crossentropy with auxiliary MoE losses
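The auxiliary MoE losses are typically load-balancing terms that keep the router from collapsing onto a single expert; they are added to the language-modeling loss. The step below is an illustrative sketch, assuming the MoE layers register those terms through Keras's `add_loss()` mechanism; it is not a reproduction of `train_minigpt.py`.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)
lm_loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(model, input_ids, target_ids):
    """One illustrative training step: LM loss plus auxiliary MoE losses."""
    with tf.GradientTape() as tape:
        logits = model(input_ids, training=True)        # (batch, seq_len, vocab_size)
        lm_loss = lm_loss_fn(target_ids, logits)
        # Assumption: the MoE layers register load-balancing terms via add_loss().
        aux_loss = tf.add_n(model.losses) if model.losses else 0.0
        total_loss = lm_loss + aux_loss
    grads = tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return total_loss
```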
## Model Performance
- Perplexity: ~134 (achieved in 1.1 epochs)
- Training Tokens: 2M+
- Expert Utilization: Balanced across 4 experts
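Perplexity is the exponential of the mean token-level cross-entropy, so the reported value of ~134 corresponds to an average loss of about 4.9 nats per token. A quick illustrative helper for checking it on a held-out batch (not part of the repository):

```python
import tensorflow as tf

def perplexity(model, input_ids, target_ids):
    """exp(mean cross-entropy) over a batch of token IDs (illustrative helper)."""
    logits = model(input_ids, training=False)                       # (batch, seq_len, vocab_size)
    ce = tf.keras.losses.sparse_categorical_crossentropy(
        target_ids, logits, from_logits=True
    )                                                               # per-token loss, (batch, seq_len)
    return float(tf.exp(tf.reduce_mean(ce)))                        # ~134 for this model
```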
## Files
- `moe_minigpt.weights.h5`: Trained model weights
- `minigpt_transformer.py`: Model architecture implementation
- `train_minigpt.py`: Training script
- `train_tokenizer.py`: Tokenizer training script
- `my-10k-bpe-tokenizer/`: Pre-trained tokenizer files
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{minigpt-moe,
  title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
  author={Devansh0711},
  year={2024},
  url={https://github.com/Devansh070/Language_model}
}
```
## License
This model is released under the MIT License.
## Acknowledgments
- Built with TensorFlow and Keras
- Uses HuggingFace tokenizers
- Inspired by modern transformer architectures with MoE