---
language: en
tags:
- pytorch
- tensorflow
- text-generation
- language-model
- moe
- transformer
- causal-lm
license: mit
datasets:
- project-gutenberg
metrics:
- perplexity
model-index:
- name: MiniGPT-MoE
  results:
  - task:
      type: text-generation
    dataset:
      type: project-gutenberg
      name: Project Gutenberg Books Corpus
    metrics:
    - type: perplexity
      value: 134
pipeline_tag: text-generation
---
# MiniGPT-MoE: Lightweight Language Model with Mixture of Experts
A lightweight implementation of a GPT-style language model in TensorFlow, featuring a Mixture-of-Experts (MoE) architecture for efficient computation.
## Model Details
- Architecture: Transformer with Mixture of Experts (MoE)
- Total Parameters: 52.8M
- Framework: TensorFlow 2.x
- Training: Project Gutenberg books corpus with ByteLevel BPE tokenization
- Model Type: Causal Language Model
## Architecture Specifications
- Embedding Dimension: 512
- Number of Layers: 8 Transformer blocks
- Attention Heads: 8
- Feed-forward Dimension: 2048
- Number of Experts: 4 (in MoE layers)
- MoE Layers: Layers 2, 4, 6
- Vocabulary Size: 10,000
- Max Sequence Length: 256
- Positional Embeddings: Rotary Positional Embeddings (RoPE)
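The MoE blocks replace the dense feed-forward network with several expert FFNs and a learned router. The repository's actual implementation lives in `minigpt_transformer.py`; the snippet below is only a minimal sketch of top-1 routing under the dimensions listed above (the class name, activation choice, and dense dispatch are illustrative assumptions, not the model's real code).

```python
import tensorflow as tf

class SketchMoEFFN(tf.keras.layers.Layer):
    """Illustrative top-1 Mixture-of-Experts feed-forward block (not the repo's exact code)."""

    def __init__(self, embed_dim=512, ffn_dim=2048, num_experts=4):
        super().__init__()
        self.router = tf.keras.layers.Dense(num_experts)  # routing logits per token
        self.experts = [
            tf.keras.Sequential([
                tf.keras.layers.Dense(ffn_dim, activation="gelu"),
                tf.keras.layers.Dense(embed_dim),
            ])
            for _ in range(num_experts)
        ]

    def call(self, x):
        # x: (batch, seq_len, embed_dim)
        gate_probs = tf.nn.softmax(self.router(x), axis=-1)          # (batch, seq, num_experts)
        top1 = tf.argmax(gate_probs, axis=-1)                        # chosen expert per token
        mask = tf.one_hot(top1, depth=len(self.experts), dtype=x.dtype)
        weights = gate_probs * mask                                  # keep only the chosen expert's gate
        # Dense dispatch for clarity: every expert runs on every token,
        # then non-selected experts are masked out (real top-1 routing dispatches sparsely).
        expert_out = tf.stack([e(x) for e in self.experts], axis=-1)  # (batch, seq, embed_dim, num_experts)
        return tf.einsum("bsde,bse->bsd", expert_out, weights)
```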
## Usage

### Loading the Model
```python
from minigpt_transformer import MoEMiniGPT, MoEConfig

# Load configuration
config = MoEConfig(
    vocab_size=10000,
    max_seq_len=256,
    embed_dim=512,
    num_heads=8,
    num_layers=8,
    ffn_dim=2048,
    num_experts=4,
    top_k_experts=1,
    use_moe_layers=[2, 4, 6]
)

# Create model
model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")

# Load trained weights
model.load_weights("moe_minigpt.weights.h5")
```
### Text Generation
```python
# Generate text
response = model.generate_text("Hello, how are you?", max_length=50)
print(response)
```
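Generation is autoregressive: the model repeatedly predicts a next-token distribution and appends a token to the context. `generate_text` wraps this; the loop below is only a conceptual sketch that assumes the model maps a batch of token IDs to next-token logits and that the tokenizer follows the HuggingFace `tokenizers` `encode`/`decode` interface (both interfaces are assumptions, not the repo's API).

```python
import tensorflow as tf

def greedy_generate(model, tokenizer, prompt, max_length=50):
    """Conceptual greedy decoding loop (assumed interfaces, not the repo's exact API)."""
    ids = tokenizer.encode(prompt).ids                 # list[int], HF tokenizers-style encoding
    for _ in range(max_length):
        inputs = tf.constant([ids[-256:]])             # stay within the 256-token context window
        logits = model(inputs)                         # assumed shape: (1, seq_len, vocab_size)
        next_id = int(tf.argmax(logits[0, -1]))        # greedy: pick the most likely next token
        ids.append(next_id)
    return tokenizer.decode(ids)
```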
### Training
```bash
# Train the model
python train_minigpt.py
```
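The 10k ByteLevel BPE tokenizer is built by `train_tokenizer.py`. For reference, a minimal sketch of training such a tokenizer with the HuggingFace `tokenizers` library looks like this (the corpus paths and special tokens are assumptions, not the repo's exact script):

```python
from tokenizers import ByteLevelBPETokenizer

# Hypothetical corpus paths; the actual script uses the Project Gutenberg texts listed below.
corpus_files = ["data/alice_in_wonderland.txt", "data/pride_and_prejudice.txt"]

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=corpus_files,
    vocab_size=10000,                      # matches the model's 10k vocabulary
    min_frequency=2,
    special_tokens=["<pad>", "<unk>", "<s>", "</s>"],
)
tokenizer.save_model("my-10k-bpe-tokenizer")  # writes vocab.json and merges.txt
```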
## Training Details
- Dataset: Project Gutenberg books corpus (Alice in Wonderland, Pride and Prejudice, Frankenstein, Sherlock Holmes, Moby Dick, A Tale of Two Cities, Metamorphosis, War and Peace, The Adventures of Tom Sawyer, Great Expectations)
- Tokenization: ByteLevel BPE with 10k vocabulary
- Batch Size: 48
- Learning Rate: 2e-4
- Optimizer: Adam
- Loss: Sparse Categorical Crossentropy with auxiliary MoE losses
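The auxiliary MoE losses are typically load-balancing terms that keep the router from collapsing onto a single expert; they are added to the language-modeling loss. The step below is an illustrative sketch, assuming the MoE layers register those terms through Keras's `add_loss()` mechanism; it is not a reproduction of `train_minigpt.py`.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)
lm_loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(model, input_ids, target_ids):
    """One illustrative training step: LM loss plus auxiliary MoE losses."""
    with tf.GradientTape() as tape:
        logits = model(input_ids, training=True)        # (batch, seq_len, vocab_size)
        lm_loss = lm_loss_fn(target_ids, logits)
        # Assumption: the MoE layers register load-balancing terms via add_loss().
        aux_loss = tf.add_n(model.losses) if model.losses else 0.0
        total_loss = lm_loss + aux_loss
    grads = tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return total_loss
```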
## Model Performance
- Perplexity: ~134 (achieved in 1.1 epochs)
- Training Tokens: 2M+
- Expert Utilization: Balanced across 4 experts
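Perplexity is the exponential of the mean token-level cross-entropy, so the reported value of ~134 corresponds to an average loss of about 4.9 nats per token. A quick illustrative helper for checking it on a held-out batch (not part of the repository):

```python
import tensorflow as tf

def perplexity(model, input_ids, target_ids):
    """exp(mean cross-entropy) over a batch of token IDs (illustrative helper)."""
    logits = model(input_ids, training=False)                       # (batch, seq_len, vocab_size)
    ce = tf.keras.losses.sparse_categorical_crossentropy(
        target_ids, logits, from_logits=True
    )                                                               # per-token loss, (batch, seq_len)
    return float(tf.exp(tf.reduce_mean(ce)))                        # ~134 for this model
```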
## Files
- `moe_minigpt.weights.h5`: Trained model weights
- `minigpt_transformer.py`: Model architecture implementation
- `train_minigpt.py`: Training script
- `train_tokenizer.py`: Tokenizer training script
- `my-10k-bpe-tokenizer/`: Pre-trained tokenizer files
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{minigpt-moe,
  title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
  author={Devansh0711},
  year={2024},
  url={https://github.com/Devansh070/Language_model}
}
```
## License
This model is released under the MIT License.
## Acknowledgments
- Built with TensorFlow and Keras
- Uses HuggingFace tokenizers
- Inspired by modern transformer architectures with MoE