---
language: en
tags:
  - pytorch
  - tensorflow
  - text-generation
  - language-model
  - moe
  - transformer
  - causal-lm
license: mit
datasets:
  - project-gutenberg
metrics:
  - perplexity
model-index:
  - name: MiniGPT-MoE
    results:
      - task:
          type: text-generation
        dataset:
          type: project-gutenberg
          name: Project Gutenberg Books Corpus
        metrics:
          - type: perplexity
            value: 134
pipeline_tag: text-generation
---

# MiniGPT-MoE: Lightweight Language Model with Mixture of Experts

A lightweight GPT-style language model implemented in TensorFlow, featuring a Mixture of Experts (MoE) architecture that routes each token to a single expert for efficient computation.

## Model Details

- Architecture: Transformer with Mixture of Experts (MoE)
- Total Parameters: 52.8M (a rough estimate follows this list)
- Framework: TensorFlow 2.x
- Training Data: Project Gutenberg books corpus with ByteLevel BPE tokenization
- Model Type: Causal Language Model
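
As a rough sanity check on the 52.8M figure, the back-of-the-envelope estimate below plugs in the dimensions listed under Architecture Specifications. It ignores biases and layer norms and assumes an untied output projection, so treat it as a ballpark figure rather than an exact reproduction of the count.

```python
# Approximate parameter count from the published dimensions (biases and
# layer norms ignored; untied output projection assumed).
vocab, d, ffn, experts = 10_000, 512, 2048, 4
dense_layers, moe_layers = 5, 3                # 8 blocks total, MoE at 2, 4, 6

embed = vocab * d                              # token embedding table
head = vocab * d                               # output projection (assumed untied)
attn = 4 * d * d                               # Q, K, V, O projections per block
ffn_dense = 2 * d * ffn                        # standard two-layer MLP
ffn_moe = experts * 2 * d * ffn + d * experts  # expert MLPs plus router

total = embed + head + dense_layers * (attn + ffn_dense) + moe_layers * (attn + ffn_moe)
print(f"~{total / 1e6:.1f}M parameters")       # prints ~54.3M
```

The small gap to the reported 52.8M comes from details the estimate glosses over, such as whether the output projection is tied to the embedding table and how biases and normalization layers are counted.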

## Architecture Specifications

- Embedding Dimension: 512
- Number of Layers: 8 Transformer blocks
- Attention Heads: 8
- Feed-forward Dimension: 2048
- Number of Experts: 4 (in MoE layers)
- MoE Layers: layers 2, 4, and 6 (see the routing sketch after this list)
- Vocabulary Size: 10,000
- Max Sequence Length: 256
- Positional Embeddings: Rotary Positional Embeddings (RoPE)
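
For readers new to MoE, the sketch below illustrates the routing idea: a learned router sends each token to one of the four expert feed-forward networks (matching the `top_k_experts=1` setting in the configuration under Usage) and scales that expert's output by the router probability. The class name `Top1MoEFFN` and all implementation details here are illustrative assumptions and do not mirror `minigpt_transformer.py`.

```python
import tensorflow as tf

class Top1MoEFFN(tf.keras.layers.Layer):
    """Illustrative top-1 MoE feed-forward layer (not the repo's implementation)."""

    def __init__(self, embed_dim=512, ffn_dim=2048, num_experts=4, **kwargs):
        super().__init__(**kwargs)
        self.num_experts = num_experts
        self.router = tf.keras.layers.Dense(num_experts)   # one logit per expert
        self.experts = [
            tf.keras.Sequential([
                tf.keras.layers.Dense(ffn_dim, activation="gelu"),
                tf.keras.layers.Dense(embed_dim),
            ])
            for _ in range(num_experts)
        ]

    def call(self, x):
        # x: (batch, seq, embed_dim)
        gate_probs = tf.nn.softmax(self.router(x), axis=-1)           # (B, S, E)
        expert_idx = tf.argmax(gate_probs, axis=-1)                   # (B, S)
        top_prob = tf.reduce_max(gate_probs, axis=-1, keepdims=True)  # (B, S, 1)
        # Dense dispatch for clarity: run every expert, then keep only the
        # selected one per token. Real MoE code routes tokens sparsely.
        all_out = tf.stack([e(x) for e in self.experts], axis=-2)     # (B, S, E, D)
        one_hot = tf.one_hot(expert_idx, self.num_experts, dtype=x.dtype)
        return tf.einsum("bsed,bse->bsd", all_out, one_hot) * top_prob

# Example: Top1MoEFFN()(tf.random.normal((2, 16, 512))) -> shape (2, 16, 512)
```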

## Usage

### Loading the Model

```python
from minigpt_transformer import MoEMiniGPT, MoEConfig

# Load configuration
config = MoEConfig(
    vocab_size=10000,
    max_seq_len=256,
    embed_dim=512,
    num_heads=8,
    num_layers=8,
    ffn_dim=2048,
    num_experts=4,
    top_k_experts=1,
    use_moe_layers=[2, 4, 6]
)

# Create model
model = MoEMiniGPT(config, tokenizer_path="my-10k-bpe-tokenizer")

# Load trained weights
model.load_weights("moe_minigpt.weights.h5")
```
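
With subclassed Keras models, `load_weights` on an HDF5 file requires the model's variables to already exist. If loading fails with an error about variables not yet being created, building the model with one dummy forward pass first usually fixes it; the call below assumes `MoEMiniGPT` accepts a batch of token IDs, which may not match the actual signature.

```python
import tensorflow as tf

# Hypothetical build step: a dummy batch of token IDs forces Keras to create
# all variables before the HDF5 weights are mapped onto them.
_ = model(tf.zeros((1, config.max_seq_len), dtype=tf.int32))
model.load_weights("moe_minigpt.weights.h5")
```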

### Text Generation

```python
# Generate text
response = model.generate_text("Hello, how are you?", max_length=50)
print(response)
```
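
`generate_text` wraps the usual autoregressive decoding loop. A minimal greedy version is sketched below for illustration; it assumes the tokenizer ships as ByteLevel BPE `vocab.json`/`merges.txt` files and that calling the model on a batch of token IDs returns next-token logits. Both are assumptions, and the actual implementation may sample rather than take the argmax.

```python
import numpy as np
import tensorflow as tf
from tokenizers import ByteLevelBPETokenizer

# Hypothetical greedy decoding loop; file names and call signature are assumptions.
tokenizer = ByteLevelBPETokenizer(
    "my-10k-bpe-tokenizer/vocab.json", "my-10k-bpe-tokenizer/merges.txt"
)
ids = tokenizer.encode("Hello, how are you?").ids

for _ in range(50):                                # up to 50 new tokens
    logits = model(tf.constant([ids[-256:]]))      # stay within max_seq_len=256
    ids.append(int(np.argmax(logits[0, -1])))      # greedy: most likely next token

print(tokenizer.decode(ids))
```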

### Training

```bash
# Train the model
python train_minigpt.py
```

## Training Details

- Dataset: Project Gutenberg books corpus (Alice in Wonderland, Pride and Prejudice, Frankenstein, Sherlock Holmes, Moby Dick, A Tale of Two Cities, Metamorphosis, War and Peace, The Adventures of Tom Sawyer, Great Expectations)
- Tokenization: ByteLevel BPE with a 10k vocabulary
- Batch Size: 48
- Learning Rate: 2e-4
- Optimizer: Adam
- Loss: Sparse categorical cross-entropy with auxiliary MoE losses (see the sketch after this list)
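
The auxiliary MoE losses are typically load-balancing terms that keep the router from collapsing onto a single expert, which is also what keeps expert utilization balanced (see Model Performance). The sketch below shows the general shape of one training step combining the token-level cross-entropy with such a term, assuming `model` is the `MoEMiniGPT` instance from the Usage section; the use of Keras `add_loss` for the auxiliary term and the 0.01 weight are assumptions, not the repository's actual code.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)
ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(batch):
    # batch: (B, T+1) token IDs; predict token t+1 from tokens up to t.
    inputs, targets = batch[:, :-1], batch[:, 1:]
    with tf.GradientTape() as tape:
        logits = model(inputs, training=True)          # (B, T, vocab)
        lm_loss = ce(targets, logits)
        # Assumes the MoE layers register their balancing terms via add_loss;
        # the 0.01 weight is an illustrative value.
        aux_loss = tf.add_n(model.losses) if model.losses else 0.0
        loss = lm_loss + 0.01 * aux_loss
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```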

## Model Performance

- Perplexity: ~134 after ~1.1 epochs of training (see the note after this list)
- Training Tokens: 2M+
- Expert Utilization: balanced across the 4 experts
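
Perplexity is the exponential of the mean per-token cross-entropy, so ~134 corresponds to a loss of about ln(134) ≈ 4.9 nats per token:

```python
import math

# Perplexity <-> cross-entropy loss (in nats): ppl = exp(loss), loss = ln(ppl)
print(math.log(134))   # ~4.90 nats per token
print(math.exp(4.90))  # ~134.3
```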

## Files

- `moe_minigpt.weights.h5`: Trained model weights
- `minigpt_transformer.py`: Model architecture implementation
- `train_minigpt.py`: Training script
- `train_tokenizer.py`: Tokenizer training script (see the sketch after this list)
- `my-10k-bpe-tokenizer/`: Pre-trained tokenizer files
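
`train_tokenizer.py` produces the ByteLevel BPE tokenizer stored in `my-10k-bpe-tokenizer/`. A minimal version of that step with HuggingFace Tokenizers might look like the sketch below; the corpus paths and special tokens are assumptions, not necessarily what the script actually uses.

```python
from tokenizers import ByteLevelBPETokenizer

# Hypothetical tokenizer training run; file paths and special tokens are assumptions.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["data/alice_in_wonderland.txt", "data/pride_and_prejudice.txt"],
    vocab_size=10_000,
    min_frequency=2,
    special_tokens=["<pad>", "<unk>", "<bos>", "<eos>"],
)
tokenizer.save_model("my-10k-bpe-tokenizer")
```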

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{minigpt-moe,
  title={MiniGPT-MoE: Lightweight Language Model with Mixture of Experts},
  author={Devansh0711},
  year={2024},
  url={https://github.com/Devansh070/Language_model}
}
```

## License

This model is released under the MIT License.

## Acknowledgments

- Built with TensorFlow and Keras
- Uses HuggingFace Tokenizers
- Inspired by modern Transformer architectures with MoE