Model Card for LumenBase

A 128M-parameter GPT-style transformer built from scratch for educational purposes, featuring Grouped-Query Attention (GQA), SwiGLU, RMSNorm, and RoPE.

Model Details

Model Description

LumenBase is a decoder-only transformer language model implementing modern architectural optimizations:

  • Architecture: 12-layer transformer with GQA (12 query heads, 4 KV heads), SwiGLU activation, RMSNorm, and RoPE (see the attention sketch after this list)

  • Parameters: 128M (768 hidden size, 3072 FFN, 2048 context length)

  • Training: Mixed-precision training (FP16/BF16) with a custom 32K-vocab BPE tokenizer

  • Developed by: Hariom Jangra

  • Model type: Decoder-only Transformer

  • Language: English

  • License: MIT

  • Repository: https://github.com/HariomJangra/project-lumen
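
For intuition, here is a minimal, hedged sketch of the grouped-query attention pattern named above. Shapes are illustrative only; it omits RoPE and the causal mask and is not the repository's exact implementation.

import torch

# Sketch of grouped-query attention (GQA): 12 query heads share 4 KV heads,
# so each KV head serves a group of 3 query heads.
B, T, n_heads, n_kv_heads, head_dim = 2, 16, 12, 4, 64

q = torch.randn(B, n_heads, T, head_dim)      # full set of query heads
k = torch.randn(B, n_kv_heads, T, head_dim)   # fewer K/V heads -> smaller KV cache
v = torch.randn(B, n_kv_heads, T, head_dim)

groups = n_heads // n_kv_heads                # 12 // 4 = 3 query heads per KV head
k = k.repeat_interleave(groups, dim=1)        # (B, 12, T, 64)
v = v.repeat_interleave(groups, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5   # (B, 12, T, T)
out = scores.softmax(dim=-1) @ v                       # (B, 12, T, 64), then merged and projected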

Uses

Direct Use:

  • Text generation and completion
  • Educational resource for understanding transformer architecture
  • Research baseline for language models
  • Foundation for fine-tuning on specific tasks

Downstream Use:

  • Instruction tuning
  • Chat applications
  • Domain-specific fine-tuning

Out-of-Scope:

  • Production deployments
  • Safety-critical applications
  • Applications requiring factual accuracy without verification
  • This is an educational model; use established, production-ready frameworks for deployment

Limitations

Technical:

  • Small model size (128M parameters), well below state-of-the-art performance
  • 2048 token context window
  • May generate incoherent text for complex prompts

Bias & Safety:

  • May perpetuate training data biases
  • Not evaluated for fairness across demographics
  • Can generate inappropriate content
  • Should not be relied upon for factual information

Recommendations: This is an educational model. Verify all outputs, implement content filtering for applications, and use production-ready models for commercial use.

Training

Data: Custom datasets tokenized with BPE (32K vocab)
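
The tokenizer-training script is not reproduced in this card; a 32K-vocab byte-level BPE tokenizer compatible with the tokenizers library can be trained roughly as follows. The corpus file name and special tokens are placeholder assumptions.

from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Hypothetical sketch: train a 32K-vocab BPE tokenizer and save it as the
# tokenizer.json loaded in the usage example below. File names are placeholders.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32000,
    special_tokens=["[UNK]", "[BOS]", "[EOS]", "[PAD]"],  # assumed special tokens
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)    # placeholder corpus
tokenizer.save("tokenizer.json")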

Hyperparameters (combined in the training-step sketch after this list):

  • Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
  • Batch: 12 per step × 4 gradient-accumulation steps = 48 effective
  • Sequence length: 2048 tokens
  • Scheduler: Linear warmup + Cosine annealing
  • Precision: Mixed (FP16/BF16/FP32)
  • Dropout: 0.1 (training), 0.0 (inference)
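
A hedged sketch of how these settings fit together; the repository's actual loop may differ. The tiny linear model, random batches, and step counts below are placeholders.

import torch
import torch.nn as nn

model = nn.Linear(64, 64)  # placeholder stand-in for the transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

warmup_iters, total_iters, accum_steps = 100, 1000, 4   # assumed step counts
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01,
                                           total_iters=warmup_iters)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                    T_max=total_iters - warmup_iters)
scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, [warmup, cosine],
                                                  milestones=[warmup_iters])
device = "cuda" if torch.cuda.is_available() else "cpu"
scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")  # FP16 needs loss scaling; BF16 does not

for step in range(total_iters):
    optimizer.zero_grad(set_to_none=True)
    for _ in range(accum_steps):                  # micro-batch 12 x 4 accumulation = 48 effective
        x = torch.randn(12, 64)                   # placeholder micro-batch
        with torch.autocast(device_type=device,
                            dtype=torch.float16 if device == "cuda" else torch.bfloat16):
            loss = (model(x) ** 2).mean() / accum_steps   # dummy loss, averaged over micro-batches
        scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()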

Training Loss

[Figure: training loss curve]

Evaluation

Evaluated on standard NLP benchmarks:

Benchmark        Accuracy   Correct/Total
ARC-Easy         39.48%     938/2,376
ARC-Challenge    23.55%     276/1,172
HellaSwag        32.62%     334/1,024

Summary: Baseline performance consistent with a 128M educational model: comfortably above the 25% random-choice baseline on ARC-Easy and HellaSwag, but near chance on ARC-Challenge, so complex reasoning remains the main area for improvement.
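
The card does not state the exact evaluation protocol. A common approach for ARC and HellaSwag, shown here as an assumption-laden sketch, scores each answer choice by the model's length-normalized log-likelihood and picks the highest-scoring one.

import torch
import torch.nn.functional as F

# Hypothetical scoring routine; assumes the model maps (1, T) token ids to
# (1, T, vocab) logits, which is an assumption about this codebase.
@torch.no_grad()
def choice_logprob(model, context_ids, choice_ids):
    ids = torch.cat([context_ids, choice_ids]).unsqueeze(0)  # (1, T)
    logits = model(ids)
    logprobs = F.log_softmax(logits[:, :-1], dim=-1)         # position t predicts token t+1
    targets = ids[:, 1:].unsqueeze(-1)
    token_lp = logprobs.gather(-1, targets).squeeze(-1)      # (1, T-1)
    return token_lp[0, -choice_ids.numel():].mean().item()   # completion tokens only

# predicted = max(range(len(choices)), key=lambda i: choice_logprob(model, ctx, choices[i]))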

Technical Specifications

Architecture: Decoder-only Transformer

  • 12 layers, 768 hidden size, 12 attention heads (4 KV heads)
  • SwiGLU FFN (3072 intermediate), RMSNorm, RoPE (RMSNorm and SwiGLU are sketched below)
  • 32K vocab, 2048 max sequence length
  • Weight tying between embedding and output layers
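
For reference, minimal sketches of the RMSNorm and SwiGLU blocks named above. These are illustrative, not the repository's exact modules.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale by 1/rms(x), then a learned gain (no mean centering)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class SwiGLU(nn.Module):
    """Gated FFN: silu(x W_gate) * (x W_up), projected back down (768 -> 3072 -> 768 here)."""
    def __init__(self, dim=768, hidden=3072):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

Weight tying amounts to reusing the token-embedding matrix as the output projection (e.g. lm_head.weight = tok_emb.weight), which saves roughly 32,000 × 768 ≈ 24.6M parameters at this vocabulary size.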

Implementation: Custom PyTorch implementation from scratch

Software: Python 3.13, PyTorch, NumPy, Tokenizers, tqdm, matplotlib

How to Use

import torch
from safetensors.torch import load_file
from ModelArchitecture import Transformer, ModelConfig, generate
from tokenizers import Tokenizer

# Load configuration and model
config = ModelConfig(vocab_size=32000, hidden_size=768, n_heads=12,
                     n_kv_heads=4, n_kv_groups=3, head_dim=64, n_layers=12,
                     intermediate_size=3072, max_position_embeddings=2048,
                     dropout=0.0, pre_norm=True, tie_weights=True)

model = Transformer(config)
# .safetensors checkpoints are loaded with safetensors, not torch.load
model.load_state_dict(load_file('model.safetensors'))
model.eval()

# Generate text
tokenizer = Tokenizer.from_file('tokenizer.json')
prompt = "Once upon a time"
input_ids = torch.tensor([tokenizer.encode(prompt).ids])

# Sample with temperature, top-k, and nucleus (top-p) filtering
output = generate(model, input_ids, max_new_tokens=100,
                  temperature=0.8, top_k=50, top_p=0.9)
print(tokenizer.decode(output[0].tolist()))

Citation

@misc{lumenbase2025,
  author = {Jangra, Hariom},
  title = {LumenBase: A 128M Parameter Language Model Built from Scratch},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/HariomJangra/project-lumen}}
}

Contact

Author: Hariom Jangra (@HariomJangra)

For questions or feedback, please open an issue on the GitHub repository.
