---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- pytorch
- transformer
- causal-lm
- gqa
- grouped-query-attention
- memory-efficient
- llama
- qwen
- hybrid
- v2
- fixed-architecture
base_model_revision: main
model_type: hybrid_transformer_v2
---

# shivash/hybrid-transformer-276m-v2

🚀 **276M-Parameter Hybrid Transformer V2 with GQA-4 Attention**

**Version 2 improvements:**
- ✅ Fixed HF transformers compatibility
- ✅ Proper decoder-only architecture
- ✅ No more "memory" argument errors
- ✅ Compatible with the HF generation pipeline
- ✅ Standard causal language model behavior

## ✨ Key Features

- **🧠 GQA-4 Attention**: 75% KV-cache memory reduction vs. standard multi-head attention, with minimal quality loss (see the memory-math sketch at the end of this card)
- **📊 Parameters**: 276,071,424 (276M)
- **🏗️ Architecture**: Fixed decoder-only design for HF compatibility
- **📏 Context Length**: 4K tokens (8K effective with RoPE scaling)
- **⚡ Efficiency**: Optimized for production deployment
- **🔧 HF Compatible**: Works with the transformers pipeline and generation utilities

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer (V2 - fixed compatibility)
model_name = "shivash/hybrid-transformer-276m-v2"
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # use the GPT-2 tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text (now works!)
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,   # temperature only takes effect when sampling is enabled
    temperature=0.8,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

Note: the spec table below lists a 32,000-token vocabulary, while the GPT-2 tokenizer produces IDs up to 50,256. If you hit out-of-range index errors, verify that your tokenizer matches the checkpoint's embedding size.

## 📊 Model Specifications

| Specification | Value |
|---------------|-------|
| Parameters | 276,071,424 |
| Architecture | Decoder-only (V2 fixed) |
| Attention Type | GQA-4 (4 KV-head groups) |
| Layers | 16 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Vocabulary Size | 32,000 |
| Context Length | 4,096 tokens |
| KV-Cache Memory Reduction | 75% vs. MHA |

## 🔧 V2 Architecture Fixes

- **Decoder-only**: properly marked as `is_decoder=True`
- **No encoder**: `is_encoder_decoder=False`
- **Causal masking**: built-in causal attention masks
- **Self-attention**: no external `memory` tensor required
- **HF compatible**: works with standard generation methods

## ⚠️ Note

This is Version 2 with fixed architecture compatibility. The weights are randomly initialized and ready for training on your target dataset (a minimal training sketch appears at the end of this card).

## 🆕 What's New in V2

- Fixed the `TransformerDecoderLayer.forward() missing 1 required positional argument: 'memory'` error
- Compatible with the HF transformers generation pipeline
- Proper causal language model behavior
- Improved integration with the HF ecosystem

## 📄 License

Apache 2.0

## 🤝 Contributing

This is V2 of the Hybrid Transformer research project, with improved HF compatibility.
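## 🔍 GQA-4 Memory Math

The 75% figure follows from the head counts in the spec table: 16 query heads share 4 KV heads, so the per-layer KV cache stores 4 heads instead of 16, a 4× (75%) reduction. The sketch below is illustrative rather than this model's actual implementation; it assumes fp16 caches and a head dimension of 64 (1024 hidden size / 16 heads), and uses the common `repeat_interleave` trick to broadcast shared KV heads to the query heads.

```python
import torch
import torch.nn.functional as F

# Dimensions taken from the spec table; head_dim = 1024 / 16 = 64 is inferred.
n_heads, n_kv_heads, head_dim = 16, 4, 64
batch, seq_len = 1, 4096

def kv_cache_bytes(kv_heads, dtype_bytes=2):
    # K and V tensors of shape [batch, kv_heads, seq_len, head_dim], fp16.
    return 2 * batch * kv_heads * seq_len * head_dim * dtype_bytes

print(f"MHA cache/layer:   {kv_cache_bytes(n_heads) / 2**20:.0f} MiB")     # 16 MiB
print(f"GQA-4 cache/layer: {kv_cache_bytes(n_kv_heads) / 2**20:.0f} MiB")  # 4 MiB (75% less)

# Each group of 4 query heads attends against one shared KV head.
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)  # 4 KV heads -> 16
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 4096, 64])
```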
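## ✅ Verifying the Parameter Count

A quick way to check the 276,071,424 figure against the actual checkpoint, assuming the custom model class loads via `trust_remote_code` as in the Quick Start:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "shivash/hybrid-transformer-276m-v2", trust_remote_code=True
)
print(f"{sum(p.numel() for p in model.parameters()):,}")  # expected: 276,071,424
```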
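## 🏋️ Training From Scratch

Since the weights are randomly initialized, the checkpoint must be trained before it produces useful text. Below is a minimal sketch, assuming the custom model class supports the standard HF `labels` interface (which computes the shifted next-token cross-entropy loss internally); `texts` is a hypothetical placeholder for your corpus, and the hyperparameters are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "shivash/hybrid-transformer-276m-v2", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # as in the Quick Start
tokenizer.pad_token = tokenizer.eos_token

texts = ["Replace these strings with your training corpus."]  # hypothetical data
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for text in texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Passing labels=input_ids makes HF causal-LM models compute the
    # next-token loss with the one-position shift handled internally.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```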