---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- pytorch
- transformer
- causal-lm
- gqa
- grouped-query-attention
- memory-efficient
- llama
- qwen
- hybrid
- v2
- fixed-architecture
base_model_revision: main
model_type: hybrid_transformer_v2
---

# shivash/hybrid-transformer-276m-v2

🚀 **276M-Parameter Hybrid Transformer V2 with GQA-4 Attention**

**Version 2 improvements:**
- ✅ Fixed HF transformers compatibility
- ✅ Proper decoder-only architecture
- ✅ No more "memory" argument errors
- ✅ Compatible with the HF generation pipeline
- ✅ Standard causal language model behavior

## ✨ Key Features

- **🧠 GQA-4 Attention**: 75% KV-cache memory reduction vs. standard multi-head attention, with minimal quality loss (see the memory-math sketch at the end of this card)
- **📊 Parameters**: 276,071,424 (276M)
- **🏗️ Architecture**: Fixed decoder-only design for HF compatibility
- **📏 Context Length**: 4K tokens (8K effective with RoPE scaling)
- **⚡ Efficiency**: Optimized for production deployment
- **🔧 HF Compatible**: Works with the transformers pipeline and generation utilities

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer (V2 - fixed compatibility)
model_name = "shivash/hybrid-transformer-276m-v2"
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # use the GPT-2 tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text (now works!)
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,   # temperature only takes effect when sampling is enabled
    temperature=0.8,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

Note: the spec table below lists a 32,000-token vocabulary, while the GPT-2 tokenizer produces IDs up to 50,256. If you hit out-of-range index errors, verify that your tokenizer matches the checkpoint's embedding size.

## 📊 Model Specifications

| Specification | Value |
|---------------|-------|
| Parameters | 276,071,424 |
| Architecture | Decoder-only (V2 fixed) |
| Attention Type | GQA-4 (4 KV-head groups) |
| Layers | 16 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Vocabulary Size | 32,000 |
| Context Length | 4,096 tokens |
| KV-Cache Memory Reduction | 75% vs. MHA |

## 🔧 V2 Architecture Fixes

- **Decoder-only**: properly marked as `is_decoder=True`
- **No encoder**: `is_encoder_decoder=False`
- **Causal masking**: built-in causal attention masks
- **Self-attention**: no external `memory` tensor required
- **HF compatible**: works with standard generation methods

## ⚠️ Note

This is Version 2 with fixed architecture compatibility. The weights are randomly initialized and ready for training on your target dataset (a minimal training sketch appears at the end of this card).

## 🆕 What's New in V2

- Fixed the `TransformerDecoderLayer.forward() missing 1 required positional argument: 'memory'` error
- Compatible with the HF transformers generation pipeline
- Proper causal language model behavior
- Improved integration with the HF ecosystem

## 📄 License

Apache 2.0

## 🤝 Contributing

This is V2 of the Hybrid Transformer research project, with improved HF compatibility.
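## 🔍 GQA-4 Memory Math

The 75% figure follows from the head counts in the spec table: 16 query heads share 4 KV heads, so the per-layer KV cache stores 4 heads instead of 16, a 4× (75%) reduction. The sketch below is illustrative rather than this model's actual implementation; it assumes fp16 caches and a head dimension of 64 (1024 hidden size / 16 heads), and uses the common `repeat_interleave` trick to broadcast shared KV heads to the query heads.

```python
import torch
import torch.nn.functional as F

# Dimensions taken from the spec table; head_dim = 1024 / 16 = 64 is inferred.
n_heads, n_kv_heads, head_dim = 16, 4, 64
batch, seq_len = 1, 4096

def kv_cache_bytes(kv_heads, dtype_bytes=2):
    # K and V tensors of shape [batch, kv_heads, seq_len, head_dim], fp16.
    return 2 * batch * kv_heads * seq_len * head_dim * dtype_bytes

print(f"MHA cache/layer:   {kv_cache_bytes(n_heads) / 2**20:.0f} MiB")     # 16 MiB
print(f"GQA-4 cache/layer: {kv_cache_bytes(n_kv_heads) / 2**20:.0f} MiB")  # 4 MiB (75% less)

# Each group of 4 query heads attends against one shared KV head.
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)  # 4 KV heads -> 16
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 4096, 64])
```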
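## ✅ Verifying the Parameter Count

A quick way to check the 276,071,424 figure against the actual checkpoint, assuming the custom model class loads via `trust_remote_code` as in the Quick Start:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "shivash/hybrid-transformer-276m-v2", trust_remote_code=True
)
print(f"{sum(p.numel() for p in model.parameters()):,}")  # expected: 276,071,424
```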
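## 🏋️ Training From Scratch

Since the weights are randomly initialized, the checkpoint must be trained before it produces useful text. Below is a minimal sketch, assuming the custom model class supports the standard HF `labels` interface (which computes the shifted next-token cross-entropy loss internally); `texts` is a hypothetical placeholder for your corpus, and the hyperparameters are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "shivash/hybrid-transformer-276m-v2", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # as in the Quick Start
tokenizer.pad_token = tokenizer.eos_token

texts = ["Replace these strings with your training corpus."]  # hypothetical data
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for text in texts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Passing labels=input_ids makes HF causal-LM models compute the
    # next-token loss with the one-position shift handled internally.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```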