
shivash/hybrid-transformer-276m-v2

πŸš€ 276M Parameter Hybrid Transformer V2 with GQA-4 Attention

Version 2 Improvements:

  • βœ… Fixed HF transformers compatibility
  • βœ… Proper decoder-only architecture
  • βœ… No more "memory" argument errors
  • βœ… Compatible with HF generation pipeline
  • βœ… Standard causal language model behavior

✨ Key Features

  • 🧠 GQA-4 Attention: 16 query heads share 4 key/value heads, cutting KV-cache memory by 75% versus standard multi-head attention with minimal quality loss (sketched after this list)
  • πŸ“Š Parameters: 276,071,424 (276M)
  • πŸ—οΈ Architecture: Fixed decoder-only design for HF compatibility
  • πŸ“ Context Length: 4K tokens (8K effective with RoPE scaling)
  • ⚑ Efficiency: Optimized for production deployment
  • πŸ”§ HF Compatible: Works with transformers pipeline and generation
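The memory saving follows directly from the head counts: with 16 query heads sharing 4 key/value heads, the KV cache only stores 4/16 of the usual keys and values. Below is a minimal sketch of the idea, not the model's actual attention module; the shapes follow the spec table further down.

import torch
import torch.nn.functional as F

# GQA-4 sketch: 16 query heads, 4 shared key/value heads (head_dim = 1024/16 = 64)
batch, seq = 1, 8
n_heads, n_kv_heads, head_dim = 16, 4, 64

q = torch.randn(batch, n_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)  # only 4 KV heads are cached
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Each group of 16 / 4 = 4 query heads attends to the same K/V head
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 8, 64])
# KV cache holds 4 of 16 heads' keys/values: 1 - 4/16 = 75% memory saved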

πŸš€ Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer (V2 - Fixed compatibility)
model_name = "shivash/hybrid-transformer-276m-v2"
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Use GPT2 tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text (now works!)
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)  # do_sample=True so temperature takes effect
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

πŸ“Š Model Specifications

Specification     Value
----------------  ---------------------------
Parameters        276,071,424
Architecture      Decoder-only (V2 fixed)
Attention Type    GQA-4 (4 key/value groups)
Layers            16
Hidden Size       1024
Attention Heads   16
Vocabulary Size   32,000
Context Length    4,096 tokens
Memory Reduction  75% KV cache vs. MHA

πŸ”§ V2 Architecture Fixes

  • Decoder-Only: Properly marked as is_decoder=True
  • No Encoder: is_encoder_decoder=False
  • Causal Masking: Built-in causal attention masks
  • Self-Attention: No external memory requirements
  • HF Compatible: Works with standard generation methods (the snippet below checks these flags)
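A quick way to confirm these flags on the downloaded checkpoint, assuming the custom config exposes the standard transformers attributes:

from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "shivash/hybrid-transformer-276m-v2", trust_remote_code=True
)
print(config.is_decoder)          # expected: True  (decoder-only)
print(config.is_encoder_decoder)  # expected: False (no cross-attention memory)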

⚠️ Note

This is Version 2 with fixed architecture compatibility. The weights are randomly initialized and ready for training on your target dataset.
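Since the checkpoint ships untrained, a training run is needed before generations become meaningful. A minimal fine-tuning sketch with the HF Trainer follows; the dataset (wikitext) and hyperparameters are placeholders, not recommendations from this card:

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(
    "shivash/hybrid-transformer-276m-v2", trust_remote_code=True
)

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hybrid-276m-v2-ft",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()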

πŸ†• What's New in V2

  • Fixed the "TransformerDecoderLayer.forward() missing 1 required positional argument: 'memory'" error (see the sketch after this list)
  • Compatible with HF transformers generation pipeline
  • Proper causal language model behavior
  • Improved integration with HF ecosystem
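For context on that error: PyTorch's nn.TransformerDecoderLayer.forward(tgt, memory) requires an encoder output ("memory") that a decoder-only LM never produces. One common fix, sketched below, is to build decoder-only blocks from self-attention with a causal mask instead; this is illustrative only, as V2's actual layer implementation lives in the repo's remote code.

import torch
import torch.nn as nn

# A decoder-only block needs no encoder memory: self-attention plus a
# causal mask is enough, e.g. via TransformerEncoderLayer
layer = nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True)
x = torch.randn(1, 8, 1024)
mask = nn.Transformer.generate_square_subsequent_mask(8)
out = layer(x, src_mask=mask, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 1024])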

πŸ“„ License

Apache 2.0 License

🀝 Contributing

This is V2 of the Hybrid Transformer research project with improved HF compatibility.
