RWKV-Seed-OSS-36B-hxa079
Acknowledgment
This project received computational resources and technical support from Recursal.AI. I'm deeply grateful for their support!
This is an experimental model that converts most of the base Transformer LLM's attention layers to RWKV linear attention using the RADLADS method.
Model Overview
- Model Name: RWKV-Seed-OSS-36B-hxa079
- Architecture: RWKV “hxa079+” hybrid — RWKV-Attention strategically interleaved with NoPE FullAttention
- Base Model: ByteDance-Seed/Seed-OSS-36B-Instruct
- Model Revision: alpha
- Parameters: ~37.1B
- Context Window (Passkey): 130k
Architecture Details
RWKV Layers: Interleaved RWKV blocks based on the hxa079 design
Transformer Layers: Placed at strategic depths to enhance long-context performance
Hybrid Design:
- RWKV provides temporal decay and efficient recurrent-style state handling
- NoPE (No Positional Embedding) FullAttention augments global reasoning without redundant positional encoding
LoRA Customization:
- Rank Decay: 448
- ICLR: 192
- Value Residual Mix: 128
- Key Residual Mix: 128
- Gate: 576
RoPE Usage: Enabled (use_rope: true), aligning positional encoding with RWKV blocks
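As a rough illustration of the hybrid layout, the sketch below expresses the stack as a per-layer type map. It is a minimal sketch assuming a handful of NoPE full-attention layers among otherwise RWKV blocks; FULL_ATTENTION_LAYERS is a hypothetical placeholder, not this model's actual layer indices.
# Illustrative only: the real layer placement is defined by the released model config.
NUM_HIDDEN_LAYERS = 64
FULL_ATTENTION_LAYERS = set()  # hypothetical; would hold the indices of the NoPE full-attention layers
layer_types = [
    "full_attention" if i in FULL_ATTENTION_LAYERS else "rwkv"
    for i in range(NUM_HIDDEN_LAYERS)
]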
Key Hyperparameters
- Hidden Size: 5120
- Intermediate Size: 27,648
- Head Dimension: 128
- Attention Heads: 80
- Key/Value Heads: 8
- Hidden Layers: 64
- Max Position Embeddings: 524,288
- Activation: SiLU
- Dropout: 0.1 (residual & attention)
- Bias: Disabled for MLP & Attention Output
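For reference, these values map onto a Hugging Face-style configuration roughly as sketched below; the key names follow common Transformers conventions and are assumptions, not a copy of the model's actual config.json.
# Rough sketch of the hyperparameters above in HF-config form (assumed key names).
config_sketch = {
    "hidden_size": 5120,
    "intermediate_size": 27648,
    "head_dim": 128,
    "num_attention_heads": 80,
    "num_key_value_heads": 8,
    "num_hidden_layers": 64,
    "max_position_embeddings": 524288,
    "hidden_act": "silu",
    "attention_dropout": 0.1,
    "use_rope": True,
}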
Evaluation
Performance evaluation is ongoing. The model shows promising results in:
- Maintaining base model capabilities while achieving linear attention efficiency
- Significantly improved needle-in-haystack task performance compared to pure RWKV architectures
- Competitive performance on standard language modeling benchmarks
- MMLU: 78.39% (Base: 82.41%)
- GSM8K: 86.88% (Base: 93.93%), with a generation limit of 2048 tokens
- Passkey retrieval: 130k+ (Base: 500k)
Usage with RWKV-Infer
- RWKV-Infer is a Triton-based hybrid RWKV inference engine; instructions for running hxa079 models are at: https://github.com/OpenMOSE/RWKV-Infer/wiki/How-to-Running-RWKV-hxa079-models%3F
Usage with Hugging Face Transformers
You need to install flash-linear-attention first:
pip install flash-linear-attention
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "OpenMOSE/RWKV-Seed-OSS-36B-hxa079"
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto",
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = """There is a very famous song that I recall by the singer's surname as Astley.
I can't remember the name or the YouTube URL that people use to link as an example URL.
What's the song's name?"""
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [
output_ids[len(input_ids) :]
for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
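# Print the decoded assistant reply
print(response)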
Code Repositories
- RADLADS Project Code: The main codebase for the RADLADS paper, including conversion scripts and model code, can be found at: https://github.com/recursal/RADLADS
- ARWKV Project Code: The original ARWKV training code can be found at: https://github.com/yynil/RWKVInside
- Specific Training Code (OpenMOSE): The training code for this particular model is available at: https://github.com/OpenMOSE/RWKVInside (Note: this repository is still under development and may contain bugs.)
Model Card Contact
OpenMOSE - 2025