--- |
|
|
license: apache-2.0 |
|
|
base_model: |
|
|
- ByteDance-Seed/Seed-OSS-36B-Instruct |
|
|
--- |
|
|
|
|
|
# RWKV-Seed-OSS-36B-hxa079 |
|
|
|
|
|
**Acknowledgment** |
|
|
|
|
|
This project received computational resources and technical support from **Recursal.AI**. I'm deeply grateful for their support! |
|
|
|
|
|
This is an experimental model that converts most of the base Transformer LLM's attention layers to RWKV linear attention using the **RADLADS** method.
|
|
|
|
|
--- |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
* **Model Name:** RWKV-Seed-OSS-36B-hxa079 |
|
|
* **Architecture:** RWKV “hxa079+” hybrid — RWKV-Attention strategically interleaved with NoPE FullAttention |
|
|
* **Base Model:** ByteDance-Seed/Seed-OSS-36B-Instruct |
|
|
* **Model Revision:** alpha |
|
|
* **Parameters:** ~37.1B |
|
|
* **Context Window (Passkey):** 130k |
|
|
|
|
|
--- |
|
|
|
|
|
## Architecture Details |
|
|
|
|
|
* **RWKV Layers:** Interleaved RWKV blocks based on the `hxa079` design |
|
|
* **Transformer Layers:** Placed at strategic depths to enhance long-context performance |
|
|
* **Hybrid Design** (see the layer-layout sketch after this list):
|
|
|
|
|
* RWKV provides temporal decay and efficient recurrent-style state handling |
|
|
* NoPE (No Positional Embedding) FullAttention augments global reasoning without redundant positional encoding |
|
|
* **LoRA Customization:** |
|
|
|
|
|
* Rank Decay: 448 |
|
|
* ICLR: 192 |
|
|
* Value Residual Mix: 128 |
|
|
* Key Residual Mix: 128 |
|
|
* Gate: 576 |
|
|
* **RoPE Usage:** Enabled (`use_rope: true`), aligning positional encoding with RWKV blocks |
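To make the interleaving above more concrete, here is a minimal sketch of how a per-layer plan for the 64-layer stack might look. The spacing of full-attention layers, the block-type names, and the helper function are illustrative assumptions, not the actual conversion code.

```python
# Hypothetical sketch of the hybrid layer layout: mostly RWKV blocks,
# with a few NoPE full-attention layers at regular depths.
# The interval of 8 and the type names are assumptions for illustration only.

NUM_LAYERS = 64
FULL_ATTN_INTERVAL = 8  # assumed spacing of NoPE full-attention layers


def build_layer_plan(num_layers: int, interval: int) -> list[str]:
    """Return a per-layer list of block types for the hybrid stack."""
    plan = []
    for idx in range(num_layers):
        if (idx + 1) % interval == 0:
            plan.append("full_attention_nope")  # global attention, no positional embedding
        else:
            plan.append("rwkv_hxa079")          # linear-attention RWKV block
    return plan


if __name__ == "__main__":
    plan = build_layer_plan(NUM_LAYERS, FULL_ATTN_INTERVAL)
    print(plan.count("rwkv_hxa079"), "RWKV layers,",
          plan.count("full_attention_nope"), "full-attention layers")
```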
|
|
|
|
|
--- |
|
|
|
|
|
## Key Hyperparameters |
|
|
|
|
|
* Hidden Size: 5120 |
|
|
* Intermediate Size: 27,648 |
|
|
* Head Dimension: 128 |
|
|
* Attention Heads: 80 |
|
|
* Key/Value Heads: 8 |
|
|
* Hidden Layers: 64 |
|
|
* Max Position Embeddings: 524,288 |
|
|
* Activation: SiLU |
|
|
* Dropout: 0.1 (residual & attention) |
|
|
* Bias: Disabled for MLP & Attention Output |
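For reference, the hyperparameters above can be summarized as a config-style dictionary. The key names below follow common Hugging Face conventions and are assumptions; the model's own `config.json` is authoritative.

```python
# Config-style summary of the hyperparameters listed above.
# Key names are assumptions based on common Hugging Face conventions.

hxa079_config = {
    "hidden_size": 5120,
    "intermediate_size": 27648,
    "head_dim": 128,
    "num_attention_heads": 80,
    "num_key_value_heads": 8,        # grouped-query attention
    "num_hidden_layers": 64,
    "max_position_embeddings": 524288,
    "hidden_act": "silu",
    "residual_dropout": 0.1,         # assumed key name
    "attention_dropout": 0.1,        # assumed key name
    "mlp_bias": False,
    "attention_output_bias": False,  # assumed key name
    "use_rope": True,
}

# Sanity check: query heads must divide evenly across key/value heads (GQA).
assert hxa079_config["num_attention_heads"] % hxa079_config["num_key_value_heads"] == 0
print("query heads per KV head:",
      hxa079_config["num_attention_heads"] // hxa079_config["num_key_value_heads"])
```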
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
## Evaluation |
|
|
|
|
|
Performance evaluation is ongoing. The model shows promising results in: |
|
|
- Maintaining base model capabilities while achieving linear attention efficiency |
|
|
- Significantly improved needle-in-haystack task performance compared to pure RWKV architectures |
|
|
- Competitive performance on standard language modeling benchmarks |
|
|
- MMLU: 78.39% (base model: 82.41%)

- GSM8K: 86.88% (base model: 93.93%), with a 2,048-token generation limit

- Passkey retrieval: 130k+ context (base model: 500k); a prompt-construction sketch follows this list
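As a rough illustration of the passkey (needle-in-a-haystack) setting referenced above, the sketch below builds a long filler prompt with a hidden passkey. The filler sentence, passkey format, and target length are assumptions; the actual evaluation procedure may differ.

```python
# Hypothetical passkey prompt builder: bury a short "needle" inside a long
# haystack of filler text, then ask the model to recall it.
# Filler text, passkey format, and target length are assumptions.

import random


def build_passkey_prompt(passkey: str, target_words: int = 90_000) -> str:
    filler = "The grass is green. The sky is blue. The sun is bright. "
    needle = f"The passkey is {passkey}. Remember it. "
    words_per_chunk = len(filler.split())
    n_chunks = target_words // words_per_chunk
    chunks = [filler] * n_chunks
    chunks.insert(n_chunks // 2, needle)  # hide the needle midway through the haystack
    question = "\nWhat is the passkey mentioned in the text above?"
    return "".join(chunks) + question


prompt = build_passkey_prompt(passkey=str(random.randint(10000, 99999)))
print("prompt length:", len(prompt.split()), "words")
# Feed `prompt` through the chat template and `model.generate`
# as shown in the Transformers usage example below.
```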
|
|
|
|
|
## Usage with RWKV-Infer |
|
|
- **RWKV-Infer**, a Triton-based hybrid RWKV inference engine; instructions for running hxa079 models are available at: [https://github.com/OpenMOSE/RWKV-Infer/wiki/How-to-Running-RWKV-hxa079-models%3F](https://github.com/OpenMOSE/RWKV-Infer/wiki/How-to-Running-RWKV-hxa079-models%3F)
|
|
|
|
|
|
|
|
## Usage with Hugging Face Transformers |
|
|
|
|
|
Requires the `flash-linear-attention` package:
|
|
```bash |
|
|
pip install flash-linear-attention |
|
|
``` |
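To confirm the kernel library is importable, you can run a quick check. To the best of my knowledge the package installs under the module name `fla`; adjust if your environment differs.

```python
# Quick sanity check that flash-linear-attention is installed.
# The module name `fla` is an assumption about how the package imports.
from importlib.metadata import version

import fla  # noqa: F401

print("flash-linear-attention", version("flash-linear-attention"))
```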
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model_name = "OpenMOSE/RWKV-Seed-OSS-36B-hxa079" |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype="auto", |
|
|
device_map="auto", |
|
|
trust_remote_code=True, |
|
|
) |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
|
|
|
prompt = """There is a very famous song that I recall by the singer's surname as Astley. |
|
|
I can't remember the name or the youtube URL that people use to link as an example url. |
|
|
What's the song name?"""
|
|
messages = [ |
|
|
{"role": "system", "content": "You are a helpful assistant."}, |
|
|
{"role": "user", "content": prompt}, |
|
|
] |
|
|
text = tokenizer.apply_chat_template( |
|
|
messages, tokenize=False, add_generation_prompt=True |
|
|
) |
|
|
model_inputs = tokenizer([text], return_tensors="pt").to(model.device) |
|
|
|
|
|
generated_ids = model.generate(**model_inputs, max_new_tokens=512) |
|
|
generated_ids = [ |
|
|
output_ids[len(input_ids) :] |
|
|
for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
|
|
] |
|
|
|
|
|
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
|
|
|
|
|
``` |
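If you prefer to watch tokens as they are produced, Transformers' `TextStreamer` can be attached to `generate`. The snippet below reuses `model`, `tokenizer`, and `model_inputs` from the example above.

```python
# Optional: stream the completion to stdout as it is generated,
# reusing `model`, `tokenizer`, and `model_inputs` from the block above.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```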
|
|
|
|
|
|
|
|
|
|
|
## Code Repositories |
|
|
|
|
|
- **RADLADS Project Code:** The main codebase for the RADLADS paper, including conversion scripts and model code, can be found at: [https://github.com/recursal/RADLADS](https://github.com/recursal/RADLADS) |
|
|
- **ARWKV Project Code:** The original ARWKV training code can be found at: [https://github.com/yynil/RWKVInside](https://github.com/yynil/RWKVInside)
|
|
- **Specific Training Code (OpenMOSE):** The training code for this particular model is available at: [https://github.com/OpenMOSE/RWKVInside](https://github.com/OpenMOSE/RWKVInside) (Note: this repository is still under development and may contain bugs.) |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
OpenMOSE - 2025 |
|
|
|
|
|
|
|
|
|