Update README.md
README.md CHANGED
@@ -57,8 +57,66 @@ This project received computational resources and technical support from **Recursal
---

-## Description

## Evaluation

Performance evaluation is ongoing. The model shows promising results in:

- Maintaining base model capabilities while achieving linear attention efficiency
- Significantly improved needle-in-haystack task performance compared to pure RWKV architectures (a rough illustration of this task follows the list)
- Competitive performance on standard language modeling benchmarks
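For readers unfamiliar with the setup, the sketch below shows the general shape of a needle-in-a-haystack probe: a short "needle" fact is buried at some depth inside long filler text and the model is asked to retrieve it. This is only an illustration of the task format, not the harness behind the claims above; the filler text, needle, depth, and context size are arbitrary choices.

```python
# Illustrative needle-in-a-haystack prompt builder (not the evaluation harness used for this model).
def build_haystack_prompt(needle: str, question: str, filler: str, depth: float, target_chars: int) -> str:
    """Bury `needle` at a relative `depth` (0.0 = start, 1.0 = end) inside roughly `target_chars` of filler."""
    haystack = (filler * (target_chars // len(filler) + 1))[:target_chars]
    cut = int(len(haystack) * depth)
    context = haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]
    return f"{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_haystack_prompt(
    needle="The secret passphrase is 'blue-falcon-42'.",
    question="What is the secret passphrase?",
    filler="The quick brown fox jumps over the lazy dog. ",
    depth=0.5,              # bury the needle halfway into the context
    target_chars=20_000,    # scale this up to probe longer contexts
)
# The resulting prompt would then go through the chat template and model.generate,
# exactly as in the Transformers usage example below, and the output would be
# checked for the passphrase.
```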

## Usage with RWKV-Infer

- **RWKV-Infer**, a Triton-based hybrid RWKV inference engine; a guide to running RWKV-hxa079 models with it can be found at: [https://github.com/OpenMOSE/RWKV-Infer/wiki/How-to-Running-RWKV-hxa079-models%3F](https://github.com/OpenMOSE/RWKV-Infer/wiki/How-to-Running-RWKV-hxa079-models%3F) (a rough client sketch follows below)
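If RWKV-Infer is run as a local server exposing an OpenAI-compatible chat-completions endpoint, a client call might look roughly like the sketch below. The endpoint URL, port, and model identifier here are assumptions for illustration only; check the wiki above for the actual launch command and model-loading steps.

```python
# Hypothetical client for a locally running RWKV-Infer server.
# The endpoint URL, port, and model name are assumptions; verify them against the RWKV-Infer wiki.
import requests

resp = requests.post(
    "http://127.0.0.1:9000/v1/chat/completions",
    json={
        "model": "RWKV-Seed-OSS-36B-hxa079",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Give a one-sentence summary of linear attention."},
        ],
        "max_tokens": 256,
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```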

## Usage with Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenMOSE/RWKV-Seed-OSS-36B-hxa079"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = """There is a very famous song that I recall by the singer's surname as Astley.
I can't remember the name or the YouTube URL that people use to link as an example URL.
What's the song name?"""
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens from each returned sequence.
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
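For interactive use, the same generation call can stream tokens to stdout as they are produced. A minimal sketch using transformers' built-in TextStreamer, reusing the model, tokenizer, and model_inputs defined above:

```python
from transformers import TextStreamer

# Print decoded tokens as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```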

## Code Repositories

- **RADLADS Project Code:** The main codebase for the RADLADS paper, including conversion scripts and model code, can be found at: [https://github.com/recursal/RADLADS](https://github.com/recursal/RADLADS)
- **ARWKV Project Code:** The original ARWKV training code can be found at: [https://github.com/yynil/RWKVInside](https://github.com/yynil/RWKVInside)
- **Specific Training Code (OpenMOSE):** The training code for this particular model is available at: [https://github.com/OpenMOSE/RWKVInside](https://github.com/OpenMOSE/RWKVInside) (Note: this repository is still under development and may contain bugs.)
## Model Card Contact

OpenMOSE - 2025