|
|
--- |
|
|
language: |
|
|
- en |
|
|
- zh |
|
|
- fr |
|
|
- es |
|
|
- de |
|
|
- pt |
|
|
- ru |
|
|
- it |
|
|
- ja |
|
|
- ko |
|
|
- vi |
|
|
- ar |
|
|
tags: |
|
|
- pytorch |
|
|
- text-generation |
|
|
- causal-lm |
|
|
- rwkv |
|
|
license: apache-2.0 |
|
|
datasets: |
|
|
- HuggingFaceFW/fineweb-edu |
|
|
- mlfoundations/dclm-baseline-1.0 |
|
|
- cerebras/SlimPajama-627B |
|
|
- EleutherAI/pile |
|
|
- bigcode/starcoderdata |
|
|
- oscar-corpus/OSCAR-2301 |
|
|
--- |
|
|
|
|
|
# RWKV7-G1 "GooseOne" pure RNN reasoning model |
|
|
|
|
|
**These are BASE models** (pretrained on web/code/synthetic data plus instruction/chat/reasoning data), suitable for post-training and fine-tuning (see https://huggingface.co/spaces/Jellyfish042/UncheatableEval for their language-modeling performance).
|
|
|
|
|
More info & Gradio demo: https://rwkv.com/ |
|
|
|
|
|
For developers: https://github.com/BlinkDL/RWKV-LM |
|
|
|
|
|
RWKV-7 pth => GGUF script: https://github.com/MollySophia/rwkv-mobile/blob/master/converter/convert_rwkv_pth_to_gguf.py |
|
|
|
|
|
Use the rwkv pip package 0.8.29+ for RWKV-7 inference: https://pypi.org/project/rwkv/
|
|
|
|
|
Efficient inference project: https://github.com/BlinkDL/Albatross |
|
|
|
|
|
RWKV APP: https://github.com/RWKV-APP/RWKV_APP (local inference on Android/iOS) |
|
|
|
|
|
Please use the **latest G1a models** if available (they are better at everything).
|
|
|
|
|
```
Gxx = data version

G0x = trained for less than 1 epoch, as training a large model for a full epoch is expensive :(
G0 G0a G0b ... = adding more (newer and better) data, so G0a has better quality (but less) data than G1

G1x = trained for more than 1 epoch
G1 G1a G1b ... = adding more (newer and better) data; note G1a has better quality (and more) data than G0a
```
|
|
|
|
|
Decoding parameters (note: these are for the rwkv pip package, which applies temperature after top-p):
|
|
|
|
|
``` |
|
|
Math: temp 0.3, topp 0.3, alpha_presence 0, alpha_frequency 0, alpha_decay 0.996 |
|
|
|
|
|
Chat: temp 1, topp 0.3, alpha_presence 0.5, alpha_frequency 0.5, alpha_decay 0.996 |
|
|
|
|
|
Creative (great for fiction etc.): temp 0.6, topp 0.6 ~ 0.8, alpha_presence 1 ~ 2, alpha_frequency 0.2, alpha_decay 0.99 |
|
|
``` |
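The order matters here, since applying temperature after the top-p cutoff sharpens or flattens only the surviving nucleus. A minimal NumPy sketch of this sampling order (an illustration, not the package's actual implementation):

```python
import numpy as np

def sample_logits(logits, temperature=1.0, top_p=0.3):
    """Sample a token id, applying top-p BEFORE temperature (rwkv pip order)."""
    # softmax over logits
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # top-p (nucleus) cutoff: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, zero out the rest
    sorted_probs = np.sort(probs)[::-1]
    cum = np.cumsum(sorted_probs)
    cutoff = sorted_probs[np.argmax(cum >= top_p)]
    probs[probs < cutoff] = 0.0
    # temperature is applied AFTER the cutoff, then renormalize
    if temperature != 1.0:
        probs = probs ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

With topp 0.3 and a peaked distribution this usually reduces to greedy decoding, which is why the Math preset is nearly deterministic.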
|
|
|
|
|
**There must not be any whitespace at the end of your input (strip it), or you will upset the tokenizer and see non-English responses.**
|
|
|
|
|
Chat prompt (note: it is best to replace all \n\n in USER_PROMPT with \n, since \n\n is used as the chat-round separator in the pretraining data):
|
|
``` |
|
|
System: YOU_CAN_USE_SYSTEM_IF_NEEDED |
|
|
|
|
|
User: PREVIOUS_STUFF |
|
|
|
|
|
Assistant: PREVIOUS_STUFF |
|
|
|
|
|
User: USER_PROMPT |
|
|
|
|
|
Assistant: |
|
|
``` |
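Putting the two formatting rules together (strip trailing whitespace, keep \n\n out of message text), a small helper to assemble this chat prompt; the function name is just illustrative:

```python
def build_chat_prompt(rounds, user_prompt, system=None):
    """Assemble an RWKV chat prompt.

    rounds: list of (user_text, assistant_text) pairs from earlier turns.
    Replaces \n\n inside messages with \n (\n\n is reserved as the
    chat-round separator) and strips trailing whitespace so the
    tokenizer is not upset.
    """
    clean = lambda s: s.strip().replace("\n\n", "\n")
    parts = []
    if system:
        parts.append(f"System: {clean(system)}")
    for user_text, assistant_text in rounds:
        parts.append(f"User: {clean(user_text)}")
        parts.append(f"Assistant: {clean(assistant_text)}")
    parts.append(f"User: {clean(user_prompt)}")
    parts.append("Assistant:")
    return "\n\n".join(parts)
```

The returned string ends with `Assistant:` and no trailing space, ready to feed to the model.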
|
|
|
|
|
Think prompt: |
|
|
``` |
|
|
User: USER_PROMPT |
|
|
|
|
|
Assistant: <think |
|
|
``` |
|
|
--- |
|
|
Think prompt with alternative-style output, **valid for 20250922 and newer** models. Note there is a space before "think" after USER_PROMPT:
|
|
``` |
|
|
User: USER_PROMPT think |
|
|
|
|
|
Assistant: <think |
|
|
``` |
|
|
Shorter think ("think a bit"), same style:
|
|
``` |
|
|
User: USER_PROMPT think a bit |
|
|
|
|
|
Assistant: <think |
|
|
``` |
|
|
Longer think ("think a lot"), same style:
|
|
``` |
|
|
User: USER_PROMPT think a lot |
|
|
|
|
|
Assistant: <think |
|
|
``` |
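The three variants above differ only in the suffix appended to the user text. A hypothetical helper (for 20250922 and newer models) that builds them:

```python
def build_think_prompt(user_prompt, effort=None):
    """Build a single-turn think prompt for 20250922+ models.

    effort: None (default "think"), "a bit" (shorter reasoning),
    or "a lot" (longer reasoning).
    The "<think" prefill (no closing ">") steers the model into its
    reasoning block; note the single space before the appended "think".
    """
    suffix = {None: " think", "a bit": " think a bit", "a lot": " think a lot"}[effort]
    return f"User: {user_prompt.strip()}{suffix}\n\nAssistant: <think"
```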
|
|
--- |
|
|
Fake think prompt (pre-closes the think block so the model answers without reasoning):
|
|
``` |
|
|
User: USER_PROMPT |
|
|
|
|
|
Assistant: <think> |
|
|
</think |
|
|
``` |
|
|
|