Model produces gibberish
Hi, I'm trying to use this model, but I can't get it to produce coherent text.
For example, the following code:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "recursal/QRWKV6-7B-Base"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
eos = tokenizer.convert_tokens_to_ids("<|endoftext|>")

prompt = "Write a short paragraph about the Moon."

# without chat template
x = tokenizer(prompt, return_tensors="pt").to(model.device)
y = model.generate(**x, max_new_tokens=32, do_sample=False, eos_token_id=eos, pad_token_id=eos)
print(tokenizer.decode(y[0][x["input_ids"].shape[1]:], skip_special_tokens=True))

# with chat template
messages = [{"role": "user", "content": prompt}]
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
x = tokenizer(chat, return_tensors="pt").to(model.device)
y = model.generate(**x, max_new_tokens=160, do_sample=False, eos_token_id=eos, pad_token_id=eos)
print(tokenizer.decode(y[0][x["input_ids"].shape[1]:], skip_special_tokens=True))
```
Produces:
```
's LI堞FontAwesomeIcon/GPL\views@student灏@student/GPL\views@student chù/GPL怏 John/
known-Identifier strugg约翰WARDS@studentWARDS McC ли@student@student@student@student@student@student
ormsg'iconGetEnumerator nodeSharper\views disappe Wis ли EntityState@student@student Mik@student@student@student@student@student@student@student@student@student@student@student@student@student@student@student@student@student@student
```
I tried different settings, but all of them produce gibberish:
- different `torch_dtype` values
- GPU vs CPU
- with / without chat template
- different decoding strategies
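For what it's worth, I also know a tokenizer round-trip check can rule out a tokenizer/model mismatch, though I'd expect the tokenizer to be fine here. A minimal sketch:

```python
# Sanity check: encode and decode the prompt; a faithful round-trip suggests
# the tokenizer itself is not the source of the gibberish.
ids = tokenizer(prompt)["input_ids"]
print(ids[:10])
print(tokenizer.decode(ids))  # should reproduce the prompt exactly
```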
Could you share a code template that produces non-gibberish text? Is there a known issue with this model? (Note: I'm not interested in the Instruct model, but in the Base model.)
Thanks in advance for your help!
Same here. The instruction models seem to work fine, but I also could not get the base model to output coherent completions.
Yes, I can confirm that `recursal/QRWKV7-7B-Instruct` works. However, `recursal/QRWKV6-7B-Instruct` (note the 6 instead of the 7) also produces gibberish.
There were apparently two issues, related to updates to the FLA and Transformers libraries. I've fixed them in both repos, so please let me know if this works for you now!
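One caveat: `trust_remote_code` modules are cached locally, so if you still see the old behavior, forcing a re-download should pick up the fix. A minimal sketch, assuming the default Hugging Face cache location:

```python
from transformers import AutoModelForCausalLM

# Force a fresh fetch of the repo files; remote-code modules are cached under
# ~/.cache/huggingface/modules and can otherwise keep serving the stale version.
model = AutoModelForCausalLM.from_pretrained(
    "recursal/QRWKV6-7B-Base",
    trust_remote_code=True,
    force_download=True,
)
```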
During further investigation, I uncovered additional issues. Specifically, when evaluating the model with `lm_eval`, non-generation tasks run correctly, but generation tasks fail. Moreover, my attempted fix results in 0.0 scores, with outputs that are just repetitive loops.
### Non-generation tasks work fine
For example, `piqa` runs without issues and produces reasonable scores:
```bash
lm_eval --model hf \
    --model_args pretrained=recursal/QRWKV6-7B-Base \
    --trust_remote_code \
    --tasks piqa \
    --device cuda:0 \
    --batch_size 8 \
    --limit 5
```
Output:
| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|-------|---------|--------|--------|--------|---|-------|---|--------|
| piqa  | 1       | none   | 0      | acc      | ↑ | 0.8 | ± | 0.2000 |
|       |         | none   | 0      | acc_norm | ↑ | 0.6 | ± | 0.2449 |
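For completeness, the same run can be reproduced from Python via the harness's `simple_evaluate` entry point. A sketch; argument names follow the current lm-evaluation-harness API and may vary across versions:

```python
import lm_eval

# Equivalent to the CLI invocation above; results["results"] holds the metrics.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=recursal/QRWKV6-7B-Base,trust_remote_code=True",
    tasks=["piqa"],
    device="cuda:0",
    batch_size=8,
    limit=5,
)
print(results["results"])
```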
### Generation tasks fail
Running a generation task like `gsm8k` (the only change to the `lm_eval` command above is `--tasks gsm8k` instead of `--tasks piqa`) triggers an error:
File "/nfs-gpu/users_home/davidstap/.cache_hf/modules/transformers_modules/recursal/QRWKV6-7B-Base/62932c0601c8bf45f6249f75f495baac87909981/modeling_rwkv6qwen2.py", line 446, in forward
attn_output = attn_output * g
~~~~~~~~~~~~^~~
RuntimeError: The size of tensor a (17920) must match the size of tensor b (3584) at non-singleton dimension 2
Notably, 17920 = 5 × 3584, matching the `--limit 5` batch, which suggests a batch dimension is being folded into the hidden dimension. I suspected this was related to incorrect handling of left-padding masks, so I implemented a small patch. However, with the patch applied, the model scores 0.0 and generates repetitive nonsense.
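For context, `lm_eval` left-pads prompts when batching generation requests. A minimal stand-in for that code path (my own sketch, not the harness's actual code) looks like this:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "recursal/QRWKV6-7B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, device_map="auto"
)

# lm_eval pads prompts on the left so all sequences end at the same position.
tokenizer.padding_side = "left"
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids("<|endoftext|>")

prompts = ["Question: What is 2 + 2?\nAnswer:", "Question: Name a prime number.\nAnswer:"]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
out = model.generate(**batch, max_new_tokens=16, do_sample=False)
print(tokenizer.batch_decode(out[:, batch["input_ids"].shape[1]:], skip_special_tokens=True))
```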
### Example of faulty generation
Prompt (from `gsm8k`):
```
Question: Samantha bought a crate of 30 eggs for $5. If she decides to sell each egg for 20 cents, how many eggs will she have left by the time she recovers her capital from the sales?
Answer: There are 100 cents in each $1 so $5 gives 5*100 cents = <<5*100=500>>500 cents
To recover her capital of 500 cents from a selling price of 20 cents per egg she has to sell 500/20 = <<500/20=25>>25 eggs
There were 30 eggs in the crate to start with so she will have 30-25 = <<30-25=5>>5 eggs left
#### 5

Question: The teacher agrees to order pizza for the class. For every student in the class, she will buy 2 pieces of cheese and 1 piece of onion and they will eat exactly that amount. A large pizza has 18 slices. She orders 6 total pizzas and there are 8 pieces of cheese and 4 pieces of onion leftover. How many students are in the class?
Answer: Cheese pizzas are 2/3 of the pizza's purchased because 2 / (2+1) = 2/3
She buys 4 cheese pizzas because 6 x (2/3) = <<6*2/3=4>>4
These give her 72 pieces of cheese pizza because 4 x 18 = <<4*18=72>>72
The students at 64 pieces of cheese because 72 - 8 = <<72-8=64>>64
There are 32 students in her class because 64 / 2 = <<64/2=32>>32
#### 32

Question: Sandra wants to buy some sweets. She saved $10 for this purpose. Her mother gave her an additional $4, and her father twice as much as her mother. One candy costs $0.5, and one jelly bean $0.2. She wants to buy 14 candies and 20 jelly beans. How much money will she be left with after the purchase?
Answer: Sandra's father gave her $4 * 2 = $<<4*2=8>>8.
So Sandra has in total $8 + $4 + $10 = $<<8+4+10=22>>22.
She wants 14 candies, so she is going to pay 14 candies * $0.50/candy = $<<14*0.5=7>>7 for them.
She wants also 20 jellybeans, and they're going to cost 20 jellybeans * $0.20/jellybean = $<<20*0.2=4>>4.
So after the purchase, she will be left with $22 - $4 - $7 = $<<22-4-7=11>>11.
#### 11

Question: Faith's neighborhood, with a total of 20 homes, decided to install solar panels. Each home needed 10 panels capable of providing their power needs. The supplier of the panels brought 50 panels less than the required amount. The neighbors agreed to only install the panels up to where they'd be finished. How many homes had their panels installed?
Answer: The total number of panels required is 20*10 = <<20*10=200>>200 panels.
When 50 failed to be delivered, the total number available for use became 200-50 = <<200-50=150>>150 panels.
If each home requires 10 panels, the number of homes that had panels installed is 150/10 = <<150/10=15>>15 homes
#### 15

Question: Jenna wants to buy a concert ticket that costs $181, plus five drink tickets for $7 each. If Jenna earns $18 an hour and works 30 hours a week, what percentage of her monthly salary will she spend on this outing?
Answer: First find the total cost of the drink tickets: 5 tickets * $7/ticket = $<<5*7=35>>35
Then add that cost to the cost of the ticket to find the total cost: $35 + $181 = $<<35+181=216>>216
Then multiply Jenna's hourly rate by the number of hours she works each week to find her weekly earnings: $18/hour * 30 hours/week = $<<18*30=540>>540/week
Then multiply her weekly earnings by the number of weeks she works each month: $540/week * 4 weeks/month = $<<540*4=2160>>2160/month
Then divide the cost of the concert by Jenna's monthly earnings and multiply by 100% to express the answer as a percentage: $216 / $2160 * 100% = 10%
#### 10

Question: Billy sells DVDs. He has 8 customers on Tuesday. His first 3 customers buy one DVD each. His next 2 customers buy 2 DVDs each. His last 3 customers don't buy any DVDs. How many DVDs did Billy sell on Tuesday?
Answer:
```
Generated output:
```
Billy and the same time to the same time to the same time to the same time ...
```
(repeated endlessly)
This clearly indicates something is fundamentally broken in the generation pipeline.
### Confirming the issue outside `lm_eval`
This behavior is not tied to `lm_eval`. Running the same gsm8k prompt with `transformers`' `generate()` produces the same looping output (truncated here to 256 tokens):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "recursal/QRWKV6-7B-Base"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
eos = tokenizer.convert_tokens_to_ids("<|endoftext|>")

# `prompt` holds the gsm8k few-shot prompt shown above
x = tokenizer(prompt, return_tensors="pt").to(model.device)
y = model.generate(**x, max_new_tokens=256, do_sample=False, eos_token_id=eos, pad_token_id=eos)
print(tokenizer.decode(y[0][x["input_ids"].shape[1]:], skip_special_tokens=True))
```
Output:
```
Billy's the number of the number of the number of the number ...
```
(same issue as above)
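As an aside, blocking repeated n-grams only papers over the loop rather than fixing it, but it can be a useful diagnostic to see whether the model has any signal at all. A sketch, just adding `no_repeat_ngram_size` to the call above:

```python
# Diagnostic only: suppress repeated 3-grams to see what the model produces
# once the degenerate loop is blocked. This does not fix the underlying bug.
y = model.generate(
    **x,
    max_new_tokens=256,
    do_sample=False,
    no_repeat_ngram_size=3,
    eos_token_id=eos,
    pad_token_id=eos,
)
print(tokenizer.decode(y[0][x["input_ids"].shape[1]:], skip_special_tokens=True))
```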
### Comparison with QRWKV7
For comparison, the `QRWKV7-7B-Instruct` model does not suffer from this issue. Running `gsm8k` works correctly:
```bash
lm_eval --model hf \
    --model_args pretrained=recursal/QRWKV7-7B-Instruct \
    --trust_remote_code \
    --tasks gsm8k \
    --device cuda:0 \
    --batch_size 8 \
    --limit 5
```
Results:
| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|-------|---------|--------|--------|--------|---|-------|---|--------|
| gsm8k | 3       | flexible-extract | 5 | exact_match | ↑ | 1.0 | ± | 0.0 |
|       |         | strict-match     | 5 | exact_match | ↑ | 0.8 | ± | 0.2 |
### Conclusion
Something is still fundamentally wrong with QRWKV6 models when running generation tasks. Non-generation evaluations are fine, but text generation degenerates into meaningless repetition, both inside and outside `lm_eval`. By contrast, QRWKV7 models work correctly.
Thanks for noticing that - there were indeed a few problems, mostly related to Transformers, but one was an argument-order switcheroo for the cache introduced by that last fix. Please let me know if this works for you.
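For anyone curious what such a bug looks like, here is a toy illustration (not the actual modeling code; the state names are made up): a cache update that takes its tensors positionally will silently mis-store state if a call site swaps the order.

```python
# Toy illustration (not the actual modeling code): positional cache arguments
# let a swapped call site silently corrupt the recurrent state.
class ToyCache:
    def __init__(self):
        self.conv_state = None
        self.recurrent_state = None

    def update(self, conv_state, recurrent_state):
        self.conv_state = conv_state
        self.recurrent_state = recurrent_state

cache = ToyCache()
conv, rec = "CONV", "REC"
cache.update(rec, conv)  # bug: arguments swapped, states land in the wrong slots
print(cache.conv_state)  # prints "REC" instead of "CONV"
```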