Can't determine properly which is greater between 9.9 and 9.11

#38

by sniffski - opened 3 days ago

3 days ago

Guys, they say 3.1 beats GPT-4o... But it doesn't seem like that...
Here's the answer to a simple math question:

To determine which is greater between 9.9 and 9.11, you can compare the numbers directly.

- 9.11 has a tenths place of 1 and a hundredths place of 1.
- 9.9 has a tenths place of 9 and no hundredths place (which is equivalent to having a hundredths place of 0).

Since the tenths place in 9.11 is greater than that in 9.9, we can conclude that:

**9.11 is greater than 9.9.**

I also tried a prompt that makes the model reason first, but still no luck...

<details type="reasoning" done="true" duration="25">
<summary>Thought for 25 seconds</summary>
> 1. The user is asking which number is greater between 9.9 and 9.11.
> 2. Break the task into parts:
>    - Compare the two numbers.
> 3. Assign each part to an expert:
>    - Number comparison: Mathematics Expert.
> 4. Combine their insights:
> 
> Mathematics Expert: To compare the numbers, you need to look at each decimal place from left to right.
> 
> - 9 is equal to 9 in both cases.
> - The tenths place (the first digit after the decimal point) is 9 in the first number and 1 in the second number. Since 1 is less than 9, we don't need to look further.
> 
> So, the second number (9.11) is greater than the first (9.9).
</details>
The greater number between 9.9 and 9.11 is **9.11**.

Actually the reasoning is correct, but at the end it gives the wrong answer...

urtuuuu

2 days ago • edited 2 days ago

That's sad but it's true...

Number Comparison:
The Mathematics Expert will compare the two numbers.

9.9 vs. 9.11:
Both numbers have the same integer part, which is 9.
Comparing the decimal parts: 0.9 (from 9.9) vs. 0.11 (from 9.11).
Since 0.11 is greater than 0.9, 9.11 is greater than 9.9.
Conclusion:
9.11 is greater than 9.9.

sniffski

2 days ago • edited 2 days ago

I tried a few times and it seems it wrongly converts 0.9 to 0.09... that's why it declares 9.11 is greater.
I asked DeepSeek R1 1.5B the same question and it manages to get the right answer...
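For reference, here is a minimal Python sketch of the place-value comparison, assuming non-negative decimal strings. The helper names `aligned_parts` and `greater` are made up for illustration. It pads the fractional part on the right (0.9 becomes 0.90), and the last line shows how padding on the left instead (0.9 becomes 0.09, the slip suspected above) flips the result.

```python
def aligned_parts(number: str, width: int) -> tuple[int, int]:
    """Split '9.9' into (9, 90) by right-padding the fraction to `width` digits."""
    whole, _, frac = number.partition(".")
    # An integer such as '2' has no fractional digits, so fall back to "0".
    return int(whole), int(frac.ljust(width, "0") or "0")


def greater(a: str, b: str) -> str:
    """Return whichever non-negative decimal string is larger."""
    width = max(len(a.partition(".")[2]), len(b.partition(".")[2]))
    return a if aligned_parts(a, width) >= aligned_parts(b, width) else b


print(greater("9.9", "9.11"))  # 9.9, because (9, 90) > (9, 11)

# Left-padding the fraction (0.9 read as 0.09) gives (9, 9) vs (9, 11):
print((9, int("9".rjust(2, "0"))) > (9, 11))  # False
```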

Smorty100

about 15 hours ago

this is a tokenizer problem. currently, many tokenizers pack several digits into one token.

So that for example 69 would be read as a single token, which improves efficiency but decreases number understanding.

this is a common thing in almost all LLMs, just like letter counting, which is also largely impossible due to the tokenizer.

so this will not improve unless the model is specifically trained on it, which would reduce performance in other areas.

sooooo, use code to determine if a number is larger than another, LLMs are overkill.
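A minimal sketch of doing it in code, using Python's standard `decimal` module so the values are compared exactly rather than as binary floats. The helper name `larger` is just for illustration.

```python
from decimal import Decimal


def larger(a: str, b: str) -> Decimal:
    """Return the larger of two decimal numbers given as strings."""
    return max(Decimal(a), Decimal(b))


print(larger("9.9", "9.11"))  # 9.9
```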

NePe

about 7 hours ago

@Smorty100 Well actually most modern tokenizers encode each digit as its own token, and this is true for this model too. It takes like 3 lines of code to test:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('unsloth/Mistral-Small-3.1-24B-Instruct-2503-unsloth-bnb-4bit')
print(tokenizer.encode('9.11'))
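
To make the difference between the two schemes concrete without downloading anything, here is a toy sketch using only regular expressions. It is not the model's actual tokenizer, and both function names are hypothetical. It only contrasts splitting every digit on its own with grouping runs of digits.

```python
import re


def split_single_digits(text: str) -> list[str]:
    """Toy pre-tokenizer: every digit becomes its own piece."""
    return re.findall(r"\d|\D+", text)


def split_digit_runs(text: str) -> list[str]:
    """Toy pre-tokenizer: consecutive digits are kept together as one piece."""
    return re.findall(r"\d+|\D+", text)


print(split_single_digits("9.11"))  # ['9', '.', '1', '1']
print(split_digit_runs("9.11"))     # ['9', '.', '11']
```

With digit runs, "11" and "9" look like the whole numbers eleven and nine, which is the confusion described earlier in the thread. With single digits, the model sees the tenths digit directly.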

urtuuuu

about 4 hours ago • edited about 4 hours ago

Guys, I just tested it on this question online, on the OpenRouter website, and no matter how many retries, it always answers correctly. Any ideas?
There are several example questions, and their version is "9.11 and 9.9, which one is larger?".
