TroyDoesAI committed
Commit 033906b · verified · 1 Parent(s): d191028

IFEval Results - Initial

Files changed (1)
  1. README.md +16 -0
README.md CHANGED
@@ -18,6 +18,22 @@ This model was made possible by excellent AMD mi300x compute generously provided
 
  **⚠️ Note: This is an intermediate checkpoint not intended for direct use. For the complete model, use Qwen3-72B-Embiggened.**
 
+ ## Evaluation Results
+
+ The model was evaluated on the **IFEval** (Instruction-Following Evaluation) benchmark, which tests its ability to follow explicit instructions. Despite being an intermediate, untrained checkpoint, it shows promising instruction-following capabilities.
+
+ Evaluation was performed using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) in a 0-shot setting.
+
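The commit does not record the exact invocation, so the following is only a minimal sketch of how a 0-shot IFEval run with the lm-evaluation-harness CLI might be assembled. The helper name `build_ifeval_command`, the model ID `your-org/your-checkpoint`, and the `results` output directory are hypothetical placeholders, and the flags reflect the harness CLI as commonly documented rather than the author's actual settings.

```python
import shlex


def build_ifeval_command(model_id: str, output_dir: str = "results") -> list[str]:
    """Assemble an lm-evaluation-harness CLI call for a 0-shot IFEval run.

    `model_id` and `output_dir` are placeholders; the commit does not
    state which values were actually used.
    """
    return [
        "lm_eval",
        "--model", "hf",                           # Hugging Face transformers backend
        "--model_args", f"pretrained={model_id}",  # checkpoint to load
        "--tasks", "ifeval",                       # IFEval task
        "--num_fewshot", "0",                      # 0-shot, as stated in the README
        "--output_path", output_dir,               # where result files are written
    ]


if __name__ == "__main__":
    # Hypothetical model ID, shown only to illustrate the shape of the command.
    cmd = build_ifeval_command("your-org/your-checkpoint")
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the evaluation, assuming the harness is installed and the checkpoint fits on the available hardware.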
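+ | Metric | Accuracy |
+ | :--- | :---: |
+ | **Prompt-level Strict Accuracy** | **68.75%** |
+ | **Instruction-level Strict Accuracy** | **75.00%** |
+ | Prompt-level Loose Accuracy | 68.75% |
+ | Instruction-level Loose Accuracy | 75.00% |
+
+ - **Prompt-level Accuracy**: The fraction of prompts for which every instruction in the prompt is followed.
+ - **Instruction-level Accuracy**: The fraction of individual instructions followed, counted across all prompts.
+ - **Strict Accuracy**: The response is checked exactly as generated.
+ - **Loose Accuracy**: The response is also checked after minor transformations (e.g., removing Markdown emphasis or the first or last line) and passes if any variant satisfies the instruction.
+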
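To make the two granularities in the table concrete, here is a small sketch of how prompt-level and instruction-level accuracy can be computed from per-instruction pass/fail results. The function names and the toy data are illustrative assumptions: they are not the harness's internal code and not this model's evaluation outputs. The sketch also assumes every prompt carries at least one instruction.

```python
from typing import Sequence


def prompt_level_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of prompts in which every instruction was followed."""
    return sum(all(prompt) for prompt in results) / len(results)


def instruction_level_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of individual instructions followed across all prompts."""
    total = sum(len(prompt) for prompt in results)
    followed = sum(sum(prompt) for prompt in results)
    return followed / total


if __name__ == "__main__":
    # Toy data (made up): one inner list per prompt, one bool per instruction.
    toy_results = [
        [True, True],   # both instructions followed
        [True, False],  # one of two followed
        [True],         # single instruction followed
        [False],        # single instruction missed
    ]
    print(f"prompt-level:      {prompt_level_accuracy(toy_results):.2%}")      # 50.00%
    print(f"instruction-level: {instruction_level_accuracy(toy_results):.2%}")  # 66.67%
```

Instruction-level accuracy is never lower than prompt-level accuracy when each prompt has at least one instruction, since a prompt with one missed instruction counts as a complete failure at the prompt level but still earns partial credit at the instruction level.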
  ## Architecture Changes
 
  ### Original Qwen3-32B