imi2 committed · verified · Commit 0702cd8 · Parent(s): 5c9f016

Create README.md

---
tags:
- gguf
---
# TG Benchmarks on OnePlus 13

There is a discrepancy between Qualcomm's SOTA 18 t/s Llama 2 (3.5 GB) figure and the CPU versions benchmarked below: https://aihub.qualcomm.com/models/llama_v2_7b_chat_quantized

TODO:

- benchmark QNN Llama 2 locally
- benchmark T-MAC group size 128 if needed
- test OpenCL and the available QNN pull requests, and assess the feasibility of speculative decoding alongside CPU inference

## Model Benchmarks

### Llama 2

| Quantization  | Benchmark 1 (n=200), t/s | Benchmark 2 (n=50), t/s |
|---------------|--------------------------|-------------------------|
| Q4_0 (Pure)   | 12.76                    | 13.22                   |
| Q4_0 (Normal) | 12.54                    | 13.03                   |

**Test Command:**

```bash
# llama.cpp CLI flags; -n was 200 for benchmark 1 and 50 for benchmark 2
-p hi -t 6 -s 42 -c 512 -n (200,50) -m llama2
```

## Reka-Flash 21B Benchmarks Q4_0 (Normal)

| Test Configuration | Tokens | Result (t/s) |
|--------------------|--------|--------------|
| Benchmark 1        | 200    | 4.46         |
| Benchmark 2        | 50     | 4.45         |
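The Reka-Flash numbers are roughly consistent with the bandwidth-bound picture: throughput should scale inversely with model bytes. A sketch using llama.cpp's Q4_0 block layout (32 weights in 18 bytes, ~4.5 bits/weight); the 7B and 21B parameter counts are taken at face value, not measured file sizes:

```python
def q4_0_gb(params_billions: float) -> float:
    """Approximate Q4_0 weight size: 32 weights per 18-byte block."""
    return params_billions * 18 / 32  # GB, ignoring non-FFN overheads

llama2_gb = q4_0_gb(7)    # ~3.94 GB
reka_gb = q4_0_gb(21)     # ~11.81 GB
print(reka_gb / llama2_gb)  # size ratio: exactly 3.0
print(12.76 / 4.46)         # measured throughput ratio: ~2.86
```

A 3.0× size ratio against a ~2.86× throughput ratio is close enough to suggest both runs are drawing similar effective memory bandwidth.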
35
+
36
+ ------------
37
+ ## Intermediate Layer Sizes
38
+ | Model Architecture | Intermediate Size |
39
+ |--------------------------|-------------------|
40
+ | Llama2 7B | 11,008 |
41
+ | Llama3 3B | 8,192 |
42
+ | Llama3 8B | 14,336 |
43
+ | Qwen 7B 2.5 | 18,944 |
44
+ | Qwen 2.5B/14B | 13,824 |
45
+ | QWQ | 27,648 |
46
+ | Reka-Flash 21B | 19,648 |
47
+ | Mistral 2503 | 32,768 |
48
+ | Codestral 22B | 16,384 |
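The intermediate size sets the per-layer FFN weight count, which dominates these models' memory footprint. For SwiGLU-style architectures (gate, up, and down projections) that is 3 × hidden × intermediate per layer; the hidden sizes below are assumptions, since the table only lists intermediate sizes:

```python
def ffn_params_per_layer(hidden: int, intermediate: int) -> int:
    """SwiGLU FFN weights: gate_proj + up_proj (hidden->intermediate)
    and down_proj (intermediate->hidden)."""
    return 3 * hidden * intermediate

# Assumed hidden sizes (not in the table): Llama 2 7B = 4096,
# Qwen 2.5 7B = 3584.
llama2_7b = ffn_params_per_layer(4096, 11008)   # 135,266,304 weights/layer
qwen25_7b = ffn_params_per_layer(3584, 18944)   # 203,685,888 weights/layer
print(llama2_7b, qwen25_7b)
```

This is why the larger intermediate sizes in the table matter for on-device TG: more FFN bytes to stream per token.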