Sam Heutmaker committed on
Commit 4ffc59c
Parent(s): fb437aa
fix graphs
README.md CHANGED
@@ -51,13 +51,14 @@ Performance metrics on our internal evaluation set:
 
 ### Benchmark Visualizations
 
-<
-<img src="./assets/judge-score.png" alt="Average Judge Score Comparison" width="
-<img src="./assets/rouge-1.png" alt="ROUGE-1 Score Comparison" width="
-
-
-<img src="./assets/
-
+<p align="center">
+<img src="./assets/judge-score.png" alt="Average Judge Score Comparison" width="49%" />
+<img src="./assets/rouge-1.png" alt="ROUGE-1 Score Comparison" width="49%" />
+</p>
+<p align="center">
+<img src="./assets/rouge-L.png" alt="ROUGE-L Score Comparison" width="49%" />
+<img src="./assets/bleu.png" alt="BLEU Score Comparison" width="49%" />
+</p>
 
 FP8 quantization showed no measurable quality degradation compared to bf16 precision.
 
@@ -75,9 +76,7 @@ GrassData/ClipTagger-12b delivers frontier-quality performance at a fraction of
 
 *Cost calculations based on 700 input tokens and 250 output tokens per generation.
 
-<
-<img src="./assets/cost.png" alt="Cost Comparison Per 1 Million Generations" width="80%" />
-</div>
+<img src="./assets/cost.png" alt="Cost Comparison Per 1 Million Generations" width="100%" />
 
 ClipTagger-12b offers **15x cost savings** compared to GPT-4.1 and **17x cost savings** compared to Claude 4 Sonnet, while maintaining comparable quality metrics.
 
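The README's cost footnote (700 input tokens and 250 output tokens per generation, totalled over 1 million generations) reduces to a short calculation. Below is a minimal Python sketch. It assumes prices are quoted in USD per 1 million tokens. The function names `cost_for_generations` and `savings_ratio` are hypothetical, and so are the example prices in the `__main__` block. They are placeholders, not the pricing behind the README's 15x and 17x figures.

```python
# Minimal sketch of the README's cost footnote: 700 input tokens and
# 250 output tokens per generation, summed over 1 million generations.
# Prices are passed in as USD per 1M tokens (an assumption); no real
# provider prices are hardcoded here.

INPUT_TOKENS_PER_GENERATION = 700
OUTPUT_TOKENS_PER_GENERATION = 250
GENERATIONS = 1_000_000
TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per 1M tokens


def cost_for_generations(
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    generations: int = GENERATIONS,
) -> float:
    """Total USD cost of `generations` calls at the footnote's token counts."""
    per_generation_units = (
        INPUT_TOKENS_PER_GENERATION * input_price_per_mtok
        + OUTPUT_TOKENS_PER_GENERATION * output_price_per_mtok
    )
    # Scale by the number of generations, then convert from per-1M-token pricing.
    return generations * per_generation_units / TOKENS_PER_PRICE_UNIT


def savings_ratio(baseline_cost: float, candidate_cost: float) -> float:
    """How many times cheaper the candidate is than the baseline (e.g. 15x)."""
    if candidate_cost <= 0:
        raise ValueError("candidate_cost must be positive")
    return baseline_cost / candidate_cost


if __name__ == "__main__":
    # Hypothetical placeholder prices, NOT the pricing used in the README.
    baseline = cost_for_generations(input_price_per_mtok=2.0, output_price_per_mtok=8.0)
    candidate = cost_for_generations(input_price_per_mtok=0.25, output_price_per_mtok=0.5)
    print(f"baseline:  ${baseline:,.2f}")
    print(f"candidate: ${candidate:,.2f}")
    print(f"savings:   {savings_ratio(baseline, candidate):.1f}x")
```

The token counts are fixed, so the savings ratio depends only on the two models' input and output prices. Plugging in each provider's published per-token rates is therefore enough to re-derive the comparison.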