pvduy committed on
Commit ec219f4 · verified · 1 Parent(s): 4d7faa8

Update README.md

  1. README.md +4 -0
README.md CHANGED
@@ -42,6 +42,10 @@ For the Reinforcement Learning (RL) stage, we designed a two-stage training proc
 
 ## III. Evaluation Results
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/63466107f7bd6326925fc770/kAyJOqZDuWRYkN3f1YWcS.png)
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/63466107f7bd6326925fc770/Sbgmwsefab7uDx5obvy18.png)
+
 Our II-Medical-8B-1706 model achieved a 46.8% score on [HealthBench](https://openai.com/index/healthbench/), a comprehensive open-source benchmark evaluating the performance and safety of large language models in healthcare. This performance is comparable to MedGemma-27B from Google. We provide a comparison to models available in ChatGPT below.
 
 <!-- ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61f2636488b9b5abbe184a8e/5r2O4MtzffVYfuUZJe5FO.jpeg) -->