Model performance

Hey buddy! Are you liking the model’s performance? Anything you’d like improved?

@drwlf Thanks a lot for creating this awesome model! It is extremely impressive for its size. I extensively tested it with over 100 personal/private medical questions and some general Q&A questions about topics I'm personally interested in. I did so in a multi-shot fashion using the i1-Q6_K quant in the latest llama.cpp server and the "Dirty D" Dolphin system prompt.

The model managed to correctly answer questions that none of the 72 models I tested before were able to answer. This includes massive models like Nature-Reason-1-AGI, a 405B model, and many other much larger models. It is in fact the smallest model I have tested with my private, medically focused RealWorld Q&A benchmark, and despite that it managed to compete with the very best.

The model seems to be fully uncensored, making it extremely useful for medical questions, as most other models would refuse to answer them. I highly appreciate the decision to make it never refuse to answer, no matter what the user asks.

I love the way reasoning was built into this model. Instead of using think tags like r1-based models, it integrates the reasoning into the response, making it easy for the user to follow the model's thought process and double-check that the model did not make any mistakes. I personally prefer this over separating the reasoning from the response.

While the model is amazing, here are two things that could, in my opinion, be improved:

  • The model seems to never stop by itself once the response is done and instead keeps repeating itself until it reaches the token limit. Based on the generated text, the model appears to make no attempt to emit an end-of-message (EOS) token to terminate generation. I'm not sure if this is an issue with our GGUF quants and llama.cpp or an issue inherited from the original model. (A possible workaround is sketched after this list.)
  • The model puts, in my opinion, too many disclaimers into the response, and it does so even if the system message tells it to omit them. There was almost no response that didn't contain a disclaimer. Whether this is desirable depends on the target user: if non-medically educated users ask the model medical questions, a disclaimer to consult a medical professional might be justified, but for medical professionals this quickly gets annoying. I recommend making it so disclaimers can be disabled via the system prompt, so whoever controls the system prompt can decide whether responses should contain disclaimers. If half the response is a disclaimer, valuable time and resources are wasted generating and reading it.
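
As a stopgap for the first issue, generation can be cut off with explicit stop strings. Below is a minimal sketch against llama.cpp server's OpenAI-compatible endpoint; the URL, model name, and stop strings are assumptions and would need to be adapted to the model's actual chat template.

```python
# Workaround sketch: force generation to stop via explicit stop strings when
# the model never emits an EOS token. Assumes a llama.cpp server running
# locally with its OpenAI-compatible API; adjust URL and strings as needed.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",  # llama.cpp serves whatever model it loaded
        "messages": [
            {"role": "system", "content": "You are a helpful medical assistant."},
            {"role": "user", "content": "What are common causes of anemia?"},
        ],
        "max_tokens": 1024,
        # Strings that typically open a new turn; these are guesses and
        # depend on the chat template the model was actually trained with.
        "stop": ["<|im_end|>", "<|user|>", "\nUser:"],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```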

I just realized that this model comes with vision capabilities. I completely missed that, as the model card says "medical language model", so my initial tests did not include vision, but I will soon review its vision capabilities as well.

Hmmm, I also only have the LoRA adapters for it, so I think the quants might be the issue. Maybe you could try merging the LoRA adapters with a base model like mlabonne's abliterated one or unsloth's.
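
For what it's worth, here is a minimal sketch of what such a merge might look like with PEFT; the repo IDs are placeholders, not the actual model names.

```python
# Sketch: fold LoRA adapters into a base model with PEFT, assuming the
# adapters were trained on a compatible base. Repo IDs are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "some-org/base-model"        # placeholder: the actual base model
adapter_id = "some-org/lora-adapters"  # placeholder: the LoRA adapter repo

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_id)

# Merge the LoRA weights into the base weights and drop the adapter wrappers,
# leaving a plain model that can be saved and re-quantized to GGUF.
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained(base_id).save_pretrained("merged-model")
```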

I see the same issue of the model not generating an end-of-message token with the original model under Transformers, but only in Instruct and Chat-Instruct mode, while Chat mode works fine. So maybe a chat template issue? Are you sure you fine-tuned the model with the correct chat template, and that the chat template you included matches the one you want your users to use? The instruction template gets autodetected based on the chat template that comes with the model, while the template text-generation-webui uses for chat mode is hardcoded.
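
One quick way to check this is to render a conversation with the bundled chat template and see whether the assistant turn actually ends with the expected end-of-turn/EOS marker. A minimal sketch, with the repo ID as a placeholder:

```python
# Sanity check sketch: inspect the chat template and EOS token shipped with
# the model, then render a short conversation and verify the assistant turn
# is properly closed. Repo ID is hypothetical.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-org/the-model")  # placeholder ID

print("eos_token:", tok.eos_token, "| id:", tok.eos_token_id)
print("chat template:\n", tok.chat_template)

rendered = tok.apply_chat_template(
    [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there."},
    ],
    tokenize=False,
)
# If the assistant turn is not closed by the expected end-of-turn token here,
# the fine-tuning and inference templates likely disagree.
print(rendered)
```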

These were my thoughts too, but the mixture of datasets I used was formatted properly. Do you have Discord or some other social where I could contact you?

Yes sure. You can contact me on Discord under nicobosshard.