Based on: [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) by Qwen

## Quantization notes

Made with exllamav2 0.2.9 (dev branch) with the default dataset. You need to either wait for the next exllamav2 release or install it from the [dev branch](https://github.com/turboderp-org/exllamav2/tree/dev) to use these quants.
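One way to install from the dev branch, assuming a Python environment with a working CUDA or ROCm toolchain, is a direct pip install from git (verify the branch name against the repository before running):

```shell
# Install exllamav2 from the dev branch until the next release lands.
# Building from source requires a compiler plus CUDA or ROCm headers.
pip install git+https://github.com/turboderp-org/exllamav2.git@dev
```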

It can be used with an RTX GPU on Windows, or an RTX/ROCm card on Linux, with TabbyAPI or Text-Generation-WebUI.

Ensure you have enough VRAM to run it, since exllamav2 doesn't support RAM offloading.

When I used it with TabbyAPI + SillyTavern, I had to explicitly uncheck "Add BOS token" to make the model work properly; otherwise the output looped.

However, the model worked perfectly fine with TabbyAPI + OpenWebUI.
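If you drive TabbyAPI directly rather than through a frontend, a minimal sketch of a chat-completion request payload looks like the following. The endpoint URL, the model name, and the `add_bos_token` extension field are assumptions based on my setup (mirroring the "Add BOS token" checkbox above); verify them against your TabbyAPI version:

```python
import json

# Hypothetical local TabbyAPI endpoint; the default port may differ in your setup.
URL = "http://127.0.0.1:5000/v1/chat/completions"

# "add_bos_token": False mirrors unchecking "Add BOS token" in SillyTavern.
# This is an assumed TabbyAPI extension field, not part of the base OpenAI schema.
payload = {
    "model": "JOSIEFIED-Qwen3-8B-exl2",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
    "add_bos_token": False,
}

body = json.dumps(payload)
# Send with e.g. requests.post(URL, data=body, headers=...) once the server is up.
print(body)
```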
# Original model card
# JOSIEFIED Model Family