Please Provide Chat Format and Temp Settings

#2 opened by MysteriousPlane

This is a fantastic model, thanks for doing this!

Unfortunately, after literally tinkering with it for hours, I'm unable to make it stop generating. It might work for the first couple of messages, but then it just keeps on generating until it hits the 2048-token limit.

Please, if you can, let me know the proper chat format and temp settings. I'm using Llama 3 Instruct as both the Instruct Template and Context Template in SillyTavern, but it's not working. The model I've downloaded is the Q4_K_M quant, run through KoboldCPP with Q8 KV cache. Temp is 0.7, Top K 40. I'm splitting 25 layers to my VRAM (16 GB) and the rest to my system RAM (64 GB).
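In case it helps, here's roughly what that setup looks like in code. Just a sketch assuming the llama-cpp-python bindings (as a stand-in for the KoboldCPP backend); the model path and context size are placeholders I made up.

```python
# Minimal sketch, assuming llama-cpp-python as a stand-in for the KoboldCPP
# backend; the model path and context size below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # placeholder: the Q4_K_M quant mentioned above
    n_gpu_layers=25,                 # 25 layers offloaded to the 16 GB GPU
    n_ctx=2048,                      # matches the 2048-token limit being hit
)

out = llm(
    "Once upon a time",
    max_tokens=256,
    temperature=0.7,
    top_k=40,
)
print(out["choices"][0]["text"])
```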

Tarek's Graveyard org

Hi thanks!

Unfortunately, this was done before I realized I could add the chat template to my config file. It is set up as a text completion model, which might explain the issues you are getting. Getting it to work for chat would mean playing with the configs, which you wouldn't be able to manipulate on the GGUF.
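For reference, this is the kind of turn layout a Llama 3 Instruct chat template would normally supply. Since the GGUF ships without an embedded template, the frontend has to build these markers itself. A rough sketch only, and the helper name is mine:

```python
# Sketch of the Llama 3 Instruct turn layout that a chat frontend (SillyTavern
# here) would normally build from the model's chat template.
def llama3_instruct_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```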

A workaround may be to add 'assistant' to your stop sequence, but that is all I can think of. Unfortunately, I only use text completion, not chat, so I am not 100% sure.
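To illustrate what that stop sequence actually does, here's a minimal sketch (plain Python, not any frontend's actual code): the reply gets truncated as soon as the banned string shows up.

```python
# Minimal sketch of a stop string: cut the generated text off at the first
# occurrence of any banned marker. Frontends apply the same idea via their
# stop-sequence settings.
def apply_stop_strings(text: str, stops=("assistant",)) -> str:
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(apply_stop_strings("Sure, here you go!assistant\nAnd another reply..."))
# -> "Sure, here you go!"
```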

As for the settings, I use (with KoboldCPP and SillyTavern; not sure if chat uses different samplers):

[Screenshot of sampler settings: HELP.png]

Thanks a lot for your detailed reply! I will use your suggestions, the screenshot helps a ton! I am using it as a text completion model haha. I do have Llam@ception 1.5, so I will download the version you're using to see if it's better. I will report back soon. :D

Here are my settings. At temp 1 it talks kind of flowery and medieval-like, and at temp 0.7 it gets really defensive while calling me defensive, and yaps a lot, hahah. Please let me know if I'm doing this right. Llam@ception 1.5 seems to be working; it's not generating nonstop until the token limit anymore.

[Two screenshots of SillyTavern sampler settings]

Tarek's Graveyard org

Awesome! Personally, the only samplers I ever mess around with are Temp (between 0.7 and 1.1) and Min P (between 0.01 and 0.05), depending on the model of course. The rest are usually safe to leave as is. As for the meme samplers like smoothing and XTC, I only use DRY, as you have it there.

As for what they do: lowering temp makes the model's replies more deterministic, while raising it makes the model more creative when it's choosing its next token. Min P, on the other hand, gets more creative the lower you go.
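If it helps to picture it, here's a rough sketch of what those two knobs do to the next-token distribution (plain numpy, not any particular backend's implementation):

```python
import numpy as np

def sample_next_token(logits, temperature=0.9, min_p=0.03):
    # Temperature: divide logits before softmax. Lower values sharpen the
    # distribution (more deterministic), higher values flatten it (more creative).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Min P: drop every token whose probability is below min_p times the top
    # token's probability. Lower min_p keeps more candidates (more varied output);
    # higher min_p prunes harder.
    probs[probs < min_p * probs.max()] = 0.0
    probs /= probs.sum()

    return np.random.choice(len(probs), p=probs)
```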

Bonus: If you like this model you should try some of my mainline models, I'll link some below:
Tarek07/Legion-V2.1-LLaMa-70B
Tarek07/Dungeonmaster-V2.2-Expanded-LLaMa-70B
Tarek07/Dungeonmaster-V2.4-Expanded-LLaMa-70B

Thanks for reaching out and sharing! Have fun!

Thanks for getting back again! Makes sense, I'll play with Min P more now! I'll check your other models out as well; thanks for listing them. I actually came to your model because it's currently ranked on the UGI leaderboard (https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard) as the model with the highest "Willingness to answer" (9.5) among those with over 50% of total score. Anyways, feel free to close this thread if you want, as this is practically resolved: the model is able to stop when it wants to. Thanks again for making this model!

Tarek07 changed discussion status to closed
