ChatML as always. Full-precision this time. Quants will come later.
An experiment to make causal models more conversational. Many of them can already chat, but they suffer from problems like occasional dullness, incoherence, and verbatim repetition. This model DOES NOT follow assistant-style instructions and IS NOT INTENDED TO.
Findings
The default ChatML template causes the model to sometimes identify as an AI assistant, which we consider undesirable. This is probably due to the assistant/user/system role markers. Future iterations will likely use our own format.
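For context, ChatML wraps every turn in `<|im_start|>ROLE ... <|im_end|>` markers, so the role names themselves are visible to the model during training. A minimal renderer (the function name is ours, purely illustrative) makes this concrete:

```python
# Minimal ChatML renderer -- illustrates why role names like
# "system"/"user"/"assistant" can nudge a model toward identifying
# as an AI assistant. Not part of any library API.
def render_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(parts)

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hey"},
]
print(render_chatml(chat))
```

Swapping these role markers for custom ones is the kind of change a future custom format would make.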
We fine-tuned the Qwen2.5-7B causal model as a rank-32 LoRA on the first 500 rows of NewEden's Roleplay-Logs-Sharegpt-Ngram-Cleaned dataset for 3 epochs. Despite the name, it also includes non-RP conversation.
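A rough configuration sketch of that setup using `peft` — only the rank (32), row count (500), and epoch count (3) come from this card; alpha, dropout, target modules, and the exact dataset ID are our illustrative guesses:

```python
from peft import LoraConfig

# Rank-32 LoRA as described above; alpha, dropout, and target
# modules are assumptions, not the card's confirmed settings.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Data selection and schedule, per the description (dataset ID is
# our guess at the Hub path):
#   dataset = load_dataset(
#       "NewEden/Roleplay-Logs-Sharegpt-Ngram-Cleaned",
#       split="train[:500]",
#   )
#   num_train_epochs = 3
```

This is a config fragment, not a full training script; any trainer that accepts a `LoraConfig` (e.g. TRL's `SFTTrainer`) could consume it.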
This dataset probably isn't the best; it has a few problems:
- It's not the most coherent thing ever.
- It contains some examples of "brainrot" and nonsensical phrase repetition. We don't see that as bad in itself, but it seems to confuse the model a bit.
- It's still partially synthetic, based on character.ai logs, so it's bound to contain some clichéd phrases from their model, which is not ideal. The goal is NOT TO replicate the character.ai model, but to build a unique conversational model that is fun to interact with.
However:
- It also contains a lot of interesting conversational patterns which corporate instruct models would never spit out.
- After training, the model is usable and very fun to interact with. It still feels a bit undercooked, so we plan to address that.
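As an aside on the repetition problems mentioned above, verbatim phrase repetition is easy to flag with a simple n-gram count. This is our own illustrative sketch, not the dataset's actual n-gram cleaning pipeline:

```python
from collections import Counter

def repeated_ngrams(text, n=4, threshold=2):
    """Return word n-grams that occur at least `threshold` times in
    `text` -- a crude signal of verbatim repetition."""
    words = text.split()
    counts = Counter(
        tuple(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return {gram: c for gram, c in counts.items() if c >= threshold}

sample = "I like it I like it I like it a lot"
print(repeated_ngrams(sample, n=3))
```

A filter like this could drop (or down-weight) training rows where any n-gram repeats past the threshold.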
We plan to keep this dataset in future iterations, though in moderation. The next iteration will also include dialogue scraped from Reddit, the Discord-Data dataset, and possibly other sources we find interesting.
We do not plan to include instructions or synthetic data from models like GPT-4 or Claude, as those have been fine-tuned for agreeability and a professional tone. Moreover, when prompted to write more casually, they tend to stick too closely to the guidelines provided (when many are included), or write in a stilted, cheesy, and unnatural way (when the instructions are vague).
However, we do plan to experiment with instruction following in the future.