|
--- |
|
license: cc-by-nc-sa-4.0 |
|
datasets: |
|
- dougiefresh/grammar_logic_rhetoric_and_math |
|
- dougiefresh/systems_programming_and_administration |
|
- dougiefresh/systems_programming_code_conversations |
|
- dougiefresh/jade_identity |
|
language: |
|
- en |
|
base_model: |
|
- Qwen/Qwen3-4B |
|
tags: |
|
- grammar |
|
- logic |
|
- rhetoric |
|
- math |
|
- programming |
|
- aarch64 |
|
- c |
|
- rust |
|
- nushell |
|
--- |
|
|
|
# Jade Qwen 3 4B |
|
|
|
A systems programming finetune of Qwen 3 4B.
|
|
|
|
|
|
## Model description |
|
|
|
This model was finetuned on synthetic conversations generated with Qwen 3 8B, Qwen 3 4B, and Qwen 3 30B A3B.
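
As a rough illustration of the generation setup: conversations like these can be produced by serving the Qwen models behind an OpenAI-compatible endpoint (e.g. vLLM) and prompting them with seed passages. The endpoint, model name, and prompt below are illustrative assumptions, not the exact pipeline used.

```python
# Minimal sketch: turn a seed passage into one synthetic Q&A exchange.
# Assumes a Qwen 3 model served behind an OpenAI-compatible endpoint
# (e.g. vLLM); the endpoint, model name, and prompt are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def generate_exchange(seed_text: str) -> str:
    response = client.chat.completions.create(
        model="Qwen/Qwen3-30B-A3B",
        messages=[
            {"role": "system", "content": (
                "Write one question a student might ask about the passage, "
                "then answer it thoroughly."
            )},
            {"role": "user", "content": seed_text},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content
```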
|
|
|
The first set of data used to generate conversations was taken from a collection of books on grammar, logic, rhetoric, and math.
|
|
|
|
|
|
The result was a high-quality [Grammar, Logic, Rhetoric and Math dataset](https://huggingface.co/datasets/dougiefresh/grammar_logic_rhetoric_and_math).
|
|
|
The next set of data was a small collection of documentation hosted as mdBooks. |
|
- [Rust](https://github.com/rust-lang/book) |
|
- [Nushell](https://github.com/nushell/nushell.github.io) |
|
- [Cargo](https://github.com/rust-lang/cargo/tree/master/src/doc/src) |
|
- [Helix](https://github.com/helix-editor/helix) |
|
|
|
The next set of data was a small set of source code repositories. |
|
- [aarch64 Algorithms](https://github.com/TheAlgorithms/AArch64_Assembly) |
|
- [Hyper](https://github.com/hyperium/hyper) |
|
- [Ripgrep](https://github.com/BurntSushi/ripgrep) |
|
- [SQLite](https://github.com/sqlite/sqlite) |
|
|
|
The source code conversations are [available here](https://huggingface.co/datasets/dougiefresh/systems_programming_code_conversations). |
|
|
|
The next set of data was the documentation for each command supported by [tealdeer](https://github.com/tealdeer-rs/tealdeer). |
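
tealdeer can enumerate every page in its cache, so harvesting that documentation can be as simple as the sketch below (the `--list` flag and per-command invocation are real tealdeer usage; the collection loop is illustrative):

```python
# Sketch: dump the tldr page for every command tealdeer knows about.
# Run `tldr --update` first so the page cache is populated.
import subprocess

commands = subprocess.run(
    ["tldr", "--list"], capture_output=True, text=True, check=True
).stdout.split()

pages = {
    cmd: subprocess.run(["tldr", cmd], capture_output=True, text=True).stdout
    for cmd in commands
}
```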
|
|
|
The last set of data was every manpage currently on my MacBook Air M2.
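
One way to harvest every installed manpage is sketched below; `apropos` and `col -b` are standard on macOS, but this is an illustration rather than the exact script used.

```python
# Sketch: enumerate installed manpages and render each to plain text.
# `apropos .` matches every indexed page; `col -b` strips the
# backspace-based bold/underline formatting from man's output.
import subprocess

entries = subprocess.run(
    ["apropos", "."], capture_output=True, text=True
).stdout.splitlines()

names = sorted({line.split("(")[0].strip() for line in entries if "(" in line})

def render_manpage(name: str) -> str:
    man = subprocess.run(["man", name], capture_output=True, text=True)
    plain = subprocess.run(
        ["col", "-b"], input=man.stdout, capture_output=True, text=True
    )
    return plain.stdout
```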
|
|
|
The mdBooks and tealdeer-supported command docs were combined with data generated from Apple (aarch64)-related systems programming books. The synthetic [conversations are available here](https://huggingface.co/datasets/dougiefresh/systems_programming_and_administration).
|
|
|
The manpages conversations are [available here](https://huggingface.co/datasets/dougiefresh/manpages). |
|
|
|
A mix of CoT and /nothink prompts was used when generating the datasets, but the sets skew towards /nothink.
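
For reference, Qwen 3's chat template exposes thinking as a per-prompt toggle, which is one way generations can be switched between CoT and /nothink in bulk (a minimal sketch using the Hugging Face tokenizer):

```python
# Sketch: toggle Qwen 3's thinking mode per prompt via the chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain what `man -k` does."}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # set True for CoT-style generations
)
```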
|
|
|
All of the datasets are one-shot prompts, except for the source code conversations, which ask two follow-up questions after the initial prompt.
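
Concretely, assuming a ShareGPT-style schema (an assumption for illustration; the actual field names in the dataset files may differ), a one-shot sample versus a source code conversation looks like:

```python
# Illustrative samples under an assumed ShareGPT-style schema.
one_shot = {
    "conversations": [
        {"from": "human", "value": "What does a syllogism consist of?"},
        {"from": "gpt", "value": "Two premises and a conclusion ..."},
    ]
}

code_conversation = {
    "conversations": [
        {"from": "human", "value": "Walk me through this ripgrep function."},
        {"from": "gpt", "value": "It builds the matcher, then ..."},
        {"from": "human", "value": "Why is the buffer reused?"},   # follow-up 1
        {"from": "gpt", "value": "To avoid reallocations ..."},
        {"from": "human", "value": "How would you test this?"},    # follow-up 2
        {"from": "gpt", "value": "With a fixture that ..."},
    ]
}
```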
|
|
|
I first concatenated all of these datasets in order and finetuned a [knowledge LoRA adapter](https://huggingface.co/dougiefresh/jade_qwen_4b_knowledge_merged_adapter) with Axolotl on a single H200 for 3 epochs, ending with an eval loss of `0.62`, down from `0.9643`.
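
The ordered concatenation itself is straightforward with the `datasets` library; a sketch (the dataset names are from this card, but the split and any schema alignment are assumptions):

```python
# Sketch: concatenate the component datasets in order before training.
# Assumes all four share a compatible schema and a "train" split.
from datasets import load_dataset, concatenate_datasets

names = [
    "dougiefresh/grammar_logic_rhetoric_and_math",
    "dougiefresh/systems_programming_and_administration",
    "dougiefresh/systems_programming_code_conversations",
    "dougiefresh/manpages",
]

merged = concatenate_datasets([load_dataset(n, split="train") for n in names])
merged.to_json("combined_train.jsonl")
```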
|
|
|
I then generated an [identity dataset](https://huggingface.co/datasets/dougiefresh/jade_identity), using GPT-4o and Claude Sonnet to employ some degree of wit and sarcasm, and finetuned an [identity LoRA adapter](https://huggingface.co/dougiefresh/jade_qwen_4b_identity_adapter) with Axolotl on a single H200 for 30 epochs, ending with an eval loss of `2.3335`, down from `7.7014`.
|
|
|
Finally, I merged the two adapters into Qwen 3 4B using the DARE TIES merging method, weighting the knowledge adapter `1.5` and the identity adapter `0.5` and setting the density to `0.9`. I tried every combination of weights and density to get the one-shot identity questions to trigger witty responses, but could not do so with a merged adapter. They do, however, work splendidly when using the LoRA adapter separately.
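
One way to reproduce a merge like this is PEFT's `add_weighted_adapter` with the `dare_ties` combination type; the sketch below uses the weights and density quoted above, but it assumes both adapters target the same modules, and the actual merge may have been done with a different tool (e.g. mergekit).

```python
# Sketch: DARE TIES merge of the knowledge and identity LoRA adapters
# into the base model, using the weights and density quoted above.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(
    base,
    "dougiefresh/jade_qwen_4b_knowledge_merged_adapter",
    adapter_name="knowledge",
)
model.load_adapter("dougiefresh/jade_qwen_4b_identity_adapter",
                   adapter_name="identity")

model.add_weighted_adapter(
    adapters=["knowledge", "identity"],
    weights=[1.5, 0.5],
    adapter_name="jade",
    combination_type="dare_ties",
    density=0.9,
)
model.set_adapter("jade")
merged = model.merge_and_unload()  # bake the merged adapter into the base
```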
|
|
|
The result is a fast Qwen 3 model that seems to retain the updated knowledge base it was trained on, while lacking a lot of the personality I hoped for. I'm currently researching ways to weight the identity data more optimally. I've also noticed the model can get a little manpage-obsessed, with an unfortunate focus on Perl, since the bulk of the manpages generated on my system are Perl documentation (me, a developer who doesn't use Perl tooling; oh my god, how much of what we do touches Perl at some point).
|
|
|
I've made [8bit](https://huggingface.co/dougiefresh/jade_qwen3_4b_mlx_8bit) and [4bit](https://huggingface.co/dougiefresh/jade_qwen3_4b_mlx_4bit) MLX quantizations of this bf16 model available.
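
Quantizations like these can be produced with `mlx_lm`'s convert utility; a sketch (the flags are mlx_lm's documented CLI options, while the paths are illustrative placeholders):

```python
# Sketch: produce a 4-bit MLX quantization of the bf16 model.
# The flags are mlx_lm's documented CLI; the paths are placeholders.
import subprocess

subprocess.run([
    "python", "-m", "mlx_lm.convert",
    "--hf-path", "path/to/jade_qwen3_4b",    # the bf16 model
    "--mlx-path", "jade_qwen3_4b_mlx_4bit",  # output directory
    "-q", "--q-bits", "4",
], check=True)
```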
|
|
|
❤️ |
|
|