---
base_model: SicariusSicariiStuff/Oni_Mitsubishi_12B
datasets:
- SicariusSicariiStuff/UBW_Tapestries
- SicariusSicariiStuff/Synth_Usernames
language:
- en
license: gemma
pipeline_tag: text2text-generation
tags:
- not-for-all-audiences
- llama-cpp
- gguf-my-repo
---

# Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF
This model was converted to GGUF format from [`SicariusSicariiStuff/Oni_Mitsubishi_12B`](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B) for more details on the model.

---

It happened. The long-awaited Gemma-3 is here, and not only are the model sizes really good (1B, 4B, 12B, 27B), but the 128k context (except for the 1B, which has 32k) was exactly what the Open-Source community wanted and asked for. My only issue with Gemma models in general is the VRAM requirement for tuning them, but that's a "me problem." End users will probably be very happy with Gemma-3 in terms of the VRAM requirement for running it.

On the 12th of March, the Gemma-3 family of models was released. So I decided to go full superstitious and took this omen as a divine calling to finetune the 12B model first. This is how Oni_Mitsubishi_12B was born.

Before starting the actual training run, I used the following command, which I believe has helped the model to converge "better":

```bash
for i in {1..666}; do nvidia-smi; done
```

Gemma is known for its "Gemma knowledge": fandom and/or other obscure knowledge that even larger LLMs often do not possess. It gets even better, as this time we also got a vision model embedded into all the Gemma-3 models, except for the 1B. I wonder what the possibilities are for the vision part if the text layers are uncensored?

I have used brand-new long-context markdown data, some deslopped instruct data (very lightly deslopped; it's very time-consuming to get right), and more than 50% highly curated and filtered organic human data, meticulously cleaned and parsed into obedience. A new stack of organic and data-engineered text was used for the first time for Oni_Mitsubishi_12B. I truly hope creating it was worth the effort.

At NO POINT was ChatGPT used for data generation; however, the new Claude 3.7 Sonnet was used VERY sparingly for the specific task of creating a small number of humorous datasets (very human-like, done with a decent amount of prompt engineering). I've meticulously checked them for slop, and it is minimal. The goal of said data was to imitate human text, using the 4chan vernacular.

Speaking of which, I've published a highly curated, SFT-ready 4chan dataset here: [UBW_Tapestries](https://huggingface.co/datasets/SicariusSicariiStuff/UBW_Tapestries). Naturally, I have included it in the dataset used for this model as well.

## Technical details

I've used the "ancient" Alpaca chat template because the Gemma-3 chat template was behaving funkily, and I didn't want to waste precious time; I wanted to give the community a more uncensored finetune to play with as fast as possible (I saw this requested a lot on both Reddit and Discord, understandable). In my opinion, it's silly to let perfect be the enemy of the good.

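Since the finetune targets the Alpaca template rather than Gemma-3's native one, prompting in that format is likely to give the best results. Below is a minimal sketch using the standard Alpaca layout with the GGUF from this repo; the exact preamble used during training isn't stated here, so treat the wording (and the example instruction) as assumptions and adjust to taste.

```bash
# Alpaca-style prompt (hypothetical example instruction), passed to llama-cli.
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF \
  --hf-file oni_mitsubishi_12b-q4_k_s.gguf \
  -n 256 \
  -p $'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWrite a two-sentence greeting in the style of a grumpy sysadmin.\n\n### Response:\n'
```
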
Anyway, I had to use both bleeding-edge Transformers and Axolotl, and modify stuff that wasn't even supposed to work (like the model's config.json).

Since it's a hybrid model, training its text-only part is a bit problematic, so I hacked a config.json that gaslights the model into thinking it's only a text model, and got some warnings like:

```
'vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight', 'vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias'}
- This IS expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```

Then I saw it trains.

The absolute state when you can train a model before you can actually inference it.

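For readers who want to attempt something similar, here is a purely hypothetical sketch of the kind of config.json surgery described above. The author's actual edit isn't published, so the tool choice (jq) and the exact keys are assumptions; depending on your Transformers version you may also need to lift settings out of the nested text config.

```bash
# Hypothetical sketch: make the loader treat the checkpoint as text-only
# by pointing its config at the causal-LM class seen in the warnings above.
cp config.json config.json.bak
jq '.architectures = ["Gemma3ForCausalLM"]' config.json.bak > config.json
```
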
---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:

```bash
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -p "The meaning to life and the universe is"
```

### Server:

```bash
llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -c 2048
```

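Once the server is up (it listens on port 8080 by default), you can sanity-check it with a plain completion request; the prompt and token count below are just examples:

```bash
# Ask the running llama-server for a short completion.
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "The meaning to life and the universe is", "n_predict": 64}'
```
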
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.

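If you prefer to download the GGUF file once and point the binaries at a local path instead of using `--hf-repo`/`--hf-file`, something along these lines should work (assumes the `huggingface_hub` CLI is installed):

```bash
# Fetch the quantized file into the current directory.
pip install -U "huggingface_hub[cli]"
huggingface-cli download Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF \
  oni_mitsubishi_12b-q4_k_s.gguf --local-dir .
```

The local file can then be passed to `llama-cli` or `llama-server` with `-m oni_mitsubishi_12b-q4_k_s.gguf`.
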
Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. LLAMA_CUDA=1 for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

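If your checkout no longer ships a Makefile (recent llama.cpp releases build with CMake instead), the rough equivalent is something like the following; the exact flags depend on your version and hardware:

```bash
# CMake-based build; the binaries end up in build/bin/.
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release
```
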
Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -p "The meaning to life and the universe is"
```

or

```
./llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -c 2048
```