---
base_model: SicariusSicariiStuff/Oni_Mitsubishi_12B
datasets:
- SicariusSicariiStuff/UBW_Tapestries
- SicariusSicariiStuff/Synth_Usernames
language:
- en
license: gemma
pipeline_tag: text2text-generation
tags:
- not-for-all-audiences
- llama-cpp
- gguf-my-repo
---
# Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF
This model was converted to GGUF format from [`SicariusSicariiStuff/Oni_Mitsubishi_12B`](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B) for more details on the model.
---
It happened. The long-awaited Gemma-3 is here, and not only are the model sizes really good (1B, 4B, 12B, 27B), but the 128k context (32k for the 1B) was exactly what the open-source community wanted and asked for. My only issue with Gemma models in general is the VRAM requirement for tuning them, but that's a "me problem." End users will probably be very happy with Gemma-3's VRAM requirements for running it.

On the 12th of March, the Gemma-3 family of models was released. So I decided to go full superstitious and took this omen as a divine calling to finetune the 12B model first. This is how Oni_Mitsubishi_12B was born.
Before starting the actual training run, I used the following command, which I believe has helped the model to converge "better":

```bash
for i in {1..666}; do nvidia-smi; done
```
Gemma is known for its "Gemma knowledge": fandom and/or other obscure knowledge that even larger LLMs often do not possess. It gets even better, as this time we also got a vision model embedded into all the Gemma-3 models except for the 1B. I wonder what the possibilities are for the vision part if the text layers are uncensored?
I have used brand-new long-context markdown data, some deslopped instruct data (very lightly deslopped, it's very time-consuming to get right), and more than 50% highly curated and filtered organic human data, meticulously cleaned and parsed into obedience. A new stack of organic and data-engineered text was used for the first time for Oni_Mitsubishi_12B. I truly hope creating it was worth the effort.
At NO POINT was ChatGPT used for data generation. However, the new Claude 3.7 Sonnet was used VERY sparingly for the specific task of creating a small number of humorous datasets (very human-like, done with a decent amount of prompt engineering). I've meticulously checked them for slop, and it is minimal. The goal of said data was to imitate human text, using the 4chan vernacular.

Speaking of which, I've published a highly curated, SFT-ready 4chan dataset here: [UBW_Tapestries](https://huggingface.co/datasets/SicariusSicariiStuff/UBW_Tapestries). Naturally, I have included it in the dataset used for this model as well.
## Technical details
I've used the "ancient" Alpaca chat template because the Gemma-3 chat template was behaving funkily, and I didn't want to waste precious time; I'd rather give the community a more uncensored finetune to play with as fast as possible (I saw this requested a lot on both Reddit and Discord, understandable). In my opinion, it's silly to let perfect be the enemy of the good. Anyway, I had to use both bleeding-edge Transformers and Axolotl, and modify stuff that wasn't even supposed to work (like the model's config.json).
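For reference, a prompt in the classic Alpaca format looks roughly like the sketch below, passed here through the `llama-cli` invocation described in the usage section further down. The exact system line and spacing this finetune expects may differ, so treat it as a starting point rather than the canonical template:

```bash
# Sketch of an Alpaca-style prompt; the exact template this finetune expects
# may differ slightly (check the original model card).
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF \
  --hf-file oni_mitsubishi_12b-q4_k_s.gguf \
  -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Summarize the plot of Hamlet in two sentences.

### Response:
"
```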
Since it's a hybrid model, training its text-only part is a bit problematic, so I hacked together a config.json that gaslights the model into thinking it's a text-only model, and got some warnings like:
```
'vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight', 'vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias'}
- This IS expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```
Then I saw it trains.
The absolute state when you can train a model before you can actually inference it.
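To give a rough idea of what such a hack involves (this is an illustrative sketch, not the actual diff used for this model), the gist is editing config.json so that Transformers loads the checkpoint with the text-only causal-LM class. Assuming `jq` is available, something along these lines:

```bash
# Illustrative only: advertise the checkpoint as a text-only causal LM.
# The real Gemma-3 config.json likely needs more changes (e.g. the nested
# text/vision sub-configs); this just shows the general idea.
cp config.json config.json.bak
jq '.architectures = ["Gemma3ForCausalLM"]' config.json.bak > config.json
```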
---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
### CLI:
```bash
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -p "The meaning to life and the universe is"
```
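For longer contexts or an interactive chat session, the usual `llama-cli` flags apply; the values below are just placeholders:

```bash
# -c sets the context size, -n the number of tokens to predict,
# -cnv switches to an interactive conversation instead of a one-shot prompt.
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF \
  --hf-file oni_mitsubishi_12b-q4_k_s.gguf \
  -c 8192 -n 256 -cnv
```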
### Server:
```bash
llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -c 2048
```
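Once the server is running, you can query it over HTTP; llama-server exposes an OpenAI-compatible API (port 8080 by default, adjust if you pass `--port`):

```bash
# Minimal chat-completion request against the local llama-server instance.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello there!"}],"max_tokens":128}'
```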
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```
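For example, a CUDA-enabled build on a Linux box with an Nvidia GPU might look like the following (the flag names track the llama.cpp README at the time of writing and may change as its build system evolves):

```bash
# Build with libcurl support (used by --hf-repo downloads) and CUDA offloading.
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make -j
```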
Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -c 2048
```