Text2Text Generation
GGUF
English
Not-For-All-Audiences
llama-cpp
gguf-my-repo
Inference Endpoints
conversational

Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF

This model was converted to GGUF format from SicariusSicariiStuff/Oni_Mitsubishi_12B using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
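If you'd rather reproduce the conversion locally instead of using the GGUF-my-repo space, the workflow looks roughly like the sketch below. The local paths are hypothetical, and the exact script and binary names can differ between llama.cpp versions, so treat this as a rough guide rather than the canonical recipe.

# download the original safetensors checkpoint to a local folder (hypothetical path)
huggingface-cli download SicariusSicariiStuff/Oni_Mitsubishi_12B --local-dir Oni_Mitsubishi_12B

# convert to a full-precision GGUF, then quantize it to Q8_0 with llama.cpp's tools
python convert_hf_to_gguf.py Oni_Mitsubishi_12B --outtype f16 --outfile oni_mitsubishi_12b-f16.gguf
./llama-quantize oni_mitsubishi_12b-f16.gguf oni_mitsubishi_12b-q8_0.gguf Q8_0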


It happened. The long-awaited Gemma-3 is here, and not only are the model sizes really good (1B, 4B, 12B, 27B), but the 128k context (except for the 1B, which gets 32k) was exactly what the Open-Source community wanted and asked for. My only issue with Gemma models in general is the VRAM requirement for tuning them, but that's a "me problem." End users will probably be very happy with Gemma-3 in terms of the VRAM requirement for running it.

On the 12th of March, the Gemma-3 family of models was released. So I decided to go full superstitious and take this omen as a divine calling to finetune the 12B model first. This is how Oni_Mitsubishi_12B was born.

Before starting the actual training run, I used the following command, which I believe has helped the model to converge "better":

for i in {1..666}; do nvidia-smi; done

Gemma is known for its "Gemma knowledge": fandom and/or other obscure knowledge that even larger LLMs often do not possess. It gets even better, as this time we also got a vision model embedded into all the Gemma-3 models, except for the 1B. I wonder what the possibilities are for the vision part if the text layers are uncensored?

I have used brand new long context markdown data, some deslopped instruct data (very lightly deslopped, it's very time-consuming to get right), and more than 50% of highly curated and filtered organic human data, meticulously cleaned, and parsed into obedience. A new stack of organic and data-engineered text was used for the first time for Oni_Mitsubishi_12B. I truly hope creating it was worth the effort.

At NO POINT was ChatGPT used for data generation. However, the new Claude 3.7 Sonnet was used VERY sparingly, for the specific task of creating a small number of humorous datasets (very human-like, done with a decent amount of prompt engineering). I've meticulously checked them for slop, and it is minimal. The goal of said data was to imitate human text, using the 4chan vernacular.

Speaking of which, I've published a highly curated, SFT-ready 4chan dataset here: UBW_Tapestries. Naturally, I have included it in the dataset used for this model as well.

    Technical details

I've used the "ancient" Alpaca chat template because the Gemma-3 chat template was behaving funkily, and I didn't want to waste precious time; I wanted instead to give the community a more uncensored finetune to play with as fast as possible (I saw this requested a lot on both Reddit and Discord, understandably). In my opinion, it's silly to let the perfect be the enemy of the good. Anyway, I had to use both bleeding-edge Transformers and Axolotl, and modify stuff that wasn't even supposed to work (like the model's config.json).
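For reference, the classic single-turn Alpaca format (the variant without an input field) looks like this; double-check the original model card for the exact template and stop sequences this finetune expects:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{your prompt here}

### Response: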

Since it's a hybrid model, training its text-only part is a bit problematic, so I hacked a config.json that gaslights the model into thinking it's a text-only model (a rough sketch of that kind of edit is shown further down), and got some warnings like:

'vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight', 'vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias'}

  • This IS expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Then I saw that it trains.

The absolute state when you can train a model before you can actually inference it.
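For the curious, the config.json hack mentioned above essentially amounts to pointing the config at the text-only model class. A hypothetical sketch of that kind of edit is below; the exact field names depend on the Transformers version, so treat it as illustrative only:

# keep a backup, then rewrite the architectures field so the checkpoint loads as a plain causal LM
cp Oni_Mitsubishi_12B/config.json Oni_Mitsubishi_12B/config.json.bak
jq '.architectures = ["Gemma3ForCausalLM"]' Oni_Mitsubishi_12B/config.json.bak > Oni_Mitsubishi_12B/config.json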


Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF --hf-file oni_mitsubishi_12b-q8_0.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF --hf-file oni_mitsubishi_12b-q8_0.gguf -c 2048
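Once the server is up (it listens on port 8080 by default), you can query its OpenAI-compatible chat endpoint, for example:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a haiku about foxes."}], "max_tokens": 128}'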

Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag along with other hardware-specific flags (e.g., LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make
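If your llama.cpp checkout has already moved from the Makefile to CMake, the equivalent build is roughly:

cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release

(add -DGGML_CUDA=ON to the first command for Nvidia GPUs; the binaries then end up under build/bin).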

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF --hf-file oni_mitsubishi_12b-q8_0.gguf -p "The meaning to life and the universe is"

or

./llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF --hf-file oni_mitsubishi_12b-q8_0.gguf -c 2048