---
base_model: SicariusSicariiStuff/Oni_Mitsubishi_12B
datasets:
- SicariusSicariiStuff/UBW_Tapestries
- SicariusSicariiStuff/Synth_Usernames
language:
- en
license: gemma
pipeline_tag: text2text-generation
tags:
- not-for-all-audiences
- llama-cpp
- gguf-my-repo
---

# Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF
This model was converted to GGUF format from [`SicariusSicariiStuff/Oni_Mitsubishi_12B`](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B) for more details on the model.
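
If you want to reproduce a conversion like this locally instead of using the space, llama.cpp ships its own conversion and quantization tools. The sketch below is an approximation, not the exact commands the space ran; the script name and flags match recent llama.cpp versions, and the local paths are placeholders.

```bash
# Rough local equivalent of the GGUF-my-repo flow (paths are placeholders).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# 1) Convert the original HF checkpoint to a full-precision GGUF file.
python convert_hf_to_gguf.py /path/to/Oni_Mitsubishi_12B \
  --outfile oni_mitsubishi_12b-f16.gguf --outtype f16

# 2) Quantize it to Q8_0 (llama-quantize is built together with llama.cpp).
./llama-quantize oni_mitsubishi_12b-f16.gguf oni_mitsubishi_12b-q8_0.gguf Q8_0
```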

---
It happened. The long-awaited Gemma-3 is here, and not only are the model sizes really good (1, 4, 12, 27), but the 128k context (except for the 1B, which gets 32k) is exactly what the open-source community wanted and asked for. My only issue with Gemma models in general is the VRAM requirement for tuning them, but that's a "me problem." End users will probably be very happy with Gemma-3 in terms of the VRAM requirement for running it.


On the 12th of March, the Gemma-3 family of models was released. So I decided to go full superstitious, and took this omen as a divine calling to finetune the 12B model first. This is how Oni_Mitsubishi_12B was born.


Before starting the actual training run, I used the following command, which I believe has helped the model to converge "better":

```bash
for i in {1..666}; do nvidia-smi; done
```



Gemma is known for its "Gemma knowledge": fandom and/or other obscure knowledge that even larger LLMs often do not possess. It gets even better, as this time we also got a vision model embedded into all the Gemma-3 models, except for the 1B. I wonder what the possibilities are for the vision part if the text layers are uncensored?


I have used brand new long context markdown data, some deslopped instruct data (very lightly deslopped, it's very time-consuming to get right), and more than 50% of highly curated and filtered organic human data, meticulously cleaned, and parsed into obedience. A new stack of organic and data-engineered text was used for the first time for Oni_Mitsubishi_12B. I truly hope creating it was worth the effort.


At NO POINT was ChatGPT used for data generation. However, the new Claude 3.7 Sonnet was used VERY sparingly for the specific task of creating a small number of humorous datasets (very human-like, done with a decent amount of prompt engineering). I've meticulously checked them for slop, and it is minimal. The goal of said data was to imitate human text, using the 4chan vernacular.


Speaking of which, I've published a highly curated, SFT-ready 4chan dataset here: [UBW_Tapestries](https://huggingface.co/datasets/SicariusSicariiStuff/UBW_Tapestries). Naturally, I have included it in the data used for this model as well.

### Technical details


I've used the "ancient" Alpaca chat template because the Gemma-3 chat template was behaving funkily, and I didn't want to waste precious time; I wanted instead to give the community a more uncensored finetune to play with as fast as possible (I saw this requested a lot on both Reddit and Discord, understandably). In my opinion, it's silly to let perfect be the enemy of the good. Anyway, I had to use both bleeding-edge Transformers and Axolotl, and modify stuff that wasn't even supposed to work (like the model's config.json).
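
Since the Alpaca template was used for training, prompting the GGUF in that format is probably the safest bet. A minimal sketch, assuming the standard Alpaca preamble (the exact system line used during training is not stated in the card):

```bash
# Hypothetical Alpaca-style prompt; adjust the preamble if the author publishes
# the exact template used for training.
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF \
  --hf-file oni_mitsubishi_12b-q8_0.gguf \
  -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Summarize what makes Gemma-3 interesting in two sentences.

### Response:
"
```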


Since it's a hybrid model, training its text-only part is a bit problematic, so I hacked a config.json that gaslights the model into thinking it's only a text model, and got some warnings like:

```
'vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight', 'vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias'}
- This IS expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```
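
For readers curious what that kind of config.json "gaslighting" might look like, here is an illustrative sketch. It is an assumption, not the author's actual edit; the only grounded detail is the Gemma3ForCausalLM class named in the warning above.

```bash
# Illustrative only: point the checkpoint's config at the text-only class so a
# causal-LM loader is used and the vision tower weights are simply dropped.
python - <<'EOF'
import json

path = "Oni_Mitsubishi_12B/config.json"  # hypothetical local checkout

with open(path) as f:
    cfg = json.load(f)

cfg["architectures"] = ["Gemma3ForCausalLM"]  # class named in the warning

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
```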



Then I saw it trains.




The absolute state when you can train a model before you can actually inference it.

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF --hf-file oni_mitsubishi_12b-q8_0.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF --hf-file oni_mitsubishi_12b-q8_0.gguf -c 2048
```
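
Once the server is up, you can query the OpenAI-compatible chat endpoint that recent llama.cpp builds expose (the default port is 8080), for example:

```bash
# Query the running llama-server via its OpenAI-compatible endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "The meaning to life and the universe is"}
        ],
        "max_tokens": 128
      }'
```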

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF --hf-file oni_mitsubishi_12b-q8_0.gguf -p "The meaning to life and the universe is"
```
or 
```
./llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q8_0-GGUF --hf-file oni_mitsubishi_12b-q8_0.gguf -c 2048
```