---
base_model: SicariusSicariiStuff/Oni_Mitsubishi_12B
datasets:
- SicariusSicariiStuff/UBW_Tapestries
- SicariusSicariiStuff/Synth_Usernames
language:
- en
license: gemma
pipeline_tag: text2text-generation
tags:
- not-for-all-audiences
- llama-cpp
- gguf-my-repo
---
# Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF
This model was converted to GGUF format from [`SicariusSicariiStuff/Oni_Mitsubishi_12B`](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B) for more details on the model.
---
It happened. The long-awaited Gemma-3 is here, and not only are the model sizes really good (1B, 4B, 12B, 27B), but the 128k context (32k for the 1B) was exactly what the open-source community wanted and asked for. My only issue with Gemma models in general is the VRAM requirement for tuning them, but that's a "me problem." End users will probably be very happy with Gemma-3's VRAM requirements for running it.

On the 12th of March, the Gemma-3 family of models was released. So I decided to go full superstitious and took this omen as a divine calling to finetune the 12B model first. This is how Oni_Mitsubishi_12B was born.
Before starting the actual training run, I used the following command, which I believe has helped the model to converge "better":

```bash
for i in {1..666}; do nvidia-smi; done
```
Gemma is known for its "Gemma knowledge": fandom and/or other obscure knowledge that even larger LLMs often do not possess. It gets even better, as this time we also got a vision model embedded into all the Gemma-3 models except for the 1B. I wonder what the possibilities are for the vision part if the text layers are uncensored?
I have used brand-new long-context markdown data, some deslopped instruct data (very lightly deslopped, it's very time-consuming to get right), and more than 50% highly curated and filtered organic human data, meticulously cleaned and parsed into obedience. A new stack of organic and data-engineered text was used for the first time for Oni_Mitsubishi_12B. I truly hope creating it was worth the effort.
At NO POINT was ChatGPT used for data generation. However, the new Claude 3.7 Sonnet was used VERY sparingly for the specific task of creating a small number of humorous datasets (very human-like, done with a decent amount of prompt engineering). I've meticulously checked them for slop, and it is minimal. The goal of said data was to imitate human text, using the 4chan vernacular.

Speaking of which, I've published a highly curated, SFT-ready 4chan dataset here: [UBW_Tapestries](https://huggingface.co/datasets/SicariusSicariiStuff/UBW_Tapestries). Naturally, I have included it in the dataset used for this model as well.
## Technical details
I've used the "ancient" Alpaca chat template because the Gemma-3 chat template was behaving funkily, and I didn't want to waste precious time; I'd rather give the community a more uncensored finetune to play with as fast as possible (I saw this requested a lot on both Reddit and Discord, understandable). In my opinion, it's silly to let perfect be the enemy of the good. Anyway, I had to use both bleeding-edge Transformers and Axolotl, and modify stuff that wasn't even supposed to work (like the model's config.json).
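For reference, a prompt in the classic Alpaca format looks roughly like the sketch below, passed here through the `llama-cli` invocation described in the usage section further down. The exact system line and spacing this finetune expects may differ, so treat it as a starting point rather than the canonical template:

```bash
# Sketch of an Alpaca-style prompt; the exact template this finetune expects
# may differ slightly (check the original model card).
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF \
  --hf-file oni_mitsubishi_12b-q4_k_s.gguf \
  -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Summarize the plot of Hamlet in two sentences.

### Response:
"
```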
Since it's a hybrid model, training its text-only part is a bit problematic, so I hacked together a config.json that gaslights the model into thinking it's a text-only model, and got some warnings like:
```
'vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight', 'vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias'}
- This IS expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```
Then I saw it trains.
The absolute state when you can train a model before you can actually inference it.
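To give a rough idea of what such a hack involves (this is an illustrative sketch, not the actual diff used for this model), the gist is editing config.json so that Transformers loads the checkpoint with the text-only causal-LM class. Assuming `jq` is available, something along these lines:

```bash
# Illustrative only: advertise the checkpoint as a text-only causal LM.
# The real Gemma-3 config.json likely needs more changes (e.g. the nested
# text/vision sub-configs); this just shows the general idea.
cp config.json config.json.bak
jq '.architectures = ["Gemma3ForCausalLM"]' config.json.bak > config.json
```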
---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
### CLI:
```bash
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -p "The meaning to life and the universe is"
```
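For longer contexts or an interactive chat session, the usual `llama-cli` flags apply; the values below are just placeholders:

```bash
# -c sets the context size, -n the number of tokens to predict,
# -cnv switches to an interactive conversation instead of a one-shot prompt.
llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF \
  --hf-file oni_mitsubishi_12b-q4_k_s.gguf \
  -c 8192 -n 256 -cnv
```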
### Server:
```bash
llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -c 2048
```
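Once the server is running, you can query it over HTTP; llama-server exposes an OpenAI-compatible API (port 8080 by default, adjust if you pass `--port`):

```bash
# Minimal chat-completion request against the local llama-server instance.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello there!"}],"max_tokens":128}'
```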
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```
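For example, a CUDA-enabled build on a Linux box with an Nvidia GPU might look like the following (the flag names track the llama.cpp README at the time of writing and may change as its build system evolves):

```bash
# Build with libcurl support (used by --hf-repo downloads) and CUDA offloading.
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make -j
```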
Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/Oni_Mitsubishi_12B-Q4_K_S-GGUF --hf-file oni_mitsubishi_12b-q4_k_s.gguf -c 2048
```