---
base_model:
- allura-org/Gemma-3-Glitter-12B
- ToastyPigeon/Gemma-3-Confetti-12B
- google/gemma-3-12b-it
- google/gemma-3-12b-pt
library_name: transformers
tags:
- mergekit
- merge
---
# 🌠G3 Starshine 12B🌠
<figure>
<img src="https://huggingface.co/ToastyPigeon/Gemma-3-Starshine-12B/resolve/main/modelcard_image.jpeg" width="600">
</figure>
*This was Merge A / A1 in the testing set.*
A creative writing model based on a merge of fine-tunes on Gemma 3 12B IT and Gemma 3 12B PT.
This is the **Story Focused** merge. It works better for storytelling and scenario-style play: the prose is more novel-like, and it has a tendency to impersonate the user character, which suits narration more than back-and-forth RP.
See the [Alternate RP Focused](https://huggingface.co/ToastyPigeon/Gemma-3-Starshine-12B-Alt/) version as well.
This is a merge of two G3 models, one trained on instruct and one trained on base:
* [allura-org/Gemma-3-Glitter-12B](https://huggingface.co/allura-org/Gemma-3-Glitter-12B) - Itself a merge of a storywriting train and an RP train (both also by ToastyPigeon), on instruct.
* [ToastyPigeon/Gemma-3-Confetti-12B](https://huggingface.co/ToastyPigeon/Gemma-3-Confetti-12B) - An experimental application of the Glitter data to the base model instead of instruct; it additionally includes some adventure data in the form of SpringDragon.
The result is a lovely blend of Glitter's ability to follow instructions and Confetti's free-spirited prose, effectively 'loosening up' much of the hesitancy that was left in Glitter.
**Update**: Vision tower is back! Have fun.
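With the vision tower restored, image inputs should work through the usual Gemma 3 multimodal path in `transformers`. A minimal sketch, assuming a recent `transformers` release with Gemma 3 support (the image URL and generation settings are placeholders, not tested outputs):
```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "ToastyPigeon/Gemma-3-Starshine-12B"
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One user turn containing an image plus a writing instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/scene.jpg"},  # placeholder image
        {"type": "text", "text": "Write the opening paragraph of a story set in this scene."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=300)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```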
*Thank you to [jebcarter](https://huggingface.co/jebcarter) for the idea to make this. I love how it turned out!*
## Instruct Format
Uses the Gemma 2/3 instruct format, but has been trained to recognize an optional system role.
*Note: While it won't immediately balk at the system role, results may be better without it.*
```
<start_of_turn>system
{optional system turn with prompt}<end_of_turn>
<start_of_turn>user
{User messages; can also put sysprompt here to use the built-in g3 training}<end_of_turn>
<start_of_turn>model
{model response}<end_of_turn>
```
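For scripted use, the turns above can be assembled by hand. A small sketch (the helper name is made up for illustration; building the string manually also avoids relying on the stock Gemma chat template, which may not accept a `system` role):
```python
def format_gemma3_prompt(turns, system=None):
    """Build a Gemma 2/3-style prompt from (role, text) turns, per the format above."""
    parts = []
    if system:  # optional system turn
        parts.append(f"<start_of_turn>system\n{system}<end_of_turn>\n")
    for role, text in turns:  # role is "user" or "model"
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # leave the final model turn open for generation
    return "".join(parts)


prompt = format_gemma3_prompt(
    [("user", "Continue the story: the lighthouse keeper heard a knock at midnight.")],
    system="You are a collaborative fiction writer.",
)
print(prompt)
```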
### Merge Configuration
Yeah, I actually tried several things and surprisingly this one worked best.
```yaml
models:
  - model: ToastyPigeon/Gemma-3-Confetti-12B
    parameters:
      weight: 0.5
  - model: allura-org/Gemma-3-Glitter-12B
    parameters:
      weight: 0.5
merge_method: linear
tokenizer_source: allura-org/Gemma-3-Glitter-12B
```
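To reproduce, saving the config above to a file and running it through [mergekit](https://github.com/arcee-ai/mergekit) (e.g. `mergekit-yaml starshine.yaml ./output-dir`; filenames here are placeholders) should produce an equivalent merge.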