---
base_model:
- allura-org/Gemma-3-Glitter-12B
- ToastyPigeon/Gemma-3-Confetti-12B
- google/gemma-3-12b-it
- google/gemma-3-12b-pt
library_name: transformers
tags:
- mergekit
- merge
---
# 🌠G3 Starshine 12B🌠
<figure>
<img src="https://huggingface.co/ToastyPigeon/Gemma-3-Starshine-12B/resolve/main/modelcard_image.jpeg" width="600">
</figure>
*This was Merge A / A1 in the testing set.*
A creative writing model based on a merge of fine-tunes on Gemma 3 12B IT and Gemma 3 12B PT.
This is the **Story Focused** merge. It works better for storytelling and scenario-style play: the prose is more novel-like, and it has a tendency to impersonate the user character, which suits narration more than back-and-forth RP.
See the [Alternate RP Focused](https://huggingface.co/ToastyPigeon/Gemma-3-Starshine-12B-Alt/) version as well.
This is a merge of two G3 models, one trained on instruct and one trained on base:
* [allura-org/Gemma-3-Glitter-12B](https://huggingface.co/allura-org/Gemma-3-Glitter-12B) - Itself a merge of a storywriting train and an RP train (both also by ToastyPigeon), on instruct.
* [ToastyPigeon/Gemma-3-Confetti-12B](https://huggingface.co/ToastyPigeon/Gemma-3-Confetti-12B) - An experimental application of the Glitter data to the base model instead of instruct; it additionally includes some adventure data in the form of SpringDragon.
The result is a lovely blend of Glitter's ability to follow instructions and Confetti's free-spirited prose, effectively 'loosening up' much of the hesitancy that was left in Glitter.
**Update**: Vision tower is back! Have fun.
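With the vision tower restored, image inputs should work through the usual Gemma 3 multimodal path in `transformers`. A minimal sketch, assuming a recent `transformers` release with Gemma 3 support (the image URL and generation settings are placeholders, not tested outputs):
```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "ToastyPigeon/Gemma-3-Starshine-12B"
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One user turn containing an image plus a writing instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/scene.jpg"},  # placeholder image
        {"type": "text", "text": "Write the opening paragraph of a story set in this scene."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=300)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```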
*Thank you to [jebcarter](https://huggingface.co/jebcarter) for the idea to make this. I love how it turned out!*
## Instruct Format
Uses the Gemma 2/3 instruct format, but has been trained to recognize an optional system role.
*Note: While it won't immediately balk at the system role, results may be better without it.*
```
<start_of_turn>system
{optional system turn with prompt}<end_of_turn>
<start_of_turn>user
{User messages; can also put sysprompt here to use the built-in g3 training}<end_of_turn>
<start_of_turn>model
{model response}<end_of_turn>
```
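For scripted use, the turns above can be assembled by hand. A small sketch (the helper name is made up for illustration; building the string manually also avoids relying on the stock Gemma chat template, which may not accept a `system` role):
```python
def format_gemma3_prompt(turns, system=None):
    """Build a Gemma 2/3-style prompt from (role, text) turns, per the format above."""
    parts = []
    if system:  # optional system turn
        parts.append(f"<start_of_turn>system\n{system}<end_of_turn>\n")
    for role, text in turns:  # role is "user" or "model"
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # leave the final model turn open for generation
    return "".join(parts)


prompt = format_gemma3_prompt(
    [("user", "Continue the story: the lighthouse keeper heard a knock at midnight.")],
    system="You are a collaborative fiction writer.",
)
print(prompt)
```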
### Merge Configuration
Yeah, I actually tried several things and surprisingly this one worked best.
```yaml
models:
  - model: ToastyPigeon/Gemma-3-Confetti-12B
    parameters:
      weight: 0.5
  - model: allura-org/Gemma-3-Glitter-12B
    parameters:
      weight: 0.5
merge_method: linear
tokenizer_source: allura-org/Gemma-3-Glitter-12B
```
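To reproduce, saving the config above to a file and running it through [mergekit](https://github.com/arcee-ai/mergekit) (e.g. `mergekit-yaml starshine.yaml ./output-dir`; filenames here are placeholders) should produce an equivalent merge.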