Sanguine Scribe GPT-OSS-20B
gpt-oss-sanguine-20b-v1 is a fine-tuned version of OpenAI's GPT-OSS-20B designed for immersive character roleplay and creative writing. Instead of defaulting to refusal responses, it engages with scenarios by exploring realistic consequences and maintaining character authenticity. Unfortunately, it is still quite a prude; we hope to address this in v2.
Model Details
Model Description
Sanguine Scribe implements consequence-based alignment training to create more engaging and immersive AI interactions. Rather than refusing to engage with creative scenarios, it responds authentically while demonstrating realistic outcomes through narrative progression.
- Developed by: paperboygold @ Sanguine Host
- Model type: Causal Language Model (Fine-tuned)
- Language(s) (NLP): English (primary), with multilingual support
- License: MIT
- Finetuned from model: openai/gpt-oss-20b
- Training approach: LoRA (Low-Rank Adaptation) fine-tuning
Model Sources
- Repository: https://huggingface.co/paperboygold/gpt_oss_sanguine_20b_20250818_072957
- Base Model: openai/gpt-oss-20b
Uses
Direct Use
Sanguine Scribe is designed for:
- Character roleplay and interactive storytelling
- Creative writing assistance and collaboration
- Immersive fictional scenarios and world-building
- Educational simulations requiring authentic character responses
Downstream Use
The model can be integrated into:
- Interactive fiction platforms
- Creative writing applications
- Educational role-playing systems
- Character AI frameworks
Out-of-Scope Use
Not intended for:
- Real-world advice on illegal activities
- Generating actual harmful content for malicious purposes
- Replacing professional advice (medical, legal, financial)
- Production systems without additional safety measures
Bias, Risks, and Limitations
Key Limitations:
- May generate overly detailed or dramatic responses in some scenarios
- Trained to engage rather than refuse, requiring careful system prompt design
- Inherits biases from base model and training data
- May occasionally confuse narrative perspectives (1st vs 2nd person)
Recommendations
- Implement robust system prompts and safety measures in production (a minimal sketch follows this list)
- Use within controlled environments with appropriate content filtering
- Monitor outputs for quality and appropriateness
- Consider additional fine-tuning for specific use cases
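As a starting point, here is a minimal sketch of such a guard, assuming `model` and `tokenizer` are loaded as shown in the next section. `SYSTEM_PROMPT`, `BLOCKLIST`, and `guarded_generate` are hypothetical names, and the keyword check is only a stand-in for a real moderation layer:

```python
# Minimal sketch of a guarded generation wrapper. SYSTEM_PROMPT and BLOCKLIST
# are illustrative placeholders, not part of the model release; swap the
# keyword check for a real moderation service in production.
SYSTEM_PROMPT = (
    "You are a fictional character in a collaborative story. Stay in "
    "character, but do not provide real-world operational instructions."
)
BLOCKLIST = ["example-banned-phrase"]

def guarded_generate(model, tokenizer, user_message, max_new_tokens=256):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=True)
    text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    # Naive post-filter: withhold outputs containing blocked phrases.
    if any(term in text.lower() for term in BLOCKLIST):
        return "[response withheld by content filter]"
    return text
```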
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model in bfloat16 and shard it across available GPUs
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the Sanguine Scribe LoRA adapters on top of the base model
model = PeftModel.from_pretrained(
    base_model,
    "paperboygold/gpt_oss_sanguine_20b_20250818_072957",
)
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

# Example usage
messages = [
    {"role": "user", "content": "You're a tavern keeper. A hooded stranger asks for directions to the old castle. Respond in character."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
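If you plan to serve the model, the adapters can optionally be folded into the base weights with PEFT's `model = model.merge_and_unload()`, which removes the LoRA indirection at inference time; verify that merged outputs match the adapter-based ones before deploying.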
Training Details
Training Data
Dataset Composition:
- Total Examples: 350,969
- Format: OpenAI Harmony format for GPT-OSS compatibility (see the sketch after this list)
- Processing: 9,873 examples enhanced with Gemini-2.5-Flash-Lite for consequence-based response generation
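As a sketch of what this looks like in practice, the gpt-oss tokenizer ships with a chat template that renders conversations into Harmony's special-token serialization. The rendered string shown in the comment below is approximate; consult the tokenizer's own template for the exact tokens:

```python
# Illustrative only: render a conversation with the Harmony chat template that
# ships with the gpt-oss tokenizer, without tokenizing, to inspect the format.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
messages = [
    {"role": "user", "content": "Describe the tavern."},
    {"role": "assistant", "content": "The tavern smells of woodsmoke and spilled ale."},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
# Produces special-token-delimited text roughly of the form:
#   <|start|>user<|message|>Describe the tavern.<|end|>
#   <|start|>assistant<|channel|>final<|message|>The tavern smells of ...<|return|>
```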
Source Datasets:
Character Roleplay (51%): 179,435 examples
- bluemoon_roleplay_chat: 55,472 examples
- mixed_rp: 51,822 examples
- pk_roleplay: 56,578 examples
- chinese_roleplay_novel: 2,230 examples
- long_roleplay: 2,864 examples
- character_codex_new: 5,371 examples
- myuri_roleplay: 379 examples
- gpt_roleplay_realm: 1,402 examples
- sonnet35_charcard_roleplay: 3,144 examples
- hieunguyenminh_roleplay: 12 examples
- roleplay_anime_characters: 161 examples
General Dialogue (37%): 128,460 examples
- hermes_3_dataset: 106,302 examples
- hh_rlhf_harmless-base: 4,638 examples (rejected/chosen labels flipped to create a more unhinged model)
- hh_rlhf_helpful-base: 4,830 examples (labels flipped likewise)
- false_reject: 1,643 examples
- open_instruct: 2,228 examples
- wildchat: 2,762 examples
- llama_nemotron_post_training: 3,416 examples
- wizardlm_evol_instruct: 2,204 examples
- open_code_reasoning: 2,176 examples
- calme_legalkit: 1,678 examples
Technical Content (9%): 29,130 examples
- cybersec_sharegpt: 15,723 examples
- cybersec_attacks: 13,407 examples
Creative Writing (3%): 8,260 examples
- creative_writing_multiturn: 2,952 examples
- creative_writing_sharegpt: 2,178 examples
- erotica: 1,622 examples
- moral_stories: 1,131 examples
- moral_stories_moral: 1,327 examples
- moral_stories_refusal: 1,317 examples
Other Categories:
- harmful: 2,374 examples
- refusal: 2,173 examples
- mature_content: 1,623 examples
Training Procedure
Training Hyperparameters
- Training regime: bfloat16 mixed precision with TensorFloat-32 acceleration
- Steps: 500
- Batch size: 128 (8 per device × 8 GPUs × 2 gradient accumulation)
- Learning rate: 5e-5 with cosine decay
- Optimizer: AdamW
- LoRA rank: 64
- LoRA alpha: 128
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
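For reproduction, these settings map onto a `peft.LoraConfig` roughly as follows; this is a sketch, and any value not listed in this card (such as dropout) is an assumption left at its library default:

```python
# Sketch of the peft.LoraConfig implied by the hyperparameters above. Values
# not listed in this card (e.g. lora_dropout) are assumptions left at defaults.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=128,  # scaling factor (alpha / rank = 2.0)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```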
Speeds, Sizes, Times
- Training time: ~80 minutes on AWS p4d.24xlarge (8x A100)
- Final loss: 1.31 (down from an initial 4.1)
- Model size: 128MB LoRA adapters (20.9B base parameters)
- Training speed: ~0.11 it/s
- Effective parameters trained: ~0.02% of total model parameters
Evaluation
Testing Data, Factors & Metrics
Testing Data
Manual evaluation on roleplay scenarios not present in training data.
Metrics
- Engagement Quality: Assesses the rate of immersive, in-character responses versus refusals
- Narrative Coherence: Evaluates story consistency and character authenticity
- Loss Convergence: Training loss decreased from 4.1 to 1.31 over 500 steps
Results
- Successfully eliminates refusal responses in creative scenarios
- Maintains character perspective and narrative immersion
- Demonstrates consequence-based reasoning rather than safety theater
- Occasional verbosity requiring prompt engineering for optimal results
Environmental Impact
- Hardware Type: 8x NVIDIA A100 (AWS p4d.24xlarge)
- Hours used: ~1.3 hours
- Cloud Provider: Amazon Web Services
- Compute Region: us-west-2
- Training Efficiency: LoRA fine-tuning (only ~0.02% of parameters trained)
- Carbon Emitted: 0.11 kg CO2 eq.
- Carbon Already Offset by Provider: 0.11 kg CO2 eq.
Technical Specifications
Model Architecture and Objective
- Base Architecture: GPT-OSS-20B (20 billion parameters)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Training Objective: Causal language modeling on consequence-based responses
- Format Compatibility: OpenAI Harmony format with reasoning channels (see the sketch after this list)
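When decoding raw generations, the reasoning channels can be stripped so only the final channel is shown. A minimal sketch, reusing `tokenizer`, `inputs`, and `outputs` from the quickstart example above; the channel marker strings follow the published Harmony spec and should be verified against your tokenizer version:

```python
# Sketch: strip Harmony reasoning channels from a raw decode, keeping only the
# "final" channel. Marker strings follow the published Harmony spec; verify
# them against your tokenizer version.
raw = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=False)
final_text = raw.split("<|channel|>final<|message|>", 1)[-1]
for stop_token in ("<|return|>", "<|end|>"):
    final_text = final_text.split(stop_token, 1)[0]
print(final_text)
```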
Compute Infrastructure
Hardware
- AWS p4d.24xlarge instance
- 8x NVIDIA A100 40GB GPUs
- 1.2TB system memory
Software
- PyTorch with CUDA 12.1
- Transformers, PEFT, TRL libraries
- OpenAI Harmony encoding support
Citation
If you use this model in your research, please cite:
```bibtex
@misc{sanguine_scribe_2025,
  author       = {paperboygold},
  title        = {Sanguine Scribe: Consequence-Based Alignment for Character Roleplay},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/paperboygold/gpt_oss_sanguine_20b_v1}}
}
```
Model Card Authors
paperboygold
Model Card Contact
For questions or issues, please open an issue in the model repository or email [email protected]