|
--- |
|
base_model: |
|
- UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3 |
|
- ZeusLabs/L3-Aethora-15B-V2 |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
- llama |
|
--- |
|
|
|
Semi-healed Llama-3 15B for programming, scientific Q&A, and general instruction following.
|
|
|
--------------------------------------------------------------------- |
|
|
|
# Llama-3-Instruct-15B-SPPO-Iter3-SH-F32 |
|
|
|
A fully functional version of [Llama-3-Instruct-8B-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3) upscaled to 15B parameters and semi-healed with a projection swap: the `o_proj` and `down_proj` tensors are taken from the fully finetuned [L3-Aethora-15B-V2](https://huggingface.co/ZeusLabs/L3-Aethora-15B-V2).
|
|
|
Paper: [Self-Play Preference Optimization for Language Model Alignment](https://arxiv.org/abs/2405.00675)
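
A minimal Transformers inference sketch (the repo id below is an assumption based on the model name; substitute the actual repository):

```python
# Minimal sketch; the repo id is an assumption based on the model name above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "v000000/Llama-3-Instruct-15B-SPPO-Iter3-SH"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float32,  # weights are F32; use torch.bfloat16 to halve memory
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain big-O notation in one paragraph."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```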
|
|
|
--------------------------------------------------------------------- |
|
|
|
# Quants |
|
* [GGUF Q5_K_M](https://huggingface.co/v000000/Llama-3-Instruct-15B-SPPO-Iter3-SH-Q5_K_M-GGUF) |
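
A minimal usage sketch for the GGUF quant via `llama-cpp-python` (the `filename` glob is an assumption; check the repo's file listing):

```python
# Minimal sketch, assuming llama-cpp-python with huggingface-hub installed.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="v000000/Llama-3-Instruct-15B-SPPO-Iter3-SH-Q5_K_M-GGUF",
    filename="*q5_k_m.gguf",  # assumed file-name pattern; check the repo listing
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```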
|
|
|
## Merge
|
|
|
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). |
|
|
|
## Merge Details |
|
### Merge Method |
|
|
|
This model was merged in two steps: a passthrough upscale followed by a SLERP merge.
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* [UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3) |
|
* [ZeusLabs/L3-Aethora-15B-V2](https://huggingface.co/ZeusLabs/L3-Aethora-15B-V2) |
|
* [grimjim/Llama-3-Instruct-abliteration-LoRA-8B](https://huggingface.co/grimjim/Llama-3-Instruct-abliteration-LoRA-8B) |
|
|
|
### Configuration |
|
|
|
The following YAML configurations were used to produce this model (step 1: passthrough upscale, step 2: SLERP semi-heal):
|
|
|
```yaml
#1. Passthrough upscale of SPPO 8B (+ abliteration LoRA) to 15B
dtype: float32
merge_method: passthrough
slices:
  - sources:
      - layer_range: [0, 24]
        model: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
  - sources:
      - layer_range: [8, 24]
        model: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
  - sources:
      - layer_range: [8, 24]
        model: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
  - sources:
      - layer_range: [24, 32]
        model: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3+grimjim/Llama-3-Instruct-abliteration-LoRA-8B

#2. SLERP of the upscaled model with L3-Aethora-15B-V2 (the semi-heal step)
models:
  - model: ./Llama-3-Instruct-15B-SPPO-Iter3
merge_method: slerp
base_model: ZeusLabs/L3-Aethora-15B-V2
parameters:
  t:
    - filter: o_proj
      value: 0 # take finetuned o_proj from Aethora
    - filter: down_proj
      value: 0 # take finetuned down_proj from Aethora
    - value: 1 # rest of tensors from SPPO
dtype: float32
```
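
The two numbered configs are separate mergekit runs: step 1 writes `./Llama-3-Instruct-15B-SPPO-Iter3`, which step 2 then SLERPs with Aethora. A rough reproduction sketch using mergekit's Python API (file names and options here are assumptions, not the author's exact invocation):

```python
# Rough sketch with mergekit's Python API; config file names are assumptions.
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Step 1: passthrough upscale (config #1 above, saved as its own file).
with open("passthrough.yml", "r", encoding="utf-8") as fp:
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    out_path="./Llama-3-Instruct-15B-SPPO-Iter3",  # consumed by config #2
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        lora_merge_cache="/tmp",  # the +LoRA syntax triggers a LoRA merge first
        copy_tokenizer=True,
    ),
)

# Step 2: run again with the SLERP config (#2 above) to produce the final model.
```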
|
|
|
Uncensored: no
|
|
|
# Prompt Template (Llama-3-Instruct)
|
```bash |
|
<|begin_of_text|><|start_header_id|>system<|end_header_id|> |
|
|
|
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
|
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
|
{output}<|eot_id|> |
|
|
|
``` |
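
With Transformers, this template is applied automatically through the tokenizer's chat template; a quick way to render it (repo id assumed as above):

```python
# Minimal sketch: render the Llama-3-Instruct template from the tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("v000000/Llama-3-Instruct-15B-SPPO-Iter3-SH")  # assumed repo id

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # matches the template above, ending at the assistant header
```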