|
--- |
|
license: mit |
|
datasets: |
|
- mlabonne/orpo-dpo-mix-40k |
|
--- |
|
|
|
This is an uncensored version of Phi-3-mini-4k-instruct.
|
|
|
It was abliterated following the guide here: https://huggingface.co/blog/mlabonne/abliteration
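
In brief, abliteration estimates a "refusal direction" in the residual stream (the difference between mean activations on harmful and harmless prompts) and projects it out of the weights that write to the residual stream. A minimal sketch of the orthogonalization step from the guide, with a random placeholder standing in for the measured direction:

```python
import torch

def orthogonalize(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the refusal direction from a weight matrix that writes to the
    residual stream. weight: (d_model, d_in); refusal_dir: (d_model,)."""
    r = refusal_dir / refusal_dir.norm()
    # W' = W - r r^T W, so W' x has no component along r for any input x.
    return weight - torch.outer(r, r @ weight)

# Illustration only: in practice refusal_dir is the difference of mean
# residual-stream activations on harmful vs. harmless prompts at a chosen layer.
d_model, d_in = 3072, 3072  # Phi-3-mini hidden size
W = torch.randn(d_model, d_in)
refusal_dir = torch.randn(d_model)
W_ablated = orthogonalize(W, refusal_dir)
```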
|
|
|
It was then fine-tuned with DPO on the mlabonne/orpo-dpo-mix-40k dataset.
|
|
|
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.0` |
|
```yaml |
|
base_model: cowWhySo/Phi-3-mini-4k-instruct-Friendly |
|
trust_remote_code: true |
|
model_type: AutoModelForCausalLM |
|
tokenizer_type: AutoTokenizer |
|
chat_template: phi_3 |
|
|
|
load_in_8bit: false |
|
load_in_4bit: true |
|
strict: false |
|
save_safetensors: true |
|
|
|
rl: dpo |
|
datasets: |
|
- path: mlabonne/orpo-dpo-mix-40k |
|
split: train |
|
type: chatml.intel |
|
|
|
dataset_prepared_path: |
|
val_set_size: 0.0 |
|
output_dir: ./out |
|
|
|
sequence_len: 4096 |
|
sample_packing: false |
|
pad_to_sequence_len: false |
|
|
|
adapter: qlora |
|
lora_model_dir: |
|
|
|
lora_r: 64 |
|
lora_alpha: 32 |
|
lora_dropout: 0.1 |
|
lora_target_linear: true |
|
lora_fan_in_fan_out: |
|
|
|
wandb_project: axolotl |
|
wandb_entity: |
|
wandb_watch: |
|
wandb_name: phi3-mini-4k-instruct-Friendly |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 8 |
|
micro_batch_size: 4 |
|
num_epochs: 1 |
|
optimizer: paged_adamw_8bit |
|
lr_scheduler: linear |
|
learning_rate: 5e-6 |
|
train_on_inputs: false |
|
group_by_length: false |
|
|
|
bf16: auto |
|
|
|
gradient_checkpointing: true |
|
gradient_checkpointing_kwargs: |
|
use_reentrant: True |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
warmup_steps: 150 |
|
evals_per_epoch: 0 |
|
eval_table_size: |
|
eval_table_max_new_tokens: 128 |
|
saves_per_epoch: 1 |
|
debug: |
|
deepspeed: deepspeed_configs/zero3.json |
|
weight_decay: 0.01 |
|
max_grad_norm: 1.0 |
|
resize_token_embeddings_to_32x: true |
|
``` |
|
|
|
</details><br> |
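
## Usage

A minimal inference sketch with `transformers`, using the standard Phi-3 loading pattern (the prompt and generation settings here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cowWhySo/Phi-3-mini-4k-instruct-Friendly"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # matches the axolotl config above
)

messages = [{"role": "user", "content": "Explain abliteration in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```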
|
|
|
|
|
## Quants |
|
|
|
GGUF: https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf |
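
One quick way to run the GGUF quants locally is `llama-cpp-python`; the file name below is a hypothetical quant level, so check the repo above for the actual file names:

```python
from llama_cpp import Llama

# Hypothetical file name; substitute an actual .gguf from the quant repo.
llm = Llama(
    model_path="phi-3-mini-4k-instruct-friendly-q4_k_m.gguf",
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```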
|
|
|
## Benchmarks |
|
|
|
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average| |
|
|--------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:| |
|
|[Phi-3-mini-4k-instruct-Friendly](https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly)| 41| 67.56| 46.36| 39.3| 48.56| |
|
|
|
### AGIEval |
|
| Task |Version| Metric |Value| |Stderr| |
|
|------------------------------|------:|--------|----:|---|-----:| |
|
|agieval_aqua_rat | 0|acc |22.05|± | 2.61| |
|
| | |acc_norm|22.05|± | 2.61| |
|
|agieval_logiqa_en | 0|acc |41.01|± | 1.93| |
|
| | |acc_norm|41.32|± | 1.93| |
|
|agieval_lsat_ar | 0|acc |22.17|± | 2.75| |
|
| | |acc_norm|22.17|± | 2.75| |
|
|agieval_lsat_lr | 0|acc |45.69|± | 2.21| |
|
| | |acc_norm|45.88|± | 2.21| |
|
|agieval_lsat_rc | 0|acc |59.48|± | 3.00| |
|
| | |acc_norm|56.51|± | 3.03| |
|
|agieval_sat_en | 0|acc |75.24|± | 3.01| |
|
| | |acc_norm|70.39|± | 3.19| |
|
|agieval_sat_en_without_passage| 0|acc |39.81|± | 3.42| |
|
| | |acc_norm|37.86|± | 3.39| |
|
|agieval_sat_math | 0|acc |33.64|± | 3.19| |
|
| | |acc_norm|31.82|± | 3.15| |
|
|
|
Average: 41.0% |
|
|
|
### GPT4All |
|
| Task |Version| Metric |Value| |Stderr| |
|
|-------------|------:|--------|----:|---|-----:| |
|
|arc_challenge| 0|acc |49.74|± | 1.46| |
|
| | |acc_norm|50.43|± | 1.46| |
|
|arc_easy | 0|acc |76.68|± | 0.87| |
|
| | |acc_norm|73.23|± | 0.91| |
|
|boolq | 1|acc |79.27|± | 0.71| |
|
|hellaswag | 0|acc |57.91|± | 0.49| |
|
| | |acc_norm|77.13|± | 0.42| |
|
|openbookqa | 0|acc |35.00|± | 2.14| |
|
| | |acc_norm|43.80|± | 2.22| |
|
|piqa | 0|acc |77.86|± | 0.97| |
|
| | |acc_norm|79.54|± | 0.94| |
|
|winogrande | 0|acc |69.53|± | 1.29| |
|
|
|
Average: 67.56% |
|
|
|
### TruthfulQA |
|
| Task |Version|Metric|Value| |Stderr| |
|
|-------------|------:|------|----:|---|-----:| |
|
|truthfulqa_mc| 1|mc1 |31.21|± | 1.62| |
|
| | |mc2 |46.36|± | 1.55| |
|
|
|
Average: 46.36% |
|
|
|
### Bigbench |
|
| Task |Version| Metric |Value| |Stderr| |
|
|------------------------------------------------|------:|---------------------|----:|---|-----:| |
|
|bigbench_causal_judgement | 0|multiple_choice_grade|54.74|± | 3.62| |
|
|bigbench_date_understanding | 0|multiple_choice_grade|66.67|± | 2.46| |
|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|29.46|± | 2.84| |
|
|bigbench_geometric_shapes | 0|multiple_choice_grade|11.98|± | 1.72| |
|
| | |exact_str_match | 0.00|± | 0.00| |
|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.00|± | 2.01| |
|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|17.14|± | 1.43| |
|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.67|± | 2.88| |
|
|bigbench_movie_recommendation | 0|multiple_choice_grade|24.40|± | 1.92| |
|
|bigbench_navigate | 0|multiple_choice_grade|53.70|± | 1.58| |
|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|68.10|± | 1.04| |
|
|bigbench_ruin_names | 0|multiple_choice_grade|31.03|± | 2.19| |
|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|15.93|± | 1.16| |
|
|bigbench_snarks | 0|multiple_choice_grade|77.35|± | 3.12| |
|
|bigbench_sports_understanding | 0|multiple_choice_grade|52.64|± | 1.59| |
|
|bigbench_temporal_sequences | 0|multiple_choice_grade|51.50|± | 1.58| |
|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|19.52|± | 1.12| |
|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|13.89|± | 0.83| |
|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.67|± | 2.88| |
|
|
|
Average: 39.3% |
|
|
|
Average score: 48.56% |
|
|
|
## Training Summary |
|
|
|
```json |
|
{ |
|
"train/loss": 0.299, |
|
"train/grad_norm": 0.9337566701340533, |
|
"train/learning_rate": 0, |
|
"train/rewards/chosen": 0.08704188466072083, |
|
"train/rewards/rejected": -2.835820436477661, |
|
"train/rewards/accuracies": 0.84375, |
|
"train/rewards/margins": 2.9228620529174805, |
|
"train/logps/rejected": -509.9840393066406, |
|
"train/logps/chosen": -560.8234252929688, |
|
"train/logits/rejected": 1.6356163024902344, |
|
"train/logits/chosen": 1.7323706150054932, |
|
"train/epoch": 1.002169197396963, |
|
"train/global_step": 231, |
|
"_timestamp": 1717711643.3345022, |
|
"_runtime": 22808.557655334473, |
|
"_step": 231, |
|
"train_runtime": 22809.152, |
|
"train_samples_per_second": 1.944, |
|
"train_steps_per_second": 0.01, |
|
"total_flos": 0, |
|
"train_loss": 0.44557410065745895, |
|
"_wandb": { |
|
"runtime": 22810 |
|
} |
|
} |
|
``` |
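
For reference, the `train/rewards/*` fields above are the implicit DPO rewards: β-scaled log-probability ratios of the policy against the frozen reference model. A minimal sketch of how those fields relate to each other, assuming the standard DPO formulation (β = 0.1, TRL's default, is assumed here):

```python
import torch
import torch.nn.functional as F

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: how much more likely the policy makes a completion
    # than the reference model does, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards            # train/rewards/margins
    loss = -F.logsigmoid(margins).mean()                   # train/loss
    accuracy = (margins > 0).float().mean()                # train/rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```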