---
license: apache-2.0
base_model: Llama-3.2-1B-Instruct
tags:
- dpo
- preference-learning
- random
- pruned
---

# random_prune_Llama-3.2-1B-Instruct_prune_0.0-sigmoid

This model is a DPO (Direct Preference Optimization) fine-tuned version of Llama-3.2-1B-Instruct with random pruning applied during training.

## Model Details

- **Base Model**: Llama-3.2-1B-Instruct
- **Training Method**: DPO with random pruning (illustrated in the sketch below)
- **Pruning Ratio**: unknown (the `prune_0.0` suffix in the model name suggests 0.0)
- **Training Date**: 2025-09-15
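
The pruning implementation used for this checkpoint is not published. As an illustration only, random unstructured pruning is commonly applied by zeroing a randomly chosen fraction of weights, for example with `torch.nn.utils.prune`; the target modules, the base-model hub id, and the 20% amount below are assumptions, not this model's settings.

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

# Hypothetical sketch of random unstructured pruning; NOT the script used
# to produce this checkpoint. The 20% amount and the choice of target
# modules are placeholders.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        # Randomly zero a fraction of this layer's weights.
        prune.random_unstructured(module, name="weight", amount=0.2)
        # Fold the pruning mask into the weight tensor permanently.
        prune.remove(module, "weight")
```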

## Training Configuration

This model was trained using Direct Preference Optimization (DPO) with the following characteristics (a hypothetical training sketch follows the list):

- Method: random pruning combined with DPO
- Pruning applied during training
- Fine-tuned on preference (chosen vs. rejected) data
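
The actual training script and hyperparameters for this checkpoint are not published. The sketch below shows how a comparable DPO run could be set up with Hugging Face TRL; the dataset name, `beta`, batch size, and epoch count are placeholders, and the tokenizer argument is named `tokenizer=` rather than `processing_class=` in older TRL releases.

```python
# Hypothetical DPO training sketch with TRL; not the script used to
# produce this checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: a preference dataset with "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("your-preference-dataset", split="train")

args = DPOConfig(
    output_dir="dpo-random-prune-llama-3.2-1b",
    beta=0.1,              # placeholder KL trade-off coefficient
    loss_type="sigmoid",   # standard (Bradley-Terry) DPO loss
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```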

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "5456es/random_prune_Llama-3.2-1B-Instruct_prune_0.0-sigmoid"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example usage
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens bounds the generated continuation; max_length would also
# count the prompt tokens toward the limit.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
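
Because the base model is instruction-tuned, prompts are usually better formatted with the tokenizer's chat template; this assumes the fine-tune keeps the base Llama 3.2 template. Continuing from the snippet above:

```python
# Chat-formatted generation; assumes this fine-tune keeps the base
# Llama 3.2 chat template.
messages = [{"role": "user", "content": "Your prompt here"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```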

## Training Data

This model was trained on preference data using the DPO algorithm.
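
For reference, the standard sigmoid (Bradley-Terry) DPO objective, which the `-sigmoid` suffix in the model name presumably refers to, trains the policy $\pi_\theta$ against a frozen reference model $\pi_{\mathrm{ref}}$ on triples of a prompt $x$, a preferred response $y_w$, and a rejected response $y_l$:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $\beta$ controls how far the fine-tuned policy may drift from the reference model.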

## Limitations

This model inherits the limitations of its base model and may have additional limitations due to the pruning process.

## Citation

If you use this model, please cite the original DPO paper (Rafailov et al., 2023, "Direct Preference Optimization: Your Language Model Is Secretly a Reward Model") and the Llama 3.2 base model.