---
license: apache-2.0
base_model: Llama-3.2-1B-Instruct
tags:
- dpo
- preference-learning
- random
- pruned
---

# random_prune_Llama-3.2-1B-Instruct_prune_0.0-sigmoid

This model is a DPO (Direct Preference Optimization) fine-tuned version of Llama-3.2-1B-Instruct with random pruning applied during training.

## Model Details

- **Base Model**: Llama-3.2-1B-Instruct
- **Training Method**: DPO with random pruning (illustrated in the sketch below)
- **Pruning Ratio**: unknown (the `prune_0.0` suffix in the model name suggests 0.0)
- **Training Date**: 2025-09-15
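
The pruning implementation used for this checkpoint is not published. As an illustration only, random unstructured pruning is commonly applied by zeroing a randomly chosen fraction of weights, for example with `torch.nn.utils.prune`; the target modules, the base-model hub id, and the 20% amount below are assumptions, not this model's settings.

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

# Hypothetical sketch of random unstructured pruning; NOT the script used
# to produce this checkpoint. The 20% amount and the choice of target
# modules are placeholders.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        # Randomly zero a fraction of this layer's weights.
        prune.random_unstructured(module, name="weight", amount=0.2)
        # Fold the pruning mask into the weight tensor permanently.
        prune.remove(module, "weight")
```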

## Training Configuration

This model was trained using Direct Preference Optimization (DPO) with the following characteristics (a hypothetical training sketch follows the list):

- Method: random pruning combined with DPO
- Pruning applied during training
- Fine-tuned on preference (chosen vs. rejected) data
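
The actual training script and hyperparameters for this checkpoint are not published. The sketch below shows how a comparable DPO run could be set up with Hugging Face TRL; the dataset name, `beta`, batch size, and epoch count are placeholders, and the tokenizer argument is named `tokenizer=` rather than `processing_class=` in older TRL releases.

```python
# Hypothetical DPO training sketch with TRL; not the script used to
# produce this checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: a preference dataset with "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("your-preference-dataset", split="train")

args = DPOConfig(
    output_dir="dpo-random-prune-llama-3.2-1b",
    beta=0.1,              # placeholder KL trade-off coefficient
    loss_type="sigmoid",   # standard (Bradley-Terry) DPO loss
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```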

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "5456es/random_prune_Llama-3.2-1B-Instruct_prune_0.0-sigmoid"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example usage
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens bounds the generated continuation; max_length would also
# count the prompt tokens toward the limit.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
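
Because the base model is instruction-tuned, prompts are usually better formatted with the tokenizer's chat template; this assumes the fine-tune keeps the base Llama 3.2 template. Continuing from the snippet above:

```python
# Chat-formatted generation; assumes this fine-tune keeps the base
# Llama 3.2 chat template.
messages = [{"role": "user", "content": "Your prompt here"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```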

## Training Data

This model was trained on preference data using the DPO algorithm.
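
For reference, the standard sigmoid (Bradley-Terry) DPO objective, which the `-sigmoid` suffix in the model name presumably refers to, trains the policy $\pi_\theta$ against a frozen reference model $\pi_{\mathrm{ref}}$ on triples of a prompt $x$, a preferred response $y_w$, and a rejected response $y_l$:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $\beta$ controls how far the fine-tuned policy may drift from the reference model.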

## Limitations

This model inherits the limitations of its base model and may have additional limitations due to the pruning process.

## Citation

If you use this model, please cite the original DPO paper (Rafailov et al., 2023, "Direct Preference Optimization: Your Language Model Is Secretly a Reward Model") and the Llama 3.2 base model.