
Alright, let's make this DeepSeek LoRA fine-tuning script run in a conda virtual environment on a CUDA GPU.

I’ll walk you through:

  • Conda environment setup (with CUDA-enabled PyTorch)

  • Installing all dependencies

  • Running the fine-tuning script

1️⃣ Create and activate the conda environment

Open your terminal (or Anaconda Prompt on Windows):

conda create -n deepseek_lora python=3.10 -y
conda activate deepseek_lora

2️⃣ Install PyTorch with CUDA support

Check which CUDA version your driver supports:

nvidia-smi

If it reports CUDA 12.1 (or newer), run:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

For CUDA 11.8:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
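
To confirm the install actually picked up CUDA, a quick check helps (a small sanity script; the versions it prints will depend on your setup):

# check_cuda.py - verify that PyTorch was built with CUDA and can see the GPU
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA runtime used by PyTorch:", torch.version.cuda)
    print("GPU:", torch.cuda.get_device_name(0))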

3️⃣ Install dependencies for LoRA fine-tuning

pip install transformers accelerate datasets peft bitsandbytes sentencepiece einops

✅ Why each is needed

  • transformers → Model + tokenizer

  • accelerate → Efficient multi-GPU & mixed-precision training

  • datasets → Easy JSONL loading

  • peft → LoRA training adapters

  • bitsandbytes → 8-bit GPU loading (saves VRAM)

  • sentencepiece → Tokenization for models like Qwen/DeepSeek

  • einops → Tensor reshaping utilities used by some model code
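
If you want to confirm everything installed cleanly, a quick import check works (optional; any recent versions are fine as long as the imports succeed):

# check_deps.py - confirm the fine-tuning dependencies import correctly
import transformers, datasets, peft, accelerate, bitsandbytes

print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("peft:", peft.__version__)
print("accelerate:", accelerate.__version__)
print("bitsandbytes:", bitsandbytes.__version__)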

4️⃣ Save the fine-tuning script

Save this as finetune_deepseek_lora.py in your project folder. (It is already CUDA-ready; device_map="auto" will place the model on your GPU automatically.)

import argparse
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model, TaskType
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--train_file", type=str, required=True)
parser.add_argument("--output_dir", type=str, default="lora_deepseek")
parser.add_argument("--epochs", type=int, default=3)
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--micro_batch_size", type=int, default=1)
parser.add_argument("--learning_rate", type=float, default=2e-4)
parser.add_argument("--max_length", type=int, default=1024)
args = parser.parse_args()

MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load model in 8-bit on the GPU (device_map="auto" handles placement)
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.float16
)
model = prepare_model_for_kbit_training(model)

# Apply LoRA adapters
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)

# Load dataset
dataset = load_dataset("json", data_files={"train": args.train_file})

def preprocess(example):
    prompt = example["prompt"].strip()
    response = example["response"].strip()
    text = prompt + "\n\n" + response
    tokenized = tokenizer(text, truncation=True, max_length=args.max_length, padding="max_length")
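    # Note: padding tokens stay in the labels here; replacing them with -100 would exclude them from the loss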
    tokenized["labels"] = tokenized["input_ids"].copy()
    return tokenized

dataset = dataset["train"].map(preprocess, remove_columns=dataset["train"].column_names)

# Training arguments
training_args = TrainingArguments(
    output_dir=args.output_dir,
    per_device_train_batch_size=args.micro_batch_size,
    gradient_accumulation_steps=args.batch_size // args.micro_batch_size,
    num_train_epochs=args.epochs,
    learning_rate=args.learning_rate,
    fp16=True,
    optim="paged_adamw_32bit",
    logging_steps=50,
    save_strategy="epoch",
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer
)

trainer.train()
model.save_pretrained(args.output_dir)
tokenizer.save_pretrained(args.output_dir)

5️⃣ Prepare your dataset

Create data/train.jsonl (one JSON object per line):

{"prompt":"What is Python?","response":"Python is a high-level programming language."}
{"prompt":"Explain recursion.","response":"Recursion is a function calling itself until a base condition is met."}

6️⃣ Run fine-tuning in CUDA environment

python finetune_deepseek_lora.py --train_file data/train.jsonl --output_dir lora_deepseek --epochs 3 --batch_size 4 --micro_batch_size 1

This will:

  • Use CUDA automatically

  • Load DeepSeek model in 8-bit for VRAM savings

  • Train only LoRA adapters

  • Save results in lora_deepseek/
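
Once training finishes, you can sanity-check the adapter with a quick generation pass (a minimal sketch, assuming the lora_deepseek/ output directory and base model used above):

# test_lora_inference.py - load the base model, attach the trained LoRA adapter, and generate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
ADAPTER_DIR = "lora_deepseek"  # the --output_dir used during training

tokenizer = AutoTokenizer.from_pretrained(ADAPTER_DIR, use_fast=False)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map="auto",
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, ADAPTER_DIR)
model.eval()

prompt = "What is Python?"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))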

If you want, I can also add a CUDA memory-efficient accelerate config so you can run this even on an 8GB VRAM GPU without crashing. Do you want me to do that next?
