eagle0504 committed (verified)
Commit c4b1d00 · Parent(s): e52ad68

Update README.md

Files changed (1): README.md (+13 -8)
README.md CHANGED
@@ -55,11 +55,11 @@ This instruction format ensures that the model understands the task type explicitly.
 
 The base model [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) was fine-tuned on three different datasets using DeepSpeed across various RunPod infrastructure setups. Below is a consolidated summary of the training configurations and results:
 
-| Model ID | Dataset Description | GPUs | vCPUs | RAM (GB) | Disk per GPU | Container Image | Duration | Cost | DeepSpeed Stage | Precision | Mean Token Accuracy |
-| ------------------------------------------------------------------------------- | ------------------------------- | ------------- | ----- | -------- | ------------ | ---------------------------------------------------------- | -------- | ------- | --------------- | --------- | ------------------- |
-| `eagle0504/finetuned-deepseek-r1-distill-qwen-1.5b-by-openai-gsm8k-enhanced-v2` | OpenAI GSM8K Enhanced v2 | 6 × H100 PCIe | 144 | 1132 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 2 hrs | \~\$28 | Stage 1 | FP16 | 98% |
-| `eagle0504/openai-gsm8k-codealpaca-20k-enhanced-deepseek-r1-distill-qwen-1.5b` | GSM8K + CodeAlpaca-20K Enhanced | 4 × A100 SXM | 146 | 1144 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 2 hrs | \~\$14+ | Stage 1 | FP16 | 97% |
-| `eagle0504/qwen-distilled-scout-1.5b` | Custom CoT + SQL-Reasoning | 6 × A100 SXM | 192 | 1536 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 1.5 hrs | \~\$21 | Stage 2 | FP16 | 97% |
+| Model ID | Dataset Description | GPUs | vCPUs | RAM (GB) | Disk per GPU | Container Image | Duration | Cost / hr | Total Cost | DeepSpeed Stage | Precision | Mean Token Accuracy |
+| ------------------------------------------------------------------------------- | ------------------------------- | ------------- | ----- | -------- | ------------ | ---------------------------------------------------------- | -------- | --------- | ---------- | --------------- | --------- | ------------------- |
+| `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1` | OpenAI GSM8K Enhanced v2 | 6 × H100 PCIe | 144 | 1132 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 3 hrs | ~$14 | ~$42 | Stage 1 | FP16 | 98% |
+| `eagle0504/augmented_codealpaca-20k-using-together-ai-deepseek-v1` | GSM8K + CodeAlpaca-20K Enhanced | 4 × A100 SXM | 146 | 1144 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 3 hrs | ~$7+ | ~$21+ | Stage 1 | FP16 | 98% |
+| `gretelai/synthetic_text_to_sql` | Custom CoT + SQL-Reasoning | 6 × A100 SXM | 192 | 1536 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 2.5 hrs | ~$21 | ~$52.5 | Stage 2 | FP16 | 97% |
 
 ---
 
@@ -128,15 +128,20 @@ stop_sequence = "</response>"
 stop_ids = tokenizer.encode(stop_sequence, add_special_tokens=False)
 stopping_criteria = StoppingCriteriaList([StopOnTokens([stop_ids])])
 
+prompt = (
+    "<instruction>This is a math problem.</instruction>"
+    "<question>Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether?</question>"
+)
+
 inputs = tokenizer(
-    "<instruction>This is a math problem.</instruction><question>Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether?</question>",
+    prompt,
     return_tensors="pt"
 )
 
 outputs = model.generate(
     **inputs,
-    max_new_tokens=230,
-    stopping_criteria=stopping_criteria
+    max_new_tokens=1024,  # generous upper bound; the stop sequence usually ends generation well before this
+    stopping_criteria=stopping_criteria  # halts at "</response>", so the full 1024 tokens are rarely used
 )
 
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
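For context on the "DeepSpeed Stage" and "Precision" columns in the updated table: ZeRO Stage 1 shards optimizer states across GPUs, Stage 2 additionally shards gradients, and FP16 enables mixed-precision training. The "Total Cost" column is consistent with the hourly cost times the duration (e.g., ~$14/hr × 3 hrs ≈ $42). The actual DeepSpeed configuration is not part of this commit; the snippet below is only a minimal sketch of what a ZeRO Stage 1 / FP16 setup could look like via the Hugging Face `TrainingArguments` integration, with the output path and batch sizes as illustrative assumptions.

```python
# Hypothetical sketch only: the DeepSpeed config actually used is not shown in this commit.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 1},  # Stage 1 shards optimizer states; Stage 2 would also shard gradients
    "fp16": {"enabled": True},          # mixed precision, matching the FP16 column in the table
    "train_micro_batch_size_per_gpu": "auto",  # "auto" defers to the HF Trainer's values
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="./checkpoints",     # illustrative path
    per_device_train_batch_size=8,  # assumed value, not from the commit
    fp16=True,
    deepspeed=ds_config,            # accepts a dict or a path to a DeepSpeed JSON file
)
```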
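The inference snippet relies on a `StopOnTokens` class defined earlier in the README, outside the hunks shown in this diff. A typical implementation of such a `transformers.StoppingCriteria` subclass looks roughly like the following; the README's actual definition may differ in detail.

```python
# Rough sketch of a StopOnTokens-style criterion; the README's real definition
# lives outside the hunks shown here and may differ.
import torch
from transformers import StoppingCriteria

class StopOnTokens(StoppingCriteria):
    def __init__(self, stop_token_id_lists):
        # stop_token_id_lists: e.g. [stop_ids], where stop_ids encodes "</response>"
        self.stop_token_id_lists = stop_token_id_lists

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Stop once the generated sequence (batch size 1 assumed, as in the snippet)
        # ends with any of the configured stop token sequences.
        for stop_ids in self.stop_token_id_lists:
            if input_ids.shape[1] >= len(stop_ids) and input_ids[0, -len(stop_ids):].tolist() == stop_ids:
                return True
        return False
```

With a criterion like this in place, `max_new_tokens=1024` acts purely as a safety cap: generation normally terminates as soon as `</response>` is emitted.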