The base model [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) was fine-tuned on three different datasets using DeepSpeed across various RunPod infrastructure setups. Below is a consolidated summary of the training configurations and results:

| Model ID | Dataset Description | GPUs | vCPUs | RAM (GB) | Disk per GPU | Container Image | Duration | Cost/hr | Total Cost | DeepSpeed Stage | Precision | Mean Token Accuracy |
| ------------------------------------------------------------------------------ | ------------------------------- | ------------- | ----- | -------- | ------------ | ---------------------------------------------------------- | -------- | ------- | ---------- | --------------- | --------- | ------------------- |
| `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1` | OpenAI GSM8K Enhanced v2 | 6 × H100 PCIe | 144 | 1132 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 3 hrs | ~$14 | ~$42 | Stage 1 | FP16 | 98% |
| `eagle0504/augmented_codealpaca-20k-using-together-ai-deepseek-v1` | GSM8K + CodeAlpaca-20K Enhanced | 4 × A100 SXM | 146 | 1144 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 3 hrs | ~$7+ | ~$21+ | Stage 1 | FP16 | 98% |
| `gretelai/synthetic_text_to_sql` | Custom CoT + SQL-Reasoning | 6 × A100 SXM | 192 | 1536 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 2.5 hrs | ~$21 | ~$52.5 | Stage 2 | FP16 | 97% |

---
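The table above references DeepSpeed ZeRO stages and FP16 precision. As a rough illustration of what those settings mean in a DeepSpeed configuration, here is a minimal sketch (the batch sizes and other hyperparameters are assumptions for illustration only, not taken from the actual training scripts):

```python
# Hypothetical DeepSpeed config sketch; the real hyperparameters used for the
# runs in the table above are not shown in this README.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # assumed value, for illustration
    "gradient_accumulation_steps": 8,      # assumed value, for illustration
    "fp16": {"enabled": True},             # matches the FP16 precision column
    "zero_optimization": {
        # Stage 1 partitions optimizer states across GPUs;
        # Stage 2 additionally partitions gradients (used for the SQL run).
        "stage": 1,
    },
}
```

This dictionary would typically be written to a JSON file (or passed directly) and referenced via the `deepspeed` launcher's `--deepspeed_config` argument.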
```python
stop_sequence = "</response>"
stop_ids = tokenizer.encode(stop_sequence, add_special_tokens=False)
stopping_criteria = StoppingCriteriaList([StopOnTokens([stop_ids])])

prompt = (
    "<instruction>This is a math problem.</instruction>"
    "<question>Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether?</question>"
)

inputs = tokenizer(
    prompt,
    return_tensors="pt"
)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,                  # upper bound; generation usually stops earlier at the stop sequence
    stopping_criteria=stopping_criteria   # stops at "</response>", so the full 1024 tokens may not be needed
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
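The `StopOnTokens` criterion used above is not shown in this excerpt. A minimal sketch of how such a criterion might be implemented (an assumption, not the repository's actual definition) is to subclass `transformers.StoppingCriteria` and check whether the generated sequence ends with any of the supplied stop-token-id lists:

```python
import torch
from transformers import StoppingCriteria


class StopOnTokens(StoppingCriteria):
    """Stop generation once the output ends with any of the given stop-token-id
    sequences. A sketch only; the README's actual definition may differ, and
    newer transformers versions expect a per-batch boolean tensor instead of
    a single bool."""

    def __init__(self, stop_ids_list):
        self.stop_ids_list = stop_ids_list  # e.g. [[token ids of "</response>"]]

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        for stop_ids in self.stop_ids_list:
            n = len(stop_ids)
            # Compare the tail of the generated sequence against the stop ids.
            if input_ids.shape[1] >= n and input_ids[0, -n:].tolist() == stop_ids:
                return True
        return False
```

Calling it directly shows the intended behavior: it fires only when the sequence ends exactly with the stop ids.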