eagle0504 committed (verified)
Commit c4b1d00 · Parent(s): e52ad68

Update README.md

Files changed (1): README.md (+13 -8)
README.md CHANGED
@@ -55,11 +55,11 @@ This instruction format ensures that the model understands the task type explicitly.
 
 The base model [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) was fine-tuned on three different datasets using DeepSpeed across various RunPod infrastructure setups. Below is a consolidated summary of the training configurations and results:
 
-| Model ID | Dataset Description | GPUs | vCPUs | RAM (GB) | Disk per GPU | Container Image | Duration | Cost | DeepSpeed Stage | Precision | Mean Token Accuracy |
-| ------------------------------------------------------------------------------- | ------------------------------- | ------------- | ----- | -------- | ------------ | ---------------------------------------------------------- | -------- | ------- | --------------- | --------- | ------------------- |
-| `eagle0504/finetuned-deepseek-r1-distill-qwen-1.5b-by-openai-gsm8k-enhanced-v2` | OpenAI GSM8K Enhanced v2 | 6 × H100 PCIe | 144 | 1132 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 2 hrs | \~\$28 | Stage 1 | FP16 | 98% |
-| `eagle0504/openai-gsm8k-codealpaca-20k-enhanced-deepseek-r1-distill-qwen-1.5b` | GSM8K + CodeAlpaca-20K Enhanced | 4 × A100 SXM | 146 | 1144 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 2 hrs | \~\$14+ | Stage 1 | FP16 | 97% |
-| `eagle0504/qwen-distilled-scout-1.5b` | Custom CoT + SQL-Reasoning | 6 × A100 SXM | 192 | 1536 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 1.5 hrs | \~\$21 | Stage 2 | FP16 | 97% |
+| Model ID | Dataset Description | GPUs | vCPUs | RAM (GB) | Disk per GPU | Container Image | Duration | Cost / hr | Total Cost | DeepSpeed Stage | Precision | Mean Token Accuracy |
+| ------------------------------------------------------------------------------- | ------------------------------- | ------------- | ----- | -------- | ------------ | ---------------------------------------------------------- | -------- | --------- | ---------- | --------------- | --------- | ------------------- |
+| `eagle0504/openai-gsm8k-enhanced-using-together-ai-deepseek-train8k-test1k-v1` | OpenAI GSM8K Enhanced v2 | 6 × H100 PCIe | 144 | 1132 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 3 hrs | ~$14 | ~$42 | Stage 1 | FP16 | 98% |
+| `eagle0504/augmented_codealpaca-20k-using-together-ai-deepseek-v1` | GSM8K + CodeAlpaca-20K Enhanced | 4 × A100 SXM | 146 | 1144 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 3 hrs | ~$7+ | ~$21+ | Stage 1 | FP16 | 98% |
+| `gretelai/synthetic_text_to_sql` | Custom CoT + SQL-Reasoning | 6 × A100 SXM | 192 | 1536 | 20 GB | `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04` | 2.5 hrs | ~$21 | ~$52.5 | Stage 2 | FP16 | 97% |
 
 ---
 
@@ -128,15 +128,20 @@ stop_sequence = "</response>"
 stop_ids = tokenizer.encode(stop_sequence, add_special_tokens=False)
 stopping_criteria = StoppingCriteriaList([StopOnTokens([stop_ids])])
 
+prompt = (
+    "<instruction>This is a math problem.</instruction>"
+    "<question>Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether?</question>"
+)
+
 inputs = tokenizer(
-    "<instruction>This is a math problem.</instruction><question>Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether?</question>",
+    prompt,
     return_tensors="pt"
 )
 
 outputs = model.generate(
     **inputs,
-    max_new_tokens=230,
-    stopping_criteria=stopping_criteria
+    max_new_tokens=1024,  # generous upper bound; the stop sequence usually ends generation well before this
+    stopping_criteria=stopping_criteria  # halts at "</response>", so the full 1024 tokens are rarely used
 )
 
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
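For context on the "DeepSpeed Stage" and "Precision" columns in the updated table: ZeRO Stage 1 shards optimizer states across GPUs, Stage 2 additionally shards gradients, and FP16 enables mixed-precision training. The "Total Cost" column is consistent with the hourly cost times the duration (e.g., ~$14/hr × 3 hrs ≈ $42). The actual DeepSpeed configuration is not part of this commit; the snippet below is only a minimal sketch of what a ZeRO Stage 1 / FP16 setup could look like via the Hugging Face `TrainingArguments` integration, with the output path and batch sizes as illustrative assumptions.

```python
# Hypothetical sketch only: the DeepSpeed config actually used is not shown in this commit.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 1},  # Stage 1 shards optimizer states; Stage 2 would also shard gradients
    "fp16": {"enabled": True},          # mixed precision, matching the FP16 column in the table
    "train_micro_batch_size_per_gpu": "auto",  # "auto" defers to the HF Trainer's values
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="./checkpoints",     # illustrative path
    per_device_train_batch_size=8,  # assumed value, not from the commit
    fp16=True,
    deepspeed=ds_config,            # accepts a dict or a path to a DeepSpeed JSON file
)
```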
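The inference snippet relies on a `StopOnTokens` class defined earlier in the README, outside the hunks shown in this diff. A typical implementation of such a `transformers.StoppingCriteria` subclass looks roughly like the following; the README's actual definition may differ in detail.

```python
# Rough sketch of a StopOnTokens-style criterion; the README's real definition
# lives outside the hunks shown here and may differ.
import torch
from transformers import StoppingCriteria

class StopOnTokens(StoppingCriteria):
    def __init__(self, stop_token_id_lists):
        # stop_token_id_lists: e.g. [stop_ids], where stop_ids encodes "</response>"
        self.stop_token_id_lists = stop_token_id_lists

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Stop once the generated sequence (batch size 1 assumed, as in the snippet)
        # ends with any of the configured stop token sequences.
        for stop_ids in self.stop_token_id_lists:
            if input_ids.shape[1] >= len(stop_ids) and input_ids[0, -len(stop_ids):].tolist() == stop_ids:
                return True
        return False
```

With a criterion like this in place, `max_new_tokens=1024` acts purely as a safety cap: generation normally terminates as soon as `</response>` is emitted.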