sweatSmile committed · verified
Commit 1d12afb · 1 Parent(s): 5fbbbde

Update README.md

Files changed (1):
  1. README.md +51 -0
README.md CHANGED
@@ -1,3 +1,54 @@
  ---
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ tags:
+ - instruction-following
+ - conversational-ai
+ - lora
+ - alpaca
+ - 4bit
+ - instruct
  license: apache-2.0
+ datasets:
+ - tatsu-lab/alpaca
+ language:
+ - en
  ---
+
+ # DeepSeek-R1-Distill-Qwen-1.5B-Alpaca-Instruct
+
+ Fine-tuned DeepSeek-R1-Distill-Qwen-1.5B for instruction-following tasks, using LoRA on the Alpaca dataset.
+
+ ## Overview
+ - **Base Model:** deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B (1.5B parameters)
+ - **Fine-tuning Method:** LoRA with 4-bit quantization
+ - **Dataset:** Alpaca instruction dataset (52K samples; see the formatting sketch below)
+ - **Training:** 3 epochs (hyperparameters listed under Training Details)
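+
+ The snippet below is a minimal sketch of how the Alpaca records could be mapped onto the Human/Assistant prompt layout shown in the Usage section; the exact template used during training is not documented here, so the `format_example` helper is an assumption for illustration.
+
+ ```python
+ from datasets import load_dataset
+
+ # Hypothetical formatting helper: folds the optional "input" field into the
+ # instruction and lays the record out as a Human/Assistant exchange.
+ def format_example(example):
+     instruction = example["instruction"]
+     if example.get("input"):
+         instruction = f"{instruction}\n\n{example['input']}"
+     return {"text": f"Human: {instruction}\n\nAssistant: {example['output']}"}
+
+ dataset = load_dataset("tatsu-lab/alpaca", split="train")  # ~52K samples
+ dataset = dataset.map(format_example)
+ print(dataset[0]["text"][:200])
+ ```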
+
+ ## Key Features
+ - Improved instruction-following capabilities
+ - Conversational question answering
+ - Memory-efficient training with LoRA
+ - Production-ready merged model (LoRA weights folded into the base checkpoint; see the sketch below)
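+
+ As a rough illustration of how a merged checkpoint like this one is typically produced with peft, assuming a hypothetical local adapter directory (a sketch, not the exact script used for this model):
+
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM
+
+ # Load the base model in full precision, attach the trained LoRA adapters,
+ # then fold the adapter weights into the base weights for standalone use.
+ base = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
+ model = PeftModel.from_pretrained(base, "./lora-adapters")  # hypothetical adapter path
+ merged = model.merge_and_unload()
+ merged.save_pretrained("./DeepSeek-R1-Distill-Qwen-1.5B-Alpaca-Instruct")
+ ```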
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the merged model and tokenizer from the Hub
+ model = AutoModelForCausalLM.from_pretrained("sweatSmile/DeepSeek-R1-Distill-Qwen-1.5B-Alpaca-Instruct")
+ tokenizer = AutoTokenizer.from_pretrained("sweatSmile/DeepSeek-R1-Distill-Qwen-1.5B-Alpaca-Instruct")
+
+ # Example prompt in the Human/Assistant format
+ prompt = "Human: What is machine learning?\n\nAssistant:"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=200)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Training Details
+ - LoRA rank: 8, alpha: 16
+ - 4-bit NF4 quantization with bfloat16 compute
+ - Learning rate: 1e-4 with cosine scheduling
+ - Batch size: 8, max sequence length: 512 tokens
+
+ Trained for efficient deployment in production environments; a configuration sketch follows.
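+
+ The configuration below is an illustrative sketch of a transformers/peft/bitsandbytes setup matching the hyperparameters above; the target modules, optimizer defaults, and tokenization step are assumptions rather than the exact training script.
+
+ ```python
+ import torch
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
+
+ # 4-bit NF4 quantization with bfloat16 compute, as listed above
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+ model = AutoModelForCausalLM.from_pretrained(
+     "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
+     quantization_config=bnb_config,
+ )
+ model = prepare_model_for_kbit_training(model)
+
+ # LoRA rank 8, alpha 16; target modules are an assumption
+ lora_config = LoraConfig(
+     r=8,
+     lora_alpha=16,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(model, lora_config)
+
+ # Learning rate, scheduler, batch size, and epoch count from the model card;
+ # the 512-token maximum length would be applied at tokenization (not shown).
+ training_args = TrainingArguments(
+     output_dir="outputs",
+     num_train_epochs=3,
+     per_device_train_batch_size=8,
+     learning_rate=1e-4,
+     lr_scheduler_type="cosine",
+     bf16=True,
+ )
+ ```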