Subh775 committed
Commit ad95d04 · verified · 1 Parent(s): dbf1bfe

Update README.md

Files changed (1):
1. README.md +82 -0
README.md CHANGED
@@ -15,5 +15,87 @@ tags:
  - unsloth
  ---

+ ## Inference Instructions:
+
+ ```python
+ !pip install unsloth
+ ```
+
+ ```python
+ from unsloth import FastLanguageModel
+ from transformers import TextStreamer
+ import torch
+
+ # Load your fine-tuned model
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="QuantumInk/Mistral-small-12B-Hinglish-cot",
+     max_seq_length=2048,
+     load_in_4bit=True
+ )
+ FastLanguageModel.for_inference(model)
+
+ # Streamer for real-time decoding
+ text_streamer = TextStreamer(tokenizer)
+
+ # Alpaca prompt template
+ alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+ ### Instruction:
+ {instruction}
+ ### Input:
+ {input_text}
+ ### Response:
+ {output}"""
+ ```
+
+ ```python
+ # Chat loop with memory
+ def chat():
+     print("💬 Chat with Mistral-small-12B-Hinglish-cot! Type '\\q' or 'quit' to exit.\n")
+
+     chat_history = ""  # Full chat history with prompts and responses
+
+     while True:
+         user_input = input("➤ ")
+
+         if user_input.lower() in ["\\q", "quit"]:
+             print("\n👋 Exiting chat. Goodbye!")
+             break
+
+         # Format the current prompt
+         current_prompt = alpaca_prompt.format(
+             instruction="Continue the following conversation.",
+             input_text=user_input,
+             output=""
+         )
+
+         # Add to full chat history
+         chat_history += current_prompt + "\n"
+
+         # Tokenize the full prompt
+         inputs = tokenizer([chat_history], return_tensors="pt").to("cuda")
+
+         print("\n🤖: ", end="")  # Prepare for streaming output
+
+         # Generate response using streamer
+         outputs = model.generate(
+             **inputs,
+             max_new_tokens=256,
+             temperature=0.7,
+             top_p=0.9,
+             do_sample=True,
+             no_repeat_ngram_size=2,
+             streamer=text_streamer
+         )
+
+         # Decode and capture the latest response for the chat history
+         full_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
+         response = full_output.split("### Response:")[-1].strip()
+
+         # Add response to chat history
+         chat_history += f"{response}\n"
+
+ # Run the chat
+ chat()
+ ```

  Changes will be Pushed here soon
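
The chat loop above keeps conversational memory simply by concatenating every prompt and response into `chat_history`; with `max_seq_length=2048`, very long chats will eventually need that history trimmed before tokenisation. For a quick check outside the interactive loop, a single prompt can also be run directly. The following is a minimal single-turn sketch, assuming the `model`, `tokenizer`, `text_streamer`, and `alpaca_prompt` defined in the snippets above are already loaded in the same session; the Hinglish input string is purely illustrative, and the sampling settings mirror the chat loop.

```python
# Single-turn inference sketch (assumes model, tokenizer, text_streamer,
# and alpaca_prompt from the snippets above are already defined).
prompt = alpaca_prompt.format(
    instruction="Continue the following conversation.",
    input_text="Mujhe ek achhi si chai banane ka tarika batao.",  # illustrative example input
    output=""
)

# Tokenize the single prompt and move tensors to the GPU
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

# Generate with the same sampling settings as the chat loop,
# streaming tokens to stdout as they are produced
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    streamer=text_streamer
)

# Keep only the text generated after the "### Response:" marker
full_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
response = full_output.split("### Response:")[-1].strip()
print(response)
```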