---
license: apache-2.0
datasets:
- Subh775/formatted-hindi-hinglish-cot
language:
- en
- hi
base_model:
- unsloth/Mistral-Small-Instruct-2409
pipeline_tag: text-generation
library_name: adapter-transformers
tags:
- LoRA
- text-generation-inference
- unsloth
---
## Inference Instructions

Install Unsloth, then load the LoRA adapter and run the chat loop below.
```python
!pip install unsloth
```
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch

# Load the fine-tuned model (4-bit to reduce VRAM usage)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="QuantumInk/Mistral-small-12B-Hinglish-cot",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Streamer for real-time decoding (skip_prompt avoids re-printing the prompt each turn)
text_streamer = TextStreamer(tokenizer, skip_prompt=True)

# Alpaca prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input_text}
### Response:
{output}"""
```
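For a quick single-turn test before starting the interactive loop, you can format one prompt and generate directly. This is a minimal sketch reusing the `alpaca_prompt` and `text_streamer` defined above; the Hinglish query is only an illustrative placeholder.

```python
# Single-turn generation (illustrative example)
prompt = alpaca_prompt.format(
    instruction="Continue the following conversation.",
    input_text="Mujhe chai banane ka tarika batao.",  # example query (hypothetical)
    output="",  # left empty so the model completes the Response section
)
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
_ = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    streamer=text_streamer,  # tokens are printed as they are generated
)
```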
```python
# Chat loop with memory
def chat():
    print("💬 Chat with Mistral-Small-Hinglish-CoT! Type '\\q' or 'quit' to exit.\n")
    chat_history = ""  # Full chat history with prompts and responses

    while True:
        user_input = input("➤ ")
        if user_input.lower() in ["\\q", "quit"]:
            print("\n👋 Exiting chat. Goodbye!")
            break

        # Format the current turn with the Alpaca template
        current_prompt = alpaca_prompt.format(
            instruction="Continue the following conversation.",
            input_text=user_input,
            output="",
        )

        # Append the new prompt to the full chat history
        chat_history += current_prompt + "\n"

        # Tokenize the accumulated history
        inputs = tokenizer([chat_history], return_tensors="pt").to("cuda")

        print("\n🤖: ", end="")  # Prepare for streaming output

        # Generate the response; the streamer prints tokens in real time
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            no_repeat_ngram_size=2,
            streamer=text_streamer,
        )

        # Decode the full output and keep only the latest response for the history
        full_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
        response = full_output.split("### Response:")[-1].strip()
        chat_history += f"{response}\n"

# Run the chat
chat()
```