Does it work with local open interpreter, and how many gigs of ram is required?
#24
by
						
aiworld44
	
							
						- opened
							
					
I tried to do interpreter --local --model mistralai/Mistral-7B-Instruct-v0.1. didnt work
Here is my code. You need to locally save the model in a subfolder ( ./Mistral/ depending on your .py file)
It does work for 1-3 queries. Until it breaks down. As there is absofucking no documentation of how to implement the workflow of Interference to a local pipeline this is the best I got. If people are interested in reverse engineering it. Shot me a message. As it stands now, this is an add front to promote paid services, let's change that.
import gradio as gr
from transformers import pipeline, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("./Mistral/")
pipe = pipeline("text-generation", model="./Mistral/", max_new_tokens=512)
chat_history_tokens = []
def generate(chatlog, is_finished):
    global chat_history_tokens
    # Get the latest message from chat
    new_message = chatlog[-1]['content'] if isinstance(chatlog, list) else chatlog
    # Tokenize new message and extend chat history
    new_message_tokens = tokenizer.encode(new_message, add_special_tokens=False)
    chat_history_tokens = new_message_tokens  # We only keep the last message now
    # Decode tokens to string for the prompt
    prompt = tokenizer.decode(chat_history_tokens)
    
    try:
        print("Debug: Sending this prompt to the model:", prompt)
        outputs = pipe(prompt, pad_token_id=tokenizer.eos_token_id)
        print("Debug: Model's raw output:", outputs)
        # Cleanup the generated text
        generated_text = outputs[0]['generated_text'].replace(prompt, "").strip()
        generated_text = generated_text.replace("Answer:", "").replace("A:", "").strip()
        print("Debug: Generated Text After Cleanup:", generated_text)
        # Tokenize the model's reply and add it to the history
        bot_reply_tokens = tokenizer.encode(generated_text, add_special_tokens=False)
        chat_history_tokens.extend(bot_reply_tokens)
    except Exception as e:
        print("Debug: Caught an exception:", str(e))
        return str(e)
    return generated_text
iface = gr.ChatInterface(fn=generate)
iface.launch()
