---
datasets:
- WizardLM/WizardLM_evol_instruct_V2_196k
- Open-Orca/OpenOrca
language:
- en
---

# Writer/palmyra-20b-chat

---

# Usage

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import TextStreamer

model_name = "Writer/palmyra-20b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the weights in half precision and let accelerate place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "What is the meaning of life?"

# Single-turn chat template: system preamble, then the USER turn, then the ASSISTANT cue.
input_text = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: {prompt} "
    "ASSISTANT:"
)

# Move the inputs to the model's device instead of hard-coding "cuda".
model_inputs = tokenizer(input_text.format(prompt=prompt), return_tensors="pt").to(
    model.device
)

gen_conf = {
    "top_k": 20,
    "max_new_tokens": 2048,
    "temperature": 0.6,
    "do_sample": True,
    "eos_token_id": tokenizer.eos_token_id,
}

# Print tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

# Drop token_type_ids if the tokenizer returns them, so they are not passed to generate().
if "token_type_ids" in model_inputs:
    del model_inputs["token_type_ids"]

all_inputs = {**model_inputs, **gen_conf}
_ = model.generate(**all_inputs, streamer=streamer)
```
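
The prompt template in the usage example can be wrapped in a small helper so that every request is formatted the same way. The following is a minimal sketch, not part of the model's official API: `SYSTEM_PROMPT` and `build_prompt` are hypothetical names introduced here, and stripping surrounding whitespace from the user message is an assumption rather than a documented requirement.

```python
# Hypothetical helper: reproduces the single-turn template from the usage example.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return the full prompt string for one user turn."""
    # Assumption: leading/trailing whitespace in the user message is not meaningful.
    return f"{system_prompt} USER: {user_message.strip()} ASSISTANT:"


if __name__ == "__main__":
    print(build_prompt("What is the meaning of life?"))
```

The returned string can be passed to the tokenizer in place of `input_text.format(prompt=prompt)`.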