---
license: mit
language:
- en
base_model:
- NousResearch/Hermes-3-Llama-3.1-8B
---
# Inference with Your Model
This guide explains how to run inference with your custom model using the Hugging Face `transformers` library.
## Prerequisites
Make sure you have the following dependencies installed:

- Python 3.7+
- PyTorch
- Hugging Face `transformers` library
You can install the required packages using pip:

```bash
pip install torch transformers
```
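
Before running generation, load your model and tokenizer. The checkpoint name below is only a placeholder (the base model listed in the metadata); substitute the path or Hub ID of your own fine-tuned model. Note that `device_map="auto"` additionally requires the `accelerate` package.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint: replace with the path or Hub ID of your fine-tuned model
model_name = "NousResearch/Hermes-3-Llama-3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # load weights in half precision to save memory
    device_map="auto",           # requires `accelerate`; places layers on available devices
)
```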
With the model and tokenizer in memory, build a text-generation pipeline and query it. Hermes-3 uses the ChatML prompt format, so the prompt ends with an `<|im_start|>assistant` tag to cue the model's reply.

```python
from transformers import logging, pipeline

# Silence non-critical warnings
logging.set_verbosity(logging.CRITICAL)

# Fill in your own prompts
system_prompt = ""
prompt = ""

# Build the text-generation pipeline with our fine-tuned model
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,  # increase this to allow for longer outputs
    temperature=0.5,     # encourages more varied outputs
    top_k=50,            # limits sampling to the top 50 tokens
    do_sample=True,      # enables sampling
    return_full_text=True,
)

# ChatML-formatted prompt; the trailing assistant tag cues the model to respond
result = pipe(
    f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
    f"<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# The full output includes the prompt because return_full_text=True
generated_text = result[0]["generated_text"]
print(generated_text)
```
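
Because `return_full_text=True` keeps the prompt in the output, you may want to print only the model's reply. The snippet below is a minimal sketch that assumes the ChatML markers used above; adjust the marker string if you use a different prompt template.

```python
# Minimal sketch: keep only the text after the final assistant tag (assumes the ChatML prompt above)
assistant_tag = "<|im_start|>assistant"
start_idx = generated_text.rfind(assistant_tag) + len(assistant_tag)
response_text = generated_text[start_idx:].replace("<|im_end|>", "").strip()
print(response_text)
```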