---
license: mit
language:
- en
base_model:
- NousResearch/Hermes-3-Llama-3.1-8B
---
# Inference with Your Model
This guide explains how to run inference with your custom model using the Hugging Face `transformers` library.
## Prerequisites
Make sure you have the following dependencies installed:

- Python 3.7+
- PyTorch
- Hugging Face `transformers` library
You can install the required packages using pip:

```bash
pip install torch transformers
```
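
Before running generation, load your model and tokenizer. The checkpoint name below is only a placeholder (the base model listed in the metadata); substitute the path or Hub ID of your own fine-tuned model. Note that `device_map="auto"` additionally requires the `accelerate` package.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint: replace with the path or Hub ID of your fine-tuned model
model_name = "NousResearch/Hermes-3-Llama-3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # load weights in half precision to save memory
    device_map="auto",           # requires `accelerate`; places layers on available devices
)
```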
With the model and tokenizer in memory, build a text-generation pipeline and query it. Hermes-3 uses the ChatML prompt format, so the prompt ends with an `<|im_start|>assistant` tag to cue the model's reply.

```python
from transformers import logging, pipeline

# Silence non-critical warnings
logging.set_verbosity(logging.CRITICAL)

# Fill in your own prompts
system_prompt = ""
prompt = ""

# Build the text-generation pipeline with our fine-tuned model
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,  # increase this to allow for longer outputs
    temperature=0.5,     # encourages more varied outputs
    top_k=50,            # limits sampling to the top 50 tokens
    do_sample=True,      # enables sampling
    return_full_text=True,
)

# ChatML-formatted prompt; the trailing assistant tag cues the model to respond
result = pipe(
    f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
    f"<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# The full output includes the prompt because return_full_text=True
generated_text = result[0]["generated_text"]
print(generated_text)
```
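
Because `return_full_text=True` keeps the prompt in the output, you may want to print only the model's reply. The snippet below is a minimal sketch that assumes the ChatML markers used above; adjust the marker string if you use a different prompt template.

```python
# Minimal sketch: keep only the text after the final assistant tag (assumes the ChatML prompt above)
assistant_tag = "<|im_start|>assistant"
start_idx = generated_text.rfind(assistant_tag) + len(assistant_tag)
response_text = generated_text[start_idx:].replace("<|im_end|>", "").strip()
print(response_text)
```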