Let model decide to respond with "assistant" or "function_call"

by evilperson068 - opened Apr 27, 2024

Discussion

evilperson068

Apr 27, 2024

Hi community,
Is it possible to let the model decide whether to respond with "assistant" or "function_call"? Currently it only responds with an assistant request if add_generation_prompt is set to False.

RonanMcGovern

Trelis org Apr 29, 2024

Howdy.

Yes, I actually tried doing this, and it would be possible if you wanted to do a major fine-tuning.

As it is, the instruct model (which has lots of useful fine-tuning built in) always expects the assistant prompt to be there at the start of the response - so that's how I have left things so as not to majorly change the tuning of the model.

As you point out, the model only responds if you have added that generation prompt.

RonanMcGovern changed discussion status to closed May 7, 2024

evilperson068

May 7, 2024

Hi, thanks for your reply. Should I acquire a large dataset to do the fine-tuning?
I also have a single RTX 4090.

Thanks,
Toby.

RonanMcGovern

Trelis org May 10, 2024

What specifically is the problem you are trying to solve?

You should be able to detect whether there's a function call just by checking for JSON in the response, so re-tuning the model to respond with a function_call role isn't needed.
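For anyone reading along, here is a minimal sketch of what "checking for JSON in the response" could look like. The helper name `extract_function_call` is hypothetical, and the assumption that a call is a JSON object with a "name" key is mine, not something confirmed in this thread; adjust it to whatever your model actually emits.

```python
import json


def extract_function_call(response_text):
    """Return the first JSON object that looks like a function call, else None.

    Assumption (not confirmed in this thread): a call is a JSON object
    containing a "name" key.
    """
    decoder = json.JSONDecoder()
    # Try to decode a JSON value starting at every "{" in the text.
    for start, char in enumerate(response_text):
        if char != "{":
            continue
        try:
            candidate, _end = decoder.raw_decode(response_text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position, keep scanning
        if isinstance(candidate, dict) and "name" in candidate:
            return candidate
    return None
```

If this returns a dict, treat the assistant output as a function call; if it returns None, treat it as a normal reply.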

RonanMcGovern changed discussion status to open May 10, 2024

evilperson068

May 11, 2024 (edited May 11, 2024)

But according to the example, only responding as "function_call" should trigger JSON output; responding as "assistant" isn't reliable.
Special roles should have a dedicated use, like "function_metadata" does.
Thanks so much for the aid.

RonanMcGovern

Trelis org May 11, 2024

Actually, the model will respond (via the assistant role) with JSON when it is appropriate (at least, that is what it is supposed to do).

I know it's a bit confusing because I'm then saying to take that response and send it back in with a "function_call" role (which really just indicates to the language model that it's a function call).

To say that once more:

When you send in a user message, an assistant role will follow because of add_generation_prompt. This is correct and will elicit a function call (JSON) if needed.
Once you have this JSON AND you have the function response (which you get programmatically), you should feed them back in using the function_call and function_response roles. These aren't "real" roles per se, but they indicate to the model that these are function calls and responses. This helps the model make proper use of the info when giving the next response (for which, by the way, add_generation_prompt will again add an assistant role, according to the chat template).
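The two steps above can be sketched roughly as follows. The role names function_metadata, function_call and function_response come from this thread; everything else (the helper name `append_function_round_trip`, the shape of the metadata, the example weather function, and serialising the result with `json.dumps`) is an assumption for illustration only.

```python
import json


def append_function_round_trip(messages, call_text, function_result):
    """Return a new message list with the call and its result appended.

    call_text is the JSON the model produced under the assistant role;
    function_result is whatever your own code returned for that call.
    """
    return messages + [
        {"role": "function_call", "content": call_text},
        # Assumption: the result is passed back as a JSON string.
        {"role": "function_response", "content": json.dumps(function_result)},
    ]


# Hypothetical function metadata and user turn.
messages = [
    {"role": "function_metadata", "content": json.dumps([{"name": "get_weather"}])},
    {"role": "user", "content": "What is the weather in Paris?"},
]

# Pretend the model answered with this JSON and our code ran the function.
call_text = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
updated = append_function_round_trip(messages, call_text, {"temperature_c": 18})

# Next step (not run here): build the prompt for the follow-up answer, e.g.
# tokenizer.apply_chat_template(updated, add_generation_prompt=True, tokenize=False)
```

The helper returns a new list rather than mutating the input, so the original conversation stays intact if you want to retry the call.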

evilperson068

May 11, 2024

@RonanMcGovern I kinda got it here! So it's basically just generating as normal role system, but when feeding it back I should change the role manually to function_call! :O Now it makes sense. Gonna try thx!! <3
