Let model decide to respond with "assistant" or "function_call"
Hi community,
Is it possible to let the model decide whether to respond with "assistant" or "function_call"? Currently it only responds as assistant if add_generation_prompt is set to False.
Howdy.
Yes, I actually tried doing this, and it would be possible if you wanted to do a major fine-tuning.
As it is, the instruct model (which has lots of useful fine-tuning built in) always expects the assistant prompt to be there at the start of the response - so that's how I have left things so as not to majorly change the tuning of the model.
As you point out, the model only responds if you have added that generation prompt.
Hi, thanks for your reply. Should I acquire a large dataset to do fine-tuning?
I also have a single RTX4090.
Thanks,
Toby.
What specifically is the problem you are trying to solve?
You should be able to detect whether there's a function call just by checking for JSON in the response, so re-tuning to have a response with function_call isn't needed.
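A minimal sketch of that check, assuming the model emits the function call as a single JSON object with a "name" key. The helper name `extract_function_call` and the "name"/"arguments" shape are illustrative assumptions, not something confirmed in this thread, so adjust them to what the model actually outputs:

```python
import json


def extract_function_call(response_text):
    """Return the parsed call as a dict if the response looks like a function call, else None.

    Assumption: a function call is a JSON object containing a "name" key.
    """
    text = response_text.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Plain-text answer from the assistant, not a function call.
        return None
    if isinstance(parsed, dict) and "name" in parsed:
        return parsed
    return None


# Hypothetical responses, for illustration only.
call = extract_function_call('{"name": "get_weather", "arguments": {"city": "Paris"}}')
plain = extract_function_call("The capital of France is Paris.")
```

If the decoded text still contains special tokens (for example an end-of-turn marker), they would need to be stripped before parsing.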
But according to the example, only responding as "function_call" should trigger a JSON output, responding as "assistant" isn't reliable.
Special roles should have dedicated usage, like "function_metadata" does.
Thanks so much for the aid.
Actually, the model will respond (via the assistant role) with JSON when it is appropriate (at least that is what it is supposed to do).
I know it's a bit confusing because I'm then saying to take that response and send it back in with a "function_call" role (which really just indicates to the language model that it's a function call).
To say that once more:
- When you send in a user message, there will be an assistant role that follows because of add_generation_prompt. This is correct and will elicit a function call (JSON) if needed.
- Once you have this json AND you have the function response (which you get programmatically), then you should feed them back in using function_call and function_response roles. These aren't "real" roles per se, but they indicate to the model that these are function calls and responses. This helps the model properly make use of the info when giving the next response (for which, btw, the add_generation_prompt will again add an assistant role - according to the chat template).
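The two steps above can be sketched as follows. This only builds the message list; the helper name `append_function_round_trip`, the example function, and the choice to serialize the function result with `json.dumps` are assumptions for illustration, since the thread does not specify the exact content format:

```python
import json


def append_function_round_trip(messages, call_json_text, function_result):
    """Return a new message list with the function call and its result appended.

    The call is the JSON text the model produced under the assistant role;
    here it is re-labelled with the function_call role before being fed back.
    """
    return messages + [
        {"role": "function_call", "content": call_json_text},
        # Assumption: the function result is passed back as a JSON string.
        {"role": "function_response", "content": json.dumps(function_result)},
    ]


# Hypothetical conversation, for illustration only.
messages = [{"role": "user", "content": "What is the weather in Paris?"}]
call_text = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = {"temperature_c": 18}

follow_up = append_function_round_trip(messages, call_text, result)
```

`follow_up` would then go through `tokenizer.apply_chat_template(follow_up, add_generation_prompt=True, ...)` so that the chat template adds the assistant role for the next response.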
@RonanMcGovern I kinda got it here! So it's basically just generating as normal role system, but when feeding it back I should change the role manually to function_call! :O Now it makes sense. Gonna try thx!! <3