---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

> [!WARNING]
> **WARNING:** This is a language model that has been instruction-tuned for conversational settings that exploit function-calling capabilities. It has not been aligned with human preferences. As a result, it may generate outputs that are inappropriate, misleading, biased, or unsafe. These risks can be mitigated through additional post-training stages, which are strongly recommended before deployment in any production system, especially for high-stakes applications.

> **NOTE:** This is a GATED model, intended only for internal and external tests. Do not request access if you have not already contacted us and been given permission to test it.
> Please write to carlos.rodriguez1(at)bsc.es to justify your use case, and we can grant access.

### How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BSC-LT/salamandra-7b-instruct-tools"

text = "What is the weather like in Paris today?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

message = [{"role": "user", "content": text}]

# Tool (function) definitions passed to the chat template
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": ["location"],
        "additionalProperties": False
    }
}]

prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1000)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Deploy with vLLM

**Deploy the model using the vLLM Docker image.**

```bash
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=" \
    -p 80:80 \
    vllm/vllm-openai:latest \
    --model BSC-LT/salamandra-7b-instruct-tools \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --max-model-len 8196 \
    --port 80
```

**Then use it with the OpenAI API.**

```bash
pip install openai
```

```python
from openai import OpenAI

client = OpenAI(
    # Port 80 matches the port published by the Docker command above
    base_url="http://localhost:80/v1/",
    api_key="hf_xxxx"  # vLLM ignores the key unless the server sets --api-key
)

models = client.models.list()
model = models.data[0].id

# Same get_weather tool as above, in the OpenAI chat-completions format
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": ["location"]
        }
    }
}]

system_message = ""
messages = [{"role": "system", "content": system_message}] if system_message else []
messages.append({"role": "user", "content": "What is the weather like in Paris today?"})
print(messages)

chat_completion = client.chat.completions.create(
    model=model,
    tools=tools,
    messages=messages,
    stream=False,
    max_tokens=1000,
    temperature=0.1,
    frequency_penalty=0.2,
)
print(chat_completion)
```
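
When the model decides to call a function, the completion comes back with a structured `tool_calls` field rather than a plain-text answer. The continuation below is a minimal sketch of the usual OpenAI-style tool-calling loop: it reuses the variables from the previous snippet (`client`, `model`, `tools`, `messages`, `chat_completion`), and the local `get_weather` function is a hypothetical stand-in for a real weather lookup. Whether the second round produces a good natural-language answer depends on the model's chat template, so treat this as a sketch rather than a guaranteed recipe.

```python
import json

# Hypothetical local implementation of the declared tool; replace with a real lookup.
def get_weather(location: str) -> str:
    return f"Sunny, 22°C in {location}"

assistant_message = chat_completion.choices[0].message

if assistant_message.tool_calls:
    # Keep the assistant's tool call in the conversation history
    messages.append(assistant_message)

    for tool_call in assistant_message.tool_calls:
        # Arguments arrive as a JSON string produced by the model
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })

    # Second round: the model answers in natural language using the tool result
    final = client.chat.completions.create(
        model=model,
        tools=tools,
        messages=messages,
        max_tokens=1000,
        temperature=0.1,
    )
    print(final.choices[0].message.content)
else:
    print(assistant_message.content)
```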