|
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
|
|
|
> [!WARNING]
> This is a language model that has been instruction-tuned for conversational settings with function-calling capabilities. It has not been aligned with human preferences. As a result, it may generate outputs that are inappropriate, misleading, biased, or unsafe. These risks can be mitigated through additional post-training stages, which are strongly recommended before deploying the model in any production system, especially for high-stakes applications.
|
> **NOTE:** This is a GATED model, intended only for internal and external testing. Do not request access unless you have already contacted us and been given permission to test it.
|
> Please write to carlos.rodriguez1(at)bsc.es explaining your intended use, and we can grant access.
|
### How to use |
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "BSC-LT/salamandra-7b-instruct-tools"

text = "What is the weather like in Paris today?"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

message = [{"role": "user", "content": text}]

# Describe the tool(s) the model is allowed to call
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": ["location"],
        "additionalProperties": False
    }
}]

# Render the chat template, injecting the tool definitions into the prompt
prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1000)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
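
For this prompt, the model should reply with a structured tool call rather than a free-text answer. The exact markup depends on the chat template, so treat the following as a minimal sketch: it assumes hermes-style `<tool_call>` tags wrapping a JSON object (consistent with the `--tool-call-parser hermes` flag used in the vLLM section below), and `get_weather` is a hypothetical local implementation you would replace with a real API call.

```python
import json
import re

def get_weather(location: str) -> str:
    # Hypothetical stand-in; query a real weather API here.
    return f"21°C and sunny in {location}"

# Decode only the newly generated tokens; keep special tokens so any
# tool-call markup emitted by the template is not stripped away.
generated = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=False)

match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", generated, re.DOTALL)
if match:
    call = json.loads(match.group(1))
    if call.get("name") == "get_weather":
        print(get_weather(**call.get("arguments", {})))
else:
    print(generated)
```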
|
|
|
### Deploy with vLLM
|
**Deploy the model using the vLLM Docker image.**
|
```bash
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8080:8080 \
    vllm/vllm-openai:latest \
    --model BSC-LT/salamandra-7b-instruct-tools \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --max-model-len 8192 \
    --port 8080
```
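
Once the container is up, you can check that the server is reachable before wiring a client to it. A quick sketch using only the standard library; the port matches the `-p 8080:8080` mapping above:

```python
import json
from urllib.request import urlopen

# vLLM's OpenAI-compatible server lists the served model(s) at /v1/models
with urlopen("http://localhost:8080/v1/models") as resp:
    print(json.loads(resp.read())["data"][0]["id"])
```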
|
|
|
**Then use it with the OpenAI API.**
|
```bash
pip install openai
```
|
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1/",
    api_key="hf_xxxx"  # any non-empty value works unless the server was started with --api-key
)

models = client.models.list()
model = models.data[0].id

# The chat completions API expects tool definitions wrapped in a "function" object
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and country e.g. Bogotá, Colombia"}
            },
            "required": ["location"],
            "additionalProperties": False
        }
    }
}]

system_message = ""
messages = [{"role": "system", "content": system_message}] if system_message else []
messages.append({"role": "user", "content": "What is the weather like in Paris today?"})
print(messages)

chat_completion = client.chat.completions.create(
    model=model,
    tools=tools,
    messages=messages,
    stream=False,
    max_tokens=1000,
    temperature=0.1,
    frequency_penalty=0.2,
)

print(chat_completion)
```
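
With `--enable-auto-tool-choice` and `--tool-call-parser hermes`, the server returns the parsed call in `message.tool_calls` rather than as raw text. Below is a sketch of closing the loop; `get_weather` is again a hypothetical local implementation you would replace with a real API call.

```python
import json

def get_weather(location: str) -> str:
    # Hypothetical stand-in; query a real weather API here.
    return f"21°C and sunny in {location}"

message = chat_completion.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    result = get_weather(**json.loads(call.function.arguments))

    # Send the tool result back so the model can produce the final answer
    messages.append({"role": "assistant", "content": message.content or "", "tool_calls": [call.model_dump()]})
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

    final = client.chat.completions.create(model=model, messages=messages, max_tokens=1000)
    print(final.choices[0].message.content)
else:
    print(message.content)
```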
|
|
|
|