Model Information
A lightweight Llama-based model with a conversational tone of voice. The model is based on Llama-3.2-1B-Instruct
and was finetuned by the team at Restack to respond in a natural, conversational tone suitable for voice interactions.
The model is also compatible with the Ultravox speech-to-text model: you can replace the Llama-3.2-1B-Instruct
backbone of ultravox-v0_5-llama-3_2-1b with this model to obtain a speech-to-text model that responds
in a conversational tone.
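As an illustration, the following is a minimal sketch of that backbone swap. It assumes the Ultravox checkpoint is loaded with trust_remote_code=True and stores its text backbone in a language_model attribute; both points should be verified against the actual Ultravox implementation before use.
import torch
import transformers
# Load the Ultravox speech model (runs custom code from the fixie-ai repository).
ultravox = transformers.AutoModel.from_pretrained(
    "fixie-ai/ultravox-v0_5-llama-3_2-1b",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
# Load this finetuned model as the replacement text backbone.
backbone = transformers.AutoModelForCausalLM.from_pretrained(
    "restack/conversational-v1.1-Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
)
# Swap the backbone (assumes the attribute is named `language_model`).
ultravox.language_model = backbone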
How to use
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# --------------------------------------------------------------------------------------
# *** Settings
model_id = "restack/conversational-v1.1-Llama-3.2-1B-Instruct"
system_prompt = "You are a helpful assistant. You answer questions in a natural, conversational tone, like in a spoken conversation."
user_prompt = "In the context of machine learning, what is regularization?"
eot_token = "<|eot_id|>"
pad_token = "<|finetune_right_pad_id|>"
assistant_header = "<|start_header_id|>assistant<|end_header_id|>\n\n"
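# The templates below rebuild the Llama 3 chat format by hand;
# tokenizer.apply_chat_template(..., add_generation_prompt=True) is a built-in alternative.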
prompt_template = {
    "system": f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nPLACEHOLDER_SYSTEM_PROMPT{eot_token}",
    "user": f"<|start_header_id|>user<|end_header_id|>\n\nPLACEHOLDER_QUESTION{eot_token}",
    "assistant": f"{assistant_header}PLACEHOLDER_ANSWER{eot_token}",
}
torch_dtype = torch.bfloat16
max_length = 1024
temperature = 0.5
# --------------------------------------------------------------------------------------
# *** Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# --------------------------------------------------------------------------------------
# *** Inference
pad_token_id = tokenizer.encode(pad_token, add_special_tokens=False)[0]
# Combine system prompt, user prompt, and assistant header.
messages = (
    prompt_template["system"].replace("PLACEHOLDER_SYSTEM_PROMPT", system_prompt)
    + prompt_template["user"].replace("PLACEHOLDER_QUESTION", user_prompt)
    + assistant_header
)
tokens = tokenizer(
    messages,
    add_special_tokens=False,
    return_tensors="pt",
)
input_ids = tokens["input_ids"].to(model.device)
attention_mask = tokens["attention_mask"].to(model.device)
model.eval()
generated_ids = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_length=max_length,
    pad_token_id=pad_token_id,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=temperature,
)
generated_text = tokenizer.batch_decode(
    generated_ids,
    skip_special_tokens=False,
)[0]
# Remove the prompt and special tokens, leaving only the assistant answer.
assistant_response = (
    generated_text.split(assistant_header)[-1].replace(eot_token, "").strip()
)
print(assistant_response)
# Model prediction:
# So, regularization is basically a way to prevent overfitting in machine learning.
# Think of it like this: if you're trying to fit a model to a bunch of data, it's easy
# to get it to fit the noise in the data instead of the actual pattern. That's called
# overfitting. Regularization helps by adding a penalty term to the loss function. It
# makes the model more simple and less likely to fit the noise, so it doesn't overfit.
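The pad token defined above only matters when several prompts are generated in one batch. Below is a minimal sketch of batched inference, assuming left padding (matching the padding_side in the finetuning configuration under Details); the second prompt is a made-up example.
second_messages = (
    prompt_template["system"].replace("PLACEHOLDER_SYSTEM_PROMPT", system_prompt)
    + prompt_template["user"].replace("PLACEHOLDER_QUESTION", "What is gradient descent?")
    + assistant_header
)
tokenizer.pad_token = pad_token
tokenizer.padding_side = "left"  # pad on the left so generation continues from real tokens
batch = tokenizer(
    [messages, second_messages],
    add_special_tokens=False,
    padding=True,
    return_tensors="pt",
).to(model.device)
batch_ids = model.generate(
    **batch,
    max_length=max_length,
    pad_token_id=pad_token_id,
    do_sample=True,
    temperature=temperature,
)
batch_texts = tokenizer.batch_decode(batch_ids, skip_special_tokens=False)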
Details
- License: MIT
- Research report: ...
- Finetuning dataset: https://huggingface.co/datasets/restack/conversational-question-answer-wikipedia-v1.0 (see the loading snippet after the configuration below)
- Finetuning configuration:
{
    "base_model": "fixie-ai/ultravox-v0_5-llama-3_2-1b",
    "batch_config": {
        "accumulation_steps": 2,
        "batch_size": 16,
        "batch_size_val": 32
    },
    "learning_rate_params": {
        "div_factor": 25.0,
        "final_div_factor": 1000.0,
        "learning_rate": 0.0002,
        "lr_scheduler_name": "OneCycleLR",
        "pct_start": 0.3
    },
    "lora_alpha": 64,
    "lora_dropout": 0.2,
    "lora_r": 64,
    "n_epochs": 5,
    "padding_side": "left",
    "quantization_config": {
        "llm_int8_skip_modules": [
            "lm_head"
        ],
        "llm_int8_threshold": 6.0,
        "load_in_4bit": false,
        "load_in_8bit": true
    },
    "target_modules": [
        "k_proj",
        "q_proj",
        "v_proj"
    ],
    "torch_dtype": "torch.bfloat16",
    "train_bias": "lora_only"
}
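For readers who want to reproduce a similar finetuning setup, here is a hedged sketch of how the configuration above might map onto peft and bitsandbytes objects. The hyperparameter values come from the JSON; note that the JSON lists the Ultravox checkpoint as base_model, while this sketch loads the Llama text backbone directly, which is an assumption for a text-only reproduction. The training loop and OneCycleLR scheduler wiring are omitted.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# 8-bit quantization matching the quantization_config block above.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=["lm_head"],
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",  # assumption: text backbone only
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
# LoRA adapter matching lora_r, lora_alpha, lora_dropout, train_bias, and target_modules.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.2,
    bias="lora_only",
    target_modules=["k_proj", "q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)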
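Finally, a short sketch of loading the finetuning dataset with the datasets library; the split and column names are not documented here, so the snippet prints the dataset to inspect them.
from datasets import load_dataset
dataset = load_dataset("restack/conversational-question-answer-wikipedia-v1.0")
print(dataset)  # shows the available splits and column names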