Model Card for LLaMA-3.2-3B Tool Caller
This model (LoRA adapter) is a fine-tuned version of LLaMA-3.2-3B that specializes in tool calling capabilities. It has been trained to decide when to use one of two available tools, `search_documents` or `check_and_connect`, based on user queries, responding with properly formatted JSON function calls.
Model Details
Model Description
This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling capabilities.
- Developed by: Uness.fr
- Model type: Fine-tuned LLM (LoRA)
- Language(s) (NLP): English
- License: Llama 3.2 Community License (same as the base model)
- Finetuned from model: unsloth/Llama-3.2-3B-Instruct (4-bit quantized version)
Model Sources
- Repository: https://huggingface.co/asanchez75/Llama-3.2-3B-tool-search
- Base model: https://huggingface.co/unsloth/Llama-3.2-3B-Instruct
- Training dataset: https://huggingface.co/datasets/asanchez75/tool_finetuning_dataset
Uses
Direct Use
This model is designed to be used as an AI assistant that can intelligently determine when to call external tools. It specializes in two specific functions:
- `search_documents`: Triggered when users ask for medical information (prefixed with "Search information about")
- `check_and_connect`: Triggered when users ask about system status or connectivity
The model outputs properly formatted JSON function calls that can be parsed by downstream applications to execute the appropriate tools.
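As an illustration of that downstream parsing step, the sketch below decodes the JSON function call and dispatches to the matching tool; the `search_documents` and `check_and_connect` implementations here are hypothetical stand-ins for real backends.

```python
import json

# Hypothetical tool implementations; replace with real backends.
def search_documents(query: str) -> str:
    return f"<documents matching: {query}>"

def check_and_connect() -> str:
    return "connected"

TOOLS = {"search_documents": search_documents, "check_and_connect": check_and_connect}

def dispatch(model_output: str):
    """Parse the model's JSON function call and invoke the matching tool."""
    call = json.loads(model_output)  # e.g. {"name": "search_documents", "parameters": {"query": "..."}}
    return TOOLS[call["name"]](**call.get("parameters", {}))

print(dispatch('{"name": "check_and_connect", "parameters": {}}'))
```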
Downstream Use
This model can be integrated into:
- AI assistants that need to understand when to delegate tasks to external tools
Out-of-Scope Use
This model should not be used for:
- General text generation without tool calling
- Tasks requiring more than the two trained tools
- Critical systems where reliability is essential without human oversight
- Applications requiring factual accuracy guarantees
Bias, Risks, and Limitations
- The model inherits biases from the base LLaMA-3.2-3B model
- Performance depends on how similar user queries are to the training data format
- There's a strong dependency on the specific prefixing pattern used in training ("Search information about")
Recommendations
Users (both direct and downstream) should:
- Follow the same prompting patterns used in training for optimal results
- Include the "Search information about" prefix for queries intended for the search_documents tool
- Be aware that the model expects a specific system prompt format
- Test thoroughly before deployment in production environments
- Consider implementing fallback mechanisms for unrecognized query types (a sketch follows this list)
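As a sketch of that last recommendation, the wrapper below validates the model output and signals a fallback when the JSON is malformed or names an unknown tool; the function names and fallback behavior are illustrative assumptions, not part of the released code.

```python
import json

KNOWN_TOOLS = {"search_documents", "check_and_connect"}

def safe_tool_call(model_output: str):
    """Return a validated tool call dict, or None so the caller can fall back (e.g. answer without tools)."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # malformed JSON: trigger fallback
    if call.get("name") not in KNOWN_TOOLS:
        return None  # unknown tool: trigger fallback
    return call

call = safe_tool_call('{"name": "search_documents", "parameters": {"query": "hypertension treatment"}}')
if call is None:
    print("Falling back to a direct answer without tools.")
else:
    print(f"Dispatching to {call['name']} with {call.get('parameters', {})}")
```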
How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the LoRA adapter and tokenizer (requires `peft` so transformers can resolve the adapter weights)
model_path = "asanchez75/Llama-3.2-3B-tool-search"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`; remove to load on CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""
USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{ "type": "function", "function": { "name": "check_and_connect", "description": "check_and_connect", "parameters": { "properties": {}, "type": "object" } } }
{ "type": "function", "function": { "name": "search_documents", "description": "\n Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n ", "parameters": { "properties": { "query": { "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.", "type": "string" } }, "required": [ "query" ], "type": "object" } } }
"""
# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # Prefix routes the query to search_documents
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)
# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True))
```
Training Details
Training Data
The model was trained on a custom dataset with 1,050 examples from asanchez75/tool_finetuning_dataset:
- 1,000 examples derived from the "maximedb/natural_questions" dataset, modified with the "Search information about" prefix
- 50 examples of system status queries for the "check_and_connect" tool
The dataset was created in JSONL format with each entry having a complete conversation structure including system, user, and assistant messages.
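For illustration, one entry in that JSONL file has roughly the structure below; the field names and content are assumptions for this sketch, so consult the dataset repository for the authoritative schema.

```python
import json

# Illustrative training entry (structure assumed, not copied from the released dataset).
example_entry = {
    "messages": [
        {"role": "system", "content": "Environment: ipython ..."},
        {"role": "user", "content": "Search information about causes of hypertension"},
        {
            "role": "assistant",
            "content": '{"name": "search_documents", "parameters": {"query": "causes of hypertension"}}',
        },
    ]
}
print(json.dumps(example_entry))  # one JSON object per line in the JSONL file
```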
Training Procedure
The model was fine-tuned using Unsloth's optimized implementation of LoRA over a 4-bit quantized version of LLaMA-3.2-3B-Instruct (a configuration sketch follows the hyperparameter list below).
Training Hyperparameters
- Training regime: 4-bit quantization with LoRA
- LoRA rank: 16
- LoRA alpha: 16
- LoRA dropout: 0
- Target modules: "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
- Learning rate: 2e-4
- Batch size: 2 per device
- Gradient accumulation steps: 4
- Warmup steps: 5
- Number of epochs: 3
- Optimizer: adamw_8bit
- Weight decay: 0.01
- LR scheduler: linear
- Max sequence length: 2048
- Packing: False
- Random seed: 3407
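The hyperparameters listed above correspond roughly to the Unsloth/TRL setup sketched below. This is an illustrative reconstruction, not the released training script: the dataset loading, field names, and exact TRL/Unsloth API (which vary across versions) are assumptions.

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Placeholder dataset: the real run used the 1,050-example JSONL dataset, formatted into chat text.
dataset = Dataset.from_list([{"text": "<formatted conversation>"}])

# Load the 4-bit quantized base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the LoRA adapter with the reported configuration.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed field name
    max_seq_length=2048,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=3,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```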
Speeds, Sizes, Times
- Training hardware: [GPU type, e.g., NVIDIA A100, etc.]
- Training time: [Approximately X minutes based on training code output]
- Model size: Base model is 3B parameters; LoRA adapter is significantly smaller
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on sample inference examples from both categories:
- Information queries with "Search information about" prefix
- System status queries
Metrics
- Accuracy: Measured by whether the model correctly selects the appropriate tool for the query type
- Format correctness: Whether the JSON output is properly formatted and parsable
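A minimal harness for these two metrics could look like the sketch below; the test queries, expected tools, and the `generate_fn` callable are illustrative assumptions, not the actual evaluation set.

```python
import json

# Illustrative (query, expected_tool) pairs; not the actual evaluation examples.
TEST_CASES = [
    ("Search information about symptoms of diabetes", "search_documents"),
    ("Is the system online?", "check_and_connect"),
]

def evaluate(generate_fn):
    """generate_fn(query: str) -> str returns the raw model output for a query."""
    correct, parsable = 0, 0
    for query, expected in TEST_CASES:
        output = generate_fn(query)
        try:
            call = json.loads(output)
        except json.JSONDecodeError:
            continue  # counts against format correctness
        parsable += 1
        correct += int(call.get("name") == expected)
    n = len(TEST_CASES)
    print(f"Tool accuracy: {correct}/{n}  |  Format correctness: {parsable}/{n}")
```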
Results
Qualitative evaluation showed the model successfully distinguishes between:
- Queries that should trigger the `search_documents` tool (when prefixed appropriately)
- Queries that should trigger the `check_and_connect` tool
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [GPU model]
- Hours used: [Estimated from training time]
- Cloud Provider: [If applicable]
- Compute Region: [If applicable]
- Carbon Emitted: [Estimate if available]
Technical Specifications
Model Architecture and Objective
- Base architecture: LLaMA-3.2-3B
- Adaptation method: LoRA fine-tuning
- Objective: Train the model to output properly formatted JSON function calls based on input query type
Compute Infrastructure
Hardware
- The model was trained using CUDA-compatible GPU(s)
- Memory usage metrics are reported in the training script
Software
- Unsloth: Library for fast fine-tuning of LLaMA-family models
- PyTorch: Deep learning framework
- Transformers: Hugging Face's transformers library
- PEFT: Parameter-Efficient Fine-Tuning library
- TRL: Transformer Reinforcement Learning library
Framework versions
- PEFT 0.15.2
- Transformers [version]
- PyTorch [version]
- Unsloth [version]
Model Card Contact
[Your contact information]