Model Card for LLaMA-3.2-3B Tool Caller

This model is a LoRA adapter fine-tuned from LLaMA-3.2-3B for tool calling. It has been trained to choose between two available tools, search_documents and check_and_connect, based on the user query, and to respond with a properly formatted JSON function call.

Model Details

Model Description

This model is a Parameter-Efficient Fine-Tuning (PEFT) adaptation of LLaMA-3.2-3B focused on tool use. It employs Low-Rank Adaptation (LoRA) to efficiently fine-tune the base model for function calling capabilities.

  • Developed by: Uness.fr
  • Model type: Fine-tuned LLM (LoRA)
  • Language(s) (NLP): English
  • License: [Same as base model - specify LLaMA 3.2 license]
  • Finetuned from model: unsloth/Llama-3.2-3B-Instruct (4-bit quantized version)

Model Sources

Uses

Direct Use

This model is designed for use as an AI assistant that decides when to call external tools. It supports two specific functions:

  1. search_documents: Triggered when users ask for medical information (prefixed with "Search information about")
  2. check_and_connect: Triggered when users ask about system status or connectivity

The model outputs properly formatted JSON function calls that can be parsed by downstream applications to execute the appropriate tools.
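
As an illustration of the downstream parsing step, the following is a minimal sketch of validating the model's output before executing a tool. The helper name parse_tool_call and the KNOWN_TOOLS table are hypothetical; the required arguments are inferred from the tool schemas in the prompt header in this card, not from any published client code.

```python
import json

# Tool names come from this card; the required-argument sets are an
# assumption derived from the schemas shown in the prompt header.
KNOWN_TOOLS = {"search_documents": {"query"}, "check_and_connect": set()}


def parse_tool_call(raw_output):
    """Return (name, parameters) for a usable call, or None otherwise."""
    try:
        call = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return None  # output was not valid JSON
    if not isinstance(call, dict):
        return None
    name = call.get("name")
    params = call.get("parameters", {})
    if not isinstance(name, str) or name not in KNOWN_TOOLS:
        return None  # unknown or missing tool name
    if not isinstance(params, dict):
        return None
    if not KNOWN_TOOLS[name].issubset(params):
        return None  # a required argument is missing
    return name, params
```

A caller would typically pass the decoded generation to parse_tool_call and only invoke the tool when the result is not None.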

Downstream Use

This model can be integrated into:

  • AI assistants that need to understand when to delegate tasks to external tools

Out-of-Scope Use

This model should not be used for:

  • General text generation without tool calling
  • Tasks requiring more than the two trained tools
  • Critical systems that demand high reliability, unless a human reviews the model's output
  • Applications requiring factual accuracy guarantees

Bias, Risks, and Limitations

  • The model inherits biases from the base LLaMA-3.2-3B model
  • Performance depends on how similar user queries are to the training data format
  • There's a strong dependency on the specific prefixing pattern used in training ("Search information about")

Recommendations

Users (both direct and downstream) should:

  • Follow the same prompting patterns used in training for optimal results
  • Include the "Search information about" prefix for queries intended for the search_documents tool
  • Be aware that the model expects a specific system prompt format
  • Test thoroughly before deployment in production environments
  • Consider implementing fallback mechanisms for unrecognized query types
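
The prefix and fallback recommendations above could be combined roughly as follows. This is a sketch under assumptions: the function names with_search_prefix and route are hypothetical, and what the fallback should be (a default reply, a human handoff, a retry) is left to the application.

```python
import json

SEARCH_PREFIX = "Search information about"  # training prefix, per this card
TOOL_NAMES = ("search_documents", "check_and_connect")


def with_search_prefix(query):
    """Prepend the training prefix unless the query already starts with it."""
    query = query.strip()
    if query.lower().startswith(SEARCH_PREFIX.lower()):
        return query
    return f"{SEARCH_PREFIX} {query}"


def route(raw_output, fallback=None):
    """Return the parsed call, or `fallback` if the output is not a known tool call."""
    try:
        call = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return fallback
    if isinstance(call, dict) and call.get("name") in TOOL_NAMES:
        return call
    return fallback
```

Returning a caller-supplied fallback instead of raising keeps unrecognized query types from crashing the surrounding application.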

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

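# Load the model and tokenizer. The repository contains a LoRA adapter, so the
# `peft` package must be installed for transformers to attach it to the base model.
model_path = "your-username/llama-3-2-3b-tool-caller-lora"  # Replace with actual path
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)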

# Define the prompting format (must match training)
SYSTEM_PROMPT = """Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 18 May 2025"""

USER_INSTRUCTION_HEADER = """Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt. 
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{ "type": "function", "function": { "name": "check_and_connect", "description": "check_and_connect", "parameters": { "properties": {}, "type": "object" } } }
{ "type": "function", "function": { "name": "search_documents", "description": "\n Searches for documents based on a user's query string. Use this to find information on a specific topic.\n\n ", "parameters": { "properties": { "query": { "description": "The actual search phrase or question. For example, 'What are the causes of climate change?' or 'population of Madre de Dios'.", "type": "string" } }, "required": [ "query" ], "type": "object" } } }
"""

# Example 1: Information query (add the prefix)
user_query = "What is the capital of France?"
formatted_query = f"Search information about {user_query}"  # Add the training prefix expected for search_documents

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{formatted_query}"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)

# Example 2: System status query (no prefix needed)
status_query = "Are we connected?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"{USER_INSTRUCTION_HEADER}{status_query}"},
]

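# Generate the response the same way as in Example 1
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
response = tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True)
print(response)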

Training Details

Training Data

The model was trained on a custom dataset with 1,050 examples from asanchez75/tool_finetuning_dataset:

  • 1,000 examples derived from the "maximedb/natural_questions" dataset, modified with "Search information about" prefix
  • 50 examples of system status queries for the "check_and_connect" tool

The dataset was created in JSONL format with each entry having a complete conversation structure including system, user, and assistant messages.
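
To make the conversation structure concrete, here is a sketch of how one JSONL entry might be built. The "messages" key and the role/content field layout are assumptions based on the common chat format; the actual schema of asanchez75/tool_finetuning_dataset may differ, and make_record is a hypothetical helper.

```python
import json


def make_record(system_prompt, user_content, tool_name, parameters):
    """Build one chat-style training record and serialize it as a JSON line."""
    # The assistant turn is the JSON function call the model should learn to emit.
    assistant = json.dumps({"name": tool_name, "parameters": parameters})
    record = {
        "messages": [  # assumed key name, not confirmed by the card
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": assistant},
        ]
    }
    return json.dumps(record, ensure_ascii=False)


line = make_record(
    "Environment: ipython",
    "Search information about who wrote Hamlet",
    "search_documents",
    {"query": "who wrote Hamlet"},
)
print(line)
```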

Training Procedure

The model was fine-tuned using Unsloth's optimized implementation of LoRA over a 4-bit quantized version of LLaMA-3.2-3B-Instruct.

Training Hyperparameters

  • Training regime: 4-bit quantization with LoRA
  • LoRA rank: 16
  • LoRA alpha: 16
  • LoRA dropout: 0
  • Target modules: "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"
  • Learning rate: 2e-4
  • Batch size: 2 per device
  • Gradient accumulation steps: 4
  • Warmup steps: 5
  • Number of epochs: 3
  • Optimizer: adamw_8bit
  • Weight decay: 0.01
  • LR scheduler: linear
  • Max sequence length: 2048
  • Packing: False
  • Random seed: 3407
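
The batch-related settings above imply an effective batch size and an approximate number of optimizer steps. The arithmetic below is a sketch that assumes a single GPU and that all 1,050 examples were used for training; neither is stated in this card, and the trainer's exact step count can differ slightly depending on how it handles the final partial batch.

```python
import math

num_examples = 1050      # dataset size stated in this card
per_device_batch = 2
grad_accum_steps = 4
num_epochs = 3
num_devices = 1          # assumption: single GPU

effective_batch = per_device_batch * grad_accum_steps * num_devices  # 8
steps_per_epoch = math.ceil(num_examples / effective_batch)          # 132
total_steps = steps_per_epoch * num_epochs                           # 396

# LoRA updates are scaled by alpha / rank; with both set to 16 the factor is 1.
lora_scaling = 16 / 16

print(effective_batch, steps_per_epoch, total_steps, lora_scaling)
```

With only 5 warmup steps out of roughly 400, the learning rate reaches its peak of 2e-4 almost immediately and then decays linearly.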

Speeds, Sizes, Times

  • Training hardware: [GPU type, e.g., NVIDIA A100, etc.]
  • Training time: [Approximately X minutes based on training code output]
  • Model size: Base model is 3B parameters; LoRA adapter is significantly smaller
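
The adapter size can be estimated from the LoRA rank and target modules listed above: a rank-r adapter on a linear layer with d_in inputs and d_out outputs adds r × (d_in + d_out) parameters. The layer dimensions below are assumptions taken from the publicly released Llama-3.2-3B configuration and should be checked against the base model's config.json.

```python
RANK = 16

# Assumed Llama-3.2-3B dimensions (verify against config.json).
HIDDEN = 3072          # hidden_size
INTERMEDIATE = 8192    # intermediate_size
KV_DIM = 1024          # num_key_value_heads (8) * head_dim (128)
NUM_LAYERS = 28

# (d_in, d_out) for each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in TARGETS.values())
adapter_params = per_layer * NUM_LAYERS
print(f"{adapter_params:,} trainable LoRA parameters")
```

Under these assumptions the adapter holds about 24 million parameters, which is under 1% of the roughly 3B parameters in the base model.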

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on sample inference examples from both categories:

  • Information queries with "Search information about" prefix
  • System status queries

Metrics

  • Accuracy: Measured by whether the model correctly selects the appropriate tool for the query type
  • Format correctness: Whether the JSON output is properly formatted and parsable
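
A minimal sketch of how these two metrics could be computed over a set of labeled outputs is shown below. The evaluate function and the sample format are hypothetical, and "format correct" is taken here to mean that the output parses as a JSON object; the card does not describe the evaluation script actually used.

```python
import json


def evaluate(samples):
    """samples: list of (expected_tool_name, raw_model_output) pairs."""
    parsable = 0
    correct = 0
    for expected, raw in samples:
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # counts against both metrics
        if not isinstance(call, dict):
            continue
        parsable += 1
        if call.get("name") == expected:
            correct += 1
    total = len(samples)
    return {
        "format_correctness": parsable / total if total else 0.0,
        "tool_accuracy": correct / total if total else 0.0,
    }
```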

Results

Qualitative evaluation showed the model successfully distinguishes between:

  • Queries that should trigger the search_documents tool (when prefixed appropriately)
  • Queries that should trigger the check_and_connect tool

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [GPU model]
  • Hours used: [Estimated from training time]
  • Cloud Provider: [If applicable]
  • Compute Region: [If applicable]
  • Carbon Emitted: [Estimate if available]

Technical Specifications

Model Architecture and Objective

  • Base architecture: LLaMA-3.2-3B
  • Adaptation method: LoRA fine-tuning
  • Objective: Train the model to output properly formatted JSON function calls based on input query type

Compute Infrastructure

Hardware

  • The model was trained using CUDA-compatible GPU(s)
  • Memory usage metrics are reported in the training script

Software

  • Unsloth: Library for fast, memory-efficient fine-tuning of LLaMA-family models
  • PyTorch: Deep learning framework
  • Transformers: Hugging Face's transformers library
  • PEFT: Parameter-Efficient Fine-Tuning library
  • TRL: Transformer Reinforcement Learning library

Framework versions

  • PEFT 0.15.2
  • Transformers [version]
  • PyTorch [version]
  • Unsloth [version]

Model Card Contact

[Your contact information]
