# Llama 3.2 1B E-commerce Intent (GPTQ 4-bit)
This is a fine-tuned version of meta-llama/Llama-3.2-1B that has been specifically trained to act as an e-commerce intent detection model. Given a catalog of products and a user's request, it outputs a structured JSON object representing the user's intent (`add` or `remove`), the product name, and the quantity.
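For instance, the remove request used in the inference example below yields the first object; the `add` case follows the same shape, though its product string here is only an illustrative assumption:

```python
# Intent object shape (field names taken from the inference example below).
{"action": "remove", "product": "Headphones", "quantity": 4}
# The "add" case follows the same shape; this product string is illustrative only.
{"action": "add", "product": "Shampoo (400ml bottle)", "quantity": 2}
```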
This version of the model is quantized to 4-bit with GPTQ, which makes inference faster and more memory-efficient. The QLoRA adapter has already been merged into this GPTQ model, so no separate adapter loading is required.
## Model Description
The base model, Llama 3.2 1B, was fine-tuned using the QLoRA method on a synthetic dataset of 3000 examples. The training objective was to teach the model to ignore conversational pleasantries and strictly output a JSON object that can be directly parsed by a backend system for managing a shopping cart.
## Dataset
The model was fine-tuned on a custom synthetic dataset of 3000 examples.
You can access the dataset here: jtlicardo/ecommerce-intent-3k
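Each example pairs a formatted prompt (catalog plus user request) with a JSON completion. The record below is an illustrative sketch only; the exact field names and prompt layout are assumptions, so consult the linked dataset for the real format.

```python
# Illustrative sketch of one training record (a prompt/completion pair).
# Field names and prompt layout are assumptions; see jtlicardo/ecommerce-intent-3k
# for the actual format.
example_record = {
    "prompt": (
        "Catalog:\n"
        "Shampoo (400ml bottle)\n"
        "Hand Soap (250ml dispenser)\n\n"
        "User:\n"
        "Hey, could you add 2 bottles of shampoo to my cart?"
    ),
    "completion": '{"action": "add", "product": "Shampoo (400ml bottle)", "quantity": 2}',
}
```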
## Intended Use & Limitations
This model is designed for a specific task: parsing user requests in an e-commerce context. It should not be used as a general-purpose chatbot.
- Primary Use: Backend service for intent detection from user text (a minimal parsing sketch follows this list).
- Out-of-Scope: General conversation, answering questions, or any task not related to adding/removing items from a list.
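As a minimal sketch of that backend use, the model's output can be parsed with `json.loads` and routed to cart logic. The `add_to_cart`/`remove_from_cart` functions below are hypothetical placeholders, not part of this repository:

```python
import json

def add_to_cart(product: str, quantity: int) -> None:
    print(f"Adding {quantity} x {product}")  # placeholder cart logic

def remove_from_cart(product: str, quantity: int) -> None:
    print(f"Removing {quantity} x {product}")  # placeholder cart logic

def handle_intent(generated_json: str) -> None:
    """Parse the model's JSON output and route it to the cart functions above."""
    intent = json.loads(generated_json)
    action, product, quantity = intent["action"], intent["product"], int(intent["quantity"])
    if action == "add":
        add_to_cart(product, quantity)
    elif action == "remove":
        remove_from_cart(product, quantity)
    else:
        raise ValueError(f"Unexpected action: {action!r}")

handle_intent('{"action": "remove", "product": "Headphones", "quantity": 4}')
```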
## How to Use
The model expects a prompt formatted in a specific way, following the TinyLlama-Chat template. You must provide the `Catalog` and the `User` request.
Important: You need to install `optimum` and `auto-gptq` to run this 4-bit GPTQ model.
pip install -q optimum auto-gptq transformers
Here's how to run inference in Python:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# Model repository on the Hugging Face Hub
model_id = "jtlicardo/llama_3.2-1b-ecommerce-intent-gptq-4bit"
# Load the tokenizer and the 4-bit quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)
# --- Define the prompt ---
catalog = """Catalog:
Shampoo (400ml bottle)
Hand Soap (250ml dispenser)
Peanut Butter (340g jar)
Headphones
Green Tea (25 tea bags)"""
user_query = "Could you please take off 4 pairs of headphons from my cart?"
# --- Format the prompt using the model's chat template ---
# The model was trained to see this structure.
prompt = f"<|user|>\n{catalog}\n\nUser:\n{user_query}\n<|assistant|>\n"
# --- Generate the output ---
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
outputs = pipe(
    prompt,
    max_new_tokens=50,       # Max length of the JSON output
    do_sample=False,         # Deterministic (greedy) decoding
    temperature=None,        # Not needed when do_sample=False
    top_p=None,              # Not needed when do_sample=False
    return_full_text=False,  # Only return the generated part
)
# The output will be a clean JSON string
generated_json = outputs[0]['generated_text'].strip()
print(generated_json)
# Expected output:
# {"action": "remove", "product": "Headphones", "quantity": 4}
## Training Procedure
This model was fine-tuned using the `trl` library's `SFTTrainer`; a minimal sketch of the setup follows the list below.
- Method: QLoRA (4-bit quantization with LoRA adapters)
- Dataset: A custom JSONL file with 3000 `prompt`/`completion` pairs.
- Configuration: `completion_only_loss=True` was used to ensure the model only learned to generate the assistant's JSON response.
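A minimal sketch of this setup, assuming the standard `trl` QLoRA recipe: every hyperparameter below (LoRA rank, batch size, epochs, output path) is an illustrative assumption rather than the configuration actually used.

```python
# Illustrative QLoRA fine-tuning sketch with trl's SFTTrainer.
# All hyperparameters are assumptions, not the values used to train this model.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("jtlicardo/ecommerce-intent-3k", split="train")

# Load the base model in 4-bit (the "Q" in QLoRA)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

training_args = SFTConfig(
    output_dir="llama-3.2-1b-ecommerce-intent",
    completion_only_loss=True,  # compute loss only on the assistant's JSON completion
    num_train_epochs=3,
    per_device_train_batch_size=4,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```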