# SmolLM2-135M-Instruct-TIFA

## Model Description

SmolLM2-135M-Instruct-TIFA is a fine-tuned version of HuggingFaceTB/SmolLM2-135M-Instruct trained specifically for TIFA (Text-to-Image Faithfulness Assessment). The model generates structured evaluation questions that assess how faithfully a text-to-image model represents a given text description.

## Intended Use

This model automatically generates evaluation questions for text-to-image models by producing four specific question types:
- Negative question: should have "no" as the answer
- Object identification: should have a single-word answer taken directly from the description
- Attribute identification: should have a single-word answer taken directly from the description
- Positive question: should have "yes" as the answer
## Model Details
- Base Model: HuggingFaceTB/SmolLM2-135M-Instruct
- Model Size: 135M parameters
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Training Framework: Transformers + TRL + PEFT
- License: apache-2.0
## Training Details

### Training Configuration

Training Method: Supervised Fine-Tuning (SFT) with LoRA

LoRA Configuration:
- r: 16
- lora_alpha: 32
- lora_dropout: 0.05
- Target modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
Training Parameters:
- Epochs: 4
- Learning Rate: 2e-4
- Batch Size: 8 (per device)
- Gradient Accumulation Steps: 2
- Max Sequence Length: 512
- Optimizer: AdamW
- Weight Decay: 0.01
- Warmup Steps: 200
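These parameters map onto a TRL `SFTConfig`/`SFTTrainer` setup roughly as follows (a minimal sketch, assuming TRL's SFT API as stated above; parameter names vary slightly across TRL versions, and `output_dir` is illustrative):

```python
from trl import SFTConfig, SFTTrainer

# Hypothetical trainer setup mirroring the training parameters above
training_args = SFTConfig(
    output_dir="smollm2-135m-tifa",  # illustrative output path
    num_train_epochs=4,
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    max_seq_length=512,
    optim="adamw_torch",
    weight_decay=0.01,
    warmup_steps=200,
)

trainer = SFTTrainer(
    model=model,              # base SmolLM2-135M-Instruct model
    args=training_args,
    train_dataset=dataset,    # conversation-format dataset (see below)
    peft_config=lora_config,  # LoRA configuration from the sketch above
)
trainer.train()
```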
### Dataset

The model was trained on a structured dataset of 5,000 examples generated with Gemini and formatted as conversation data in JSONL.
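A record in this conversation format would look roughly like the following (a hand-written illustration reusing the prompt and output shown later in this card; the `messages` key and exact schema are assumptions, since the dataset itself is not published). Each record occupies one line in the actual JSONL file; it is pretty-printed here for readability:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Create 4 questions to evaluate a text-to-image model's faithfulness to this description: \"khaki triangles and azure crescents\". ..."
    },
    {
      "role": "assistant",
      "content": "Q1: Are the triangles green?\nChoices: ['no', 'yes']\nAnswer: no\n..."
    }
  ]
}
```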
## Usage

### Installation

```bash
pip install transformers torch accelerate
```

`accelerate` is required for `device_map="auto"` in the example below.

### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

model_path = "kawchar85/SmolLM2-135M-Instruct-TIFA"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",  # device placement handled here, not by the pipeline
)

# Create pipeline (no explicit device argument: the model is already placed
# by device_map="auto", and passing both would raise an error)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
)

# Generate evaluation questions
description = "khaki triangles and azure crescents"
user_msg = (
    "Create 4 questions to evaluate a text-to-image model's faithfulness to this description: "
    f'"{description}".\n'
    "The first question should have 'no' as the answer, "
    "the second and third questions should have answers that are a single word directly taken "
    "from the description, and the fourth question should have 'yes' as the answer."
)

messages = [{"role": "user", "content": user_msg}]
output = pipe(
    messages,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding for deterministic question sets
)
print(output[0]["generated_text"])
```
### Example Output

For the description "khaki triangles and azure crescents", the model generates:

```text
Q1: Are the triangles green?
Choices: ['no', 'yes']
Answer: no

Q2: What color are the triangles?
Choices: ['blue', 'red', 'khaki', 'green']
Answer: khaki

Q3: What shape are the objects?
Choices: ['squares', 'circles', 'crescents', 'triangles']
Answer: crescents

Q4: Are there azure crescents in the image?
Choices: ['no', 'yes']
Answer: yes
```
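Because the output follows a fixed Q/Choices/Answer layout, it can be parsed into structured records. Below is a minimal sketch of such a parser (my own helper for illustration, not part of the model release), assuming the model sticks to the format shown above:

```python
import ast
import re

def parse_tifa_questions(text: str) -> list[dict]:
    """Parse 'Qn: ... / Choices: [...] / Answer: ...' blocks into dicts."""
    pattern = re.compile(
        r"Q\d+:\s*(?P<question>.+?)\s*\n"
        r"Choices:\s*(?P<choices>\[.*?\])\s*\n"
        r"Answer:\s*(?P<answer>.+)"
    )
    records = []
    for match in pattern.finditer(text):
        records.append({
            "question": match.group("question").strip(),
            # Choices are printed as a Python-style list, e.g. ['no', 'yes']
            "choices": ast.literal_eval(match.group("choices")),
            "answer": match.group("answer").strip(),
        })
    return records

questions = parse_tifa_questions(output[0]["generated_text"])
for q in questions:
    print(q["question"], "->", q["answer"])
```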
## Limitations

- The model is specialized for TIFA evaluation and may not perform well on general conversational tasks
- Limited to generating four-question evaluation sets in the trained format
- Performance depends on the quality and diversity of the training dataset
- May generate duplicate questions for Q2 and Q3, likely due to the small training dataset and the limited knowledge of a 135M-parameter model
## Technical Specifications

- Architecture: Transformer-based causal language model
- Precision: FP16
- Context Length: 512 tokens
- Inference Speed: fast question generation, owing to the small 135M-parameter size
## Citation

```bibtex
@misc{smollm2-135m-it-tifa-2025,
  title={SmolLM2-135M-Instruct-TIFA: A Fine-tuned Model for Text-to-Image Faithfulness Assessment},
  author={kawchar85},
  year={2025},
  url={https://huggingface.co/kawchar85/SmolLM2-135M-Instruct-TIFA}
}
```