# SmolLM2-135M-Instruct-TIFA

## Model Description

SmolLM2-135M-Instruct-TIFA is a fine-tuned version of HuggingFaceTB/SmolLM2-135M-Instruct trained specifically for TIFA (Text-to-Image Faithfulness Assessment). The model generates structured evaluation questions that assess how faithfully a text-to-image model represents a given text description.

## Intended Use

This model automatically generates evaluation questions for text-to-image models by producing four specific question types:
- Negative question: should have "no" as the answer
- Object identification: should have a single-word answer taken directly from the description
- Attribute identification: should have a single-word answer taken directly from the description
- Positive question: should have "yes" as the answer
## Model Details
- Base Model: HuggingFaceTB/SmolLM2-135M-Instruct
- Model Size: 135M parameters
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Training Framework: Transformers + TRL + PEFT
- License: apache-2.0
## Training Details

### Training Configuration

Training Method: Supervised Fine-Tuning (SFT) with LoRA

LoRA Configuration:
- r: 16
- lora_alpha: 32
- lora_dropout: 0.05
- Target modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
Training Parameters:
- Epochs: 4
- Learning Rate: 2e-4
- Batch Size: 8 (per device)
- Gradient Accumulation Steps: 2
- Max Sequence Length: 512
- Optimizer: AdamW
- Weight Decay: 0.01
- Warmup Steps: 200
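These parameters map onto a TRL `SFTConfig`/`SFTTrainer` setup roughly as follows (a minimal sketch, assuming TRL's SFT API as stated above; parameter names vary slightly across TRL versions, and `output_dir` is illustrative):

```python
from trl import SFTConfig, SFTTrainer

# Hypothetical trainer setup mirroring the training parameters above
training_args = SFTConfig(
    output_dir="smollm2-135m-tifa",  # illustrative output path
    num_train_epochs=4,
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    max_seq_length=512,
    optim="adamw_torch",
    weight_decay=0.01,
    warmup_steps=200,
)

trainer = SFTTrainer(
    model=model,              # base SmolLM2-135M-Instruct model
    args=training_args,
    train_dataset=dataset,    # conversation-format dataset (see below)
    peft_config=lora_config,  # LoRA configuration from the sketch above
)
trainer.train()
```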
### Dataset

The model was trained on a structured dataset of 5,000 examples generated with Gemini and formatted as conversation data in JSONL.
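A record in this conversation format would look roughly like the following (a hand-written illustration reusing the prompt and output shown later in this card; the `messages` key and exact schema are assumptions, since the dataset itself is not published). Each record occupies one line in the actual JSONL file; it is pretty-printed here for readability:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Create 4 questions to evaluate a text-to-image model's faithfulness to this description: \"khaki triangles and azure crescents\". ..."
    },
    {
      "role": "assistant",
      "content": "Q1: Are the triangles green?\nChoices: ['no', 'yes']\nAnswer: no\n..."
    }
  ]
}
```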
## Usage

### Installation

```bash
pip install transformers torch accelerate
```

`accelerate` is required for `device_map="auto"` in the example below.

### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

model_path = "kawchar85/SmolLM2-135M-Instruct-TIFA"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",  # device placement handled here, not by the pipeline
)

# Create pipeline (no explicit device argument: the model is already placed
# by device_map="auto", and passing both would raise an error)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
)

# Generate evaluation questions
description = "khaki triangles and azure crescents"
user_msg = (
    "Create 4 questions to evaluate a text-to-image model's faithfulness to this description: "
    f'"{description}".\n'
    "The first question should have 'no' as the answer, "
    "the second and third questions should have answers that are a single word directly taken "
    "from the description, and the fourth question should have 'yes' as the answer."
)

messages = [{"role": "user", "content": user_msg}]
output = pipe(
    messages,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding for deterministic question sets
)
print(output[0]["generated_text"])
```
### Example Output

For the description "khaki triangles and azure crescents", the model generates:

```text
Q1: Are the triangles green?
Choices: ['no', 'yes']
Answer: no

Q2: What color are the triangles?
Choices: ['blue', 'red', 'khaki', 'green']
Answer: khaki

Q3: What shape are the objects?
Choices: ['squares', 'circles', 'crescents', 'triangles']
Answer: crescents

Q4: Are there azure crescents in the image?
Choices: ['no', 'yes']
Answer: yes
```
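Because the output follows a fixed Q/Choices/Answer layout, it can be parsed into structured records. Below is a minimal sketch of such a parser (my own helper for illustration, not part of the model release), assuming the model sticks to the format shown above:

```python
import ast
import re

def parse_tifa_questions(text: str) -> list[dict]:
    """Parse 'Qn: ... / Choices: [...] / Answer: ...' blocks into dicts."""
    pattern = re.compile(
        r"Q\d+:\s*(?P<question>.+?)\s*\n"
        r"Choices:\s*(?P<choices>\[.*?\])\s*\n"
        r"Answer:\s*(?P<answer>.+)"
    )
    records = []
    for match in pattern.finditer(text):
        records.append({
            "question": match.group("question").strip(),
            # Choices are printed as a Python-style list, e.g. ['no', 'yes']
            "choices": ast.literal_eval(match.group("choices")),
            "answer": match.group("answer").strip(),
        })
    return records

questions = parse_tifa_questions(output[0]["generated_text"])
for q in questions:
    print(q["question"], "->", q["answer"])
```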
## Limitations

- The model is specialized for TIFA evaluation and may not perform well on general conversational tasks
- Limited to generating four-question evaluation sets in the trained format
- Performance depends on the quality and diversity of the training dataset
- May generate duplicate questions for Q2 and Q3, likely due to the small training dataset and the limited knowledge of a 135M-parameter model
## Technical Specifications

- Architecture: Transformer-based causal language model
- Precision: FP16
- Context Length: 512 tokens
- Inference Speed: fast question generation, owing to the small 135M-parameter size
## Citation

```bibtex
@misc{smollm2-135m-it-tifa-2025,
  title={SmolLM2-135M-Instruct-TIFA: A Fine-tuned Model for Text-to-Image Faithfulness Assessment},
  author={kawchar85},
  year={2025},
  url={https://huggingface.co/kawchar85/SmolLM2-135M-Instruct-TIFA}
}
```