ChartCap: Mitigating Hallucination of Dense Chart Captioning
This repository contains the model presented in the paper ChartCap: Mitigating Hallucination of Dense Chart Captioning.
Project Page: (WIP) https://junyoung-00.github.io/ChartCap/
Code: https://github.com/junyoung-00/ChartCap
Model Description
Phi-3.5-vision-instruct-ChartCap is a version of microsoft/Phi-3.5-vision-instruct fine-tuned on ChartCap.
The model generates dense chart captions that accurately describe a chart's structural elements and the key insights discernible from it, while avoiding extraneous or hallucinated information.
How to Use
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import requests
import torch

model_id = "junyoung-00/Phi-3.5-vision-instruct-ChartCap"

# Phi-3.5-vision ships custom modeling/processing code, so trust_remote_code=True is required.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # _attn_implementation="eager",  # uncomment if flash-attn is not installed
)

# Load an example chart image from a URL (placeholder URL; replace with your own)
image_url = "https://your-server.com/example_chart.png"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
# Or load a local file instead:
# image = Image.open("example_chart.png").convert("RGB")

# Define the prompt for dense chart captioning
prompt = "Please provide a detailed caption for the chart."
# Phi-3.5-vision expects numbered image placeholders (<|image_1|>, <|image_2|>, ...)
messages = [
    {"role": "user", "content": f"<|image_1|>\n{prompt}"}
]

# Render the chat template to a string; the processor then tokenizes it
# and expands <|image_1|> into the positions for the image embeddings.
text = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text, [image], return_tensors="pt").to(model.device)

# Generate the caption with greedy decoding
generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=processor.tokenizer.eos_token_id,
)

# Drop the prompt tokens, then decode and print only the generated caption
generated_ids = generated_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response.strip())
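The placeholder convention is easy to get wrong, so it can help to build the chat messages with a small helper. The sketch below assumes the fine-tuned model keeps the base Phi-3.5-vision convention of one 1-indexed `<|image_i|>` placeholder per image, each followed by a newline; `build_caption_messages` is a hypothetical helper, not part of the ChartCap code or the Transformers API. This card only shows the single-image case, so `num_images` greater than 1 is untested here.

```python
def build_caption_messages(prompt: str, num_images: int = 1) -> list[dict]:
    """Build a Phi-3.5-vision style user message with numbered image placeholders.

    Hypothetical helper. Assumes the base model's convention: one <|image_i|>
    placeholder per image (1-indexed), each followed by a newline, then the prompt.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # <|image_1|>\n<|image_2|>\n... in the same order the images are passed to the processor
    placeholders = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return [{"role": "user", "content": placeholders + prompt}]


if __name__ == "__main__":
    msgs = build_caption_messages("Please provide a detailed caption for the chart.")
    print(msgs[0]["content"])
```

The returned list can be passed to `processor.tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=True)` as in the example above, with the images given to the processor in the same order as their placeholders.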
Citation
If you find this model or the associated research helpful, please cite:
@inproceedings{lim2025chartcap,
  title={ChartCap: Mitigating Hallucination of Dense Chart Captioning},
  author={Junyoung Lim and Jaewoo Ahn and Gunhee Kim},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}