A newer version of this model is available: yuan-tian/chartgpt-llama3

Model Card for ChartGPT

Model Details

Model Description

This model generates charts from natural language. For more information, please refer to the paper cited in the Citation section.

Model Input Format

The model input at Step x has the following format, where tokens of the form <...> serve as separation tokens.

{table name}
<head> {column names}
<type> {column types}
<data> {data row 1} <line> {data row 2} <line>
<utterance> {NL utterance}
<ans>
<sep> {Step 1 prompt} {Answer 1}
...
<sep> {Step x-1 prompt} {Answer x-1}
<sep> {Step x prompt}

The model should then output the answer for Step x.

The step 1-6 prompts are as follows:

Step 1. Select columns:
Step 2. Add filter:
Step 3. Add aggregations: 
Step 4. Select chart type:
Step 5. Choose encoding:
Step 6. Add sort:
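
The input format above can be assembled programmatically. The following is a minimal sketch, not the authors' preprocessing code: the function name `build_input` and its parameters are hypothetical, the prompt strings are copied from the list above, and joining the parts with single spaces is an assumption based on the runnable example in this card. The exact whitespace and prompt wording used during training should be checked against the dataset.

```python
from typing import List, Sequence

# Step prompts copied from the list above.
STEP_PROMPTS = [
    "Step 1. Select columns:",
    "Step 2. Add filter:",
    "Step 3. Add aggregations:",
    "Step 4. Select chart type:",
    "Step 5. Choose encoding:",
    "Step 6. Add sort:",
]


def build_input(
    table_name: str,
    columns: Sequence[str],
    types: Sequence[str],
    rows: Sequence[Sequence[object]],
    utterance: str,
    answers: Sequence[str] = (),
) -> str:
    """Build the model input for step len(answers) + 1.

    `answers` holds the answers to the steps already completed, in order.
    Hypothetical helper: parts are joined with single spaces (an assumption).
    """
    if len(answers) >= len(STEP_PROMPTS):
        raise ValueError("all steps already have answers")
    if len(columns) != len(types):
        raise ValueError("columns and types must have the same length")

    parts: List[str] = [
        table_name,
        "<head>", ",".join(columns),
        "<type>", ",".join(types),
        "<data>",
    ]
    # Each data row is followed by a <line> token.
    for row in rows:
        parts += [",".join(str(value) for value in row), "<line>"]
    parts += ["<utterance>", utterance, "<ans>"]
    # Completed steps: prompt followed by its answer.
    for prompt, answer in zip(STEP_PROMPTS, answers):
        parts += ["<sep>", prompt, answer]
    # The step to be answered next: prompt only.
    parts += ["<sep>", STEP_PROMPTS[len(answers)]]
    return " ".join(parts)
```

Calling `build_input` with an empty `answers` yields the Step 1 input; passing the Step 1 answer yields the Step 2 input, and so on.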

How to Get Started with the Model

Running the Model on a GPU

The example below uses a movie dataset with the utterance "What kinds of movies are the most popular?". The model should return the answer for Step 1 (select columns). You can use the code to check that the model runs successfully.

from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
)

# device_map="auto" requires the accelerate package and places the model on the available device(s).
tokenizer = AutoTokenizer.from_pretrained("yuan-tian/chartgpt")
model = AutoModelForSeq2SeqLM.from_pretrained("yuan-tian/chartgpt", device_map="auto")

# Table name, column names, column types, two data rows, the utterance, and the Step 1 prompt.
input_text = "movies <head> Title,Worldwide_Gross,Production_Budget,Release_Year,Content_Rating,Running_Time,Major_Genre,Creative_Type,Rotten_Tomatoes_Rating,IMDB_Rating <type> nominal,quantitative,quantitative,temporal,nominal,quantitative,nominal,nominal,quantitative,quantitative <data> From Dusk Till Dawn,25728961,20000000,1996,R,107,Horror,Fantasy,63,7.1 <line> Broken Arrow,148345997,65000000,1996,R,108,Action,Contemporary Fiction,55,5.8 <line>  <utterance> What kinds of movies are the most popular? <ans> <sep> Step 1. Select the columns:"

# Move the inputs to the model's device rather than hard-coding "cuda".
inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
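
To obtain answers for all six steps, each answer can be appended to the input before the next step prompt, as described in the input format. The sketch below illustrates that loop under stated assumptions: `run_all_steps` and the `generate` callable are hypothetical names, the prompt strings are taken from the step list in this card (the example above words Step 1 slightly differently, so check which wording the model expects), and parts are joined with single spaces.

```python
from typing import Callable, List

# Step prompts as listed in this model card.
STEP_PROMPTS = [
    "Step 1. Select columns:",
    "Step 2. Add filter:",
    "Step 3. Add aggregations:",
    "Step 4. Select chart type:",
    "Step 5. Choose encoding:",
    "Step 6. Add sort:",
]


def run_all_steps(table_and_utterance: str, generate: Callable[[str], str]) -> List[str]:
    """Query the model once per step, feeding earlier answers back in.

    `table_and_utterance` is the input up to and including "<ans>".
    `generate` maps an input string to the model's decoded answer.
    """
    context = table_and_utterance
    answers: List[str] = []
    for prompt in STEP_PROMPTS:
        # Append the prompt for the current step and ask the model.
        context = f"{context} <sep> {prompt}"
        answer = generate(context).strip()
        answers.append(answer)
        # Keep the answer in the context for the following steps.
        context = f"{context} {answer}"
    return answers


# Hypothetical wiring, with `tokenizer` and `model` loaded as in the example above
# (max_new_tokens=64 is an assumed limit, not a documented setting):
#
# def generate(text):
#     inputs = tokenizer(text, return_tensors="pt").to(model.device)
#     outputs = model.generate(**inputs, max_new_tokens=64)
#     return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Passing `generate` as a callable keeps the loop independent of how the model is loaded, so it can be exercised with a stub before running on a GPU.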

Training Details

Training Data

This model is fine-tuned from FLAN-T5-XL on the chartgpt-dataset.

Training Procedure

The preprocessing and training procedure will be documented in a future update.

Citation

BibTeX:

@article{tian2024chartgpt,
  title={ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language},
  author={Tian, Yuan and Cui, Weiwei and Deng, Dazhen and Yi, Xinjing and Yang, Yurun and Zhang, Haidong and Wu, Yingcai},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2024},
  pages={1-15},
  doi={10.1109/TVCG.2024.3368621}
}