Model Card for WangChanGLM 🐘 - The Multilingual Instruction-Following Model

WangChanGLM is a multilingual, instruction-finetuned version of Facebook XGLM-7.5B, trained on open-source, commercially permissible datasets (LAION OIG chip2 and infill_dbpedia, Databricks Dolly v2, OpenAI TL;DR, and Hello-SimpleAI HC3; about 400k examples) and released under CC-BY SA 4.0. The models are trained to perform the subset of instruction-following tasks we found most relevant, namely reading comprehension, brainstorming, and creative writing. We provide the weights for a model finetuned on an English-only dataset (wangchanglm-7.5B-sft-en) and for another checkpoint further finetuned on a Google-translated Thai dataset (wangchanglm-7.5B-sft-enth). We perform Vicuna-style evaluation using both humans and ChatGPT (in our case gpt-3.5-turbo, since we are still on the waitlist for gpt-4) and observe some discrepancies between the two types of annotators. All training and evaluation code is shared under the Apache-2.0 license on our GitHub, and the datasets and model weights are available on Hugging Face. As with Dolly v2, we use only open-source, commercially permissive pretrained models and datasets, so our models are restricted neither by a non-commercial clause, like models that use LLaMA as a base, nor by a non-compete clause, like models that use self-instruct datasets from ChatGPT. See our live demo here.

Developed by: PyThaiNLP and VISTEC-depa AI Research Institute of Thailand
Model type: Finetuned XGLM-7.5B
Language(s) (NLP): en, th, ja, and vi capabilities evaluated; in theory, all 30 languages of XGLM-7.5B
License: CC-BY SA 4.0

Model Sources

Repository: pythainlp/wangchanglm
Blog: Medium
Demo: Colab notebook

Uses

Direct Use

Intended to be used as an instruction-following model for reading comprehension, brainstorming, and creative writing.

Downstream Use

The model can be finetuned for any typical instruction-following use case.

Out-of-Scope Use

We do not expect the models to perform well on math problems, reasoning, or factuality. We intentionally filtered out training examples for these use cases.

Bias, Risks, and Limitations

We noticed limitations similar to those of other finetuned instruction followers, such as weak performance on math problems, reasoning, and factuality. Even though the models do not perform at a level where we would expect them to be abused, they do contain undesirable biases and toxicity and should be further optimized for your particular use cases.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

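import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "pythainlp/wangchanglm-7.5B-sft-en"
# 8-bit loading requires the bitsandbytes and accelerate packages and a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    load_in_8bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="./",
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generation settings, using the values noted in the inline comments below.
max_gen_len = 512
top_p = 0.95
temperature = 0.9
# Token ids to suppress at the start of generation. This card does not define
# the list; None is an assumption that simply disables the suppression.
exclude_ids = None

text = "เล่นหุ้นยังไงให้รวย"  # Thai: "How do I get rich trading stocks?"
batch = tokenizer(text, return_tensors="pt")
with torch.cuda.amp.autocast():
    # Note: top_k, top_p, typical_p, and temperature only take effect when
    # do_sample=True is also passed to generate().
    output_tokens = model.generate(
        input_ids=batch["input_ids"].to(model.device),
        max_new_tokens=max_gen_len,  # 512
        begin_suppress_tokens=exclude_ids,
        no_repeat_ngram_size=2,

        # oasst k50
        top_k=50,
        top_p=top_p,  # 0.95
        typical_p=1.0,
        temperature=temperature,  # 0.9

        # oasst typical3 (alternative preset)
        # typical_p=0.3,
        # temperature=0.8,
        # repetition_penalty=1.2,
    )
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))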
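
The snippet above carries two sets of sampling arguments, one active ("oasst k50") and one commented out ("oasst typical3"). A minimal sketch of keeping both as named presets is shown below. The names SHARED_KWARGS, SAMPLING_PRESETS, and generation_kwargs are hypothetical and not part of the WangChanGLM code; the values are copied from the snippet above.

```python
# Arguments that stay the same whichever preset is used (values from the snippet above).
SHARED_KWARGS = {"max_new_tokens": 512, "no_repeat_ngram_size": 2}

# Sampling presets, named after the comments in the snippet above.
SAMPLING_PRESETS = {
    "k50": {"top_k": 50, "top_p": 0.95, "typical_p": 1.0, "temperature": 0.9},
    "typical3": {"typical_p": 0.3, "temperature": 0.8, "repetition_penalty": 1.2},
}


def generation_kwargs(preset):
    """Merge the shared arguments with one sampling preset (hypothetical helper)."""
    if preset not in SAMPLING_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return {**SHARED_KWARGS, **SAMPLING_PRESETS[preset]}


kwargs = generation_kwargs("typical3")
print(sorted(kwargs))
```

The resulting dictionary could then be unpacked into the call, for example model.generate(input_ids=..., **generation_kwargs("typical3")), which avoids commenting lines in and out when switching presets.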

Training Details

Training Data

Finetuning datasets are sourced from LAION OIG chip2 and infill_dbpedia (Apache-2.0), Databricks Dolly v2 (Apache-2.0), OpenAI TL;DR (MIT), and Hello-SimpleAI HC3 (CC-BY SA).

Training Procedure

Preprocessing

See pythainlp/wangchanglm.

Training Hyperparameters

Training regime: LoRA with 4 GPUs. See more details at pythainlp/wangchanglm.

Evaluation

We performed automatic evaluation in the style of Vicuna as well as human evaluation. See our blog for more details.

Environmental Impact

Experiments were conducted using a private infrastructure, which has a carbon efficiency of 0.432 kgCO2eq/kWh. A cumulative 500 hours of computation was performed on hardware of type Tesla V100-SXM2-32GB (TDP of 300W). Total emissions are estimated to be 64.8 kgCO2eq, of which 0 percent was directly offset. Estimations were conducted using the Machine Learning Impact calculator.
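
The emissions figure can be reproduced from the numbers reported above. The sketch below assumes the 500 hours are device-hours at a constant draw equal to the 300W TDP, which is the assumption under which the arithmetic matches the reported total; the function name estimate_emissions_kg is ours, not the calculator's.

```python
# Inputs as reported in this section.
gpu_hours = 500            # cumulative hours of computation
tdp_watts = 300            # TDP of the Tesla V100-SXM2-32GB
carbon_kg_per_kwh = 0.432  # carbon efficiency of the infrastructure


def estimate_emissions_kg(hours, watts, kg_per_kwh):
    """Return estimated emissions in kgCO2eq, assuming constant draw at TDP."""
    energy_kwh = hours * watts / 1000  # watt-hours to kilowatt-hours
    return energy_kwh * kg_per_kwh


emissions = estimate_emissions_kg(gpu_hours, tdp_watts, carbon_kg_per_kwh)
print(f"{emissions:.1f} kgCO2eq")  # 64.8 kgCO2eq
```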

Citation

BibTeX:

@software{charin_polpanumas_2023_7878101,
  author       = {Charin Polpanumas and
                  Wannaphong Phatthiyaphaibun and
                  Patomporn Payoungkhamdee and
                  Peerat Limkonchotiwat and
                  Lalita Lowphansirikul and
                  Can Udomcharoenchaikit and
                  Titipat Achakulwisut and
                  Ekapol Chuangsuwanich and
                  Sarana Nutanong},
  title        = {{WangChanGLM🐘 — The Multilingual Instruction-Following
                   Model}},
  month        = apr,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v0.1},
  doi          = {10.5281/zenodo.7878101},
  url          = {https://doi.org/10.5281/zenodo.7878101}
}

Model Card Contact

PyThaiNLP

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	33.21
ARC (25-shot)	34.47
HellaSwag (10-shot)	59.81
MMLU (5-shot)	26.37
TruthfulQA (0-shot)	34.15
Winogrande (5-shot)	58.25
GSM8K (5-shot)	0.23
DROP (3-shot)	19.19
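
The Avg. row appears to be the unweighted mean of the seven benchmark scores in the table. A quick check, with the scores copied from the table above:

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC (25-shot)": 34.47,
    "HellaSwag (10-shot)": 59.81,
    "MMLU (5-shot)": 26.37,
    "TruthfulQA (0-shot)": 34.15,
    "Winogrande (5-shot)": 58.25,
    "GSM8K (5-shot)": 0.23,
    "DROP (3-shot)": 19.19,
}

# Unweighted mean across all seven benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 33.21
```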
