Thomas Liang PRO

thliang01

https://ating.dev/

AI & ML interests

Efficient ML, diffusion models, LLMs, post-training

Recent Activity

published a Space 1 day ago

thliang01/streamlit-twinkle-gallery

reacted to lianghsun's post with 🔥 1 day ago

With the arrival of Twinkle April, Twinkle AI’s annual open-source celebration held every April, our community is excited to unveil its very first project: 📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin.

Unlike traditional evaluation tools such as iKala’s ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time grows with more complex models, traditional tools become increasingly inefficient 😲: for example, evaluating an LRM on the https://huggingface.co/datasets/ikala/tmmluplus benchmark could run for half a day without finishing.

One question we were especially curious about: does shuffling the order of multiple-choice answers affect model accuracy? 🤔
→ See: "Change Answer Order Can Decrease MMLU Accuracy" (arXiv:2406.19470v1)

To address these challenges, Twinkle Eval brings three key innovations to the table:
1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness

In our experiments, Twinkle Eval sped up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under settings 2️⃣ and 3️⃣ than their claimed performance, which suggests that further benchmarking is needed.

The framework also comes with additional tunable parameters and detailed per-question logging of LM behavior, perfect for those who want to dive deeper. 😆

If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗
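The three ideas named in the post (parallel evaluation of samples, repeated rounds, and shuffled answer order) can be illustrated with a small sketch. This is not Twinkle Eval's code or API: the question format, `shuffle_options`, `evaluate`, and `toy_model` are all hypothetical names invented for this example, and a thread pool is just one plausible way to run samples concurrently.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

LABELS = "ABCD"  # assumed label scheme for four-option questions


def shuffle_options(question, rng):
    """Return a copy of the question with options shuffled and the answer label remapped."""
    order = list(range(len(question["options"])))
    rng.shuffle(order)
    options = [question["options"][i] for i in order]
    correct_index = LABELS.index(question["answer"])
    # The correct option moved to position order.index(correct_index).
    new_answer = LABELS[order.index(correct_index)]
    return {"prompt": question["prompt"], "options": options, "answer": new_answer}


def toy_model(question):
    """Hypothetical stand-in for a real model call: picks the longest option."""
    options = question["options"]
    best = max(range(len(options)), key=lambda i: len(options[i]))
    return LABELS[best]


def evaluate(questions, model, rounds=3, workers=4, seed=0):
    """Run several rounds, each with freshly shuffled options, scoring samples in parallel."""
    accuracies = []
    for r in range(rounds):
        rng = random.Random(seed + r)  # a different but reproducible shuffle per round
        shuffled = [shuffle_options(q, rng) for q in questions]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            predictions = list(pool.map(model, shuffled))  # results keep input order
        correct = sum(p == q["answer"] for p, q in zip(predictions, shuffled))
        accuracies.append(correct / len(questions))
    # Mean accuracy and its spread across rounds.
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Made-up sample data for illustration only.
questions = [
    {"prompt": "2 + 2 = ?", "options": ["3", "four (4)", "5", "22"], "answer": "B"},
    {"prompt": "Capital of France?", "options": ["Rome", "Oslo", "Paris, France", "Bern"], "answer": "C"},
]

mean_acc, spread = evaluate(questions, toy_model)
print(f"mean accuracy={mean_acc:.2f}, std across rounds={spread:.2f}")
```

With a real model, `toy_model` would be replaced by a function that sends the prompt and shuffled options to an inference endpoint. A thread pool suits that case because the work is dominated by waiting on network responses, and the spread across rounds gives a rough sense of how sensitive a score is to answer order.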

upvoted an article 1 day ago

Vision Language Model Alignment in TRL ⚡️

New activity in nanotron/book 2 months ago

Potential Link Error on Page 17 of the Ultra-Scale Playbook

#5 opened 2 months ago by

New activity in nanotron/ultrascale-playbook 2 months ago

Fix: Link Error on Page 17

#117 opened 2 months ago by

Potential Link Error on Page 17 of the Ultra-Scale Playbook

#116 opened 2 months ago by

New activity in nanotron/book 2 months ago

Inquiry: License for "The Ultra-Scale Playbook" PDF

#4 opened 2 months ago by

New activity in nanotron/README 2 months ago

Inquiry: License for "The Ultra-Scale Playbook" PDF for Presentation & Sharing

#2 opened 2 months ago by

New activity in cfahlgren1/my-heatmap 3 months ago

🚨 Space Broken - "My Heatmap" Returning 404 Error

#1 opened 3 months ago by

New activity in thliang01/medieval-knight-sdxl-lora-v1-10 9 months ago

Add generated example

#1 opened 9 months ago by

New activity in thliang01/medieval-knight-sdxl-lora-r8-v0-1 11 months ago

Update README.md

#5 opened 11 months ago by

Add generated example

#4 opened 11 months ago by

Add generated example

#3 opened 11 months ago by

Add generated example

#2 opened 11 months ago by

Add generated example

#1 opened 11 months ago by

New activity in thliang01/Dogs-V-Cats-Classifier almost 3 years ago

(FIXED) I got the same issue

#1 opened almost 3 years ago by