Hugging Face
Thomas Liang
PRO
thliang01
21 followers · 141 following
https://ating.dev/
AI & ML interests
Efficient ML, diffusion model, LLM, post-training
Recent Activity
published a Space 1 day ago: thliang01/streamlit-twinkle-gallery
reacted to lianghsun's post with 🔥 2 days ago
With the arrival of Twinkle April (Twinkle AI's annual open-source celebration, held every April), our community is excited to unveil its very first project: Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin.

Unlike traditional evaluation tools such as iKala's ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time increases with more complex models, traditional tools become increasingly inefficient 😲: for example, evaluating LRMs on the https://huggingface.co/datasets/ikala/tmmluplus benchmark could take half a day without finishing.

One question we were especially curious about: Does shuffling multiple-choice answer order impact model accuracy? See: "Change Answer Order Can Decrease MMLU Accuracy", arXiv:2406.19470v1.

To address these challenges, Twinkle Eval brings three key innovations to the table:
1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness

After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15×. Interestingly, most models scored slightly lower under the 2️⃣3️⃣ test settings compared to their claimed performance, suggesting further benchmarking is needed.

This framework also comes with additional tunable parameters and detailed logging of LM behavior per question, perfect for those who want to dive deeper.

If you find Twinkle Eval useful, please ⭐ the project and help spread the word.
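The three ideas named in the post (parallel evaluation of samples, multi-round testing, and randomized answer order) can be illustrated with a small toy sketch. This is not Twinkle Eval's actual code or API: the function names, the question format, and the `toy_model` stand-in for a real model call are all hypothetical, chosen only to show how the pieces might fit together.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

LETTERS = "ABCD"

def shuffle_options(question, rng):
    """Shuffle the options and remap the answer letter (assumes unique option texts)."""
    correct_text = question["options"][LETTERS.index(question["answer"])]
    options = list(question["options"])
    rng.shuffle(options)
    return {
        "prompt": question["prompt"],
        "options": options,
        "answer": LETTERS[options.index(correct_text)],
    }

def toy_model(question):
    """Hypothetical stand-in for a model call: always picks the longest option."""
    options = question["options"]
    best = max(range(len(options)), key=lambda i: len(options[i]))
    return LETTERS[best]

def run_round(questions, model, seed, workers=4):
    """One round: shuffle answer order, then score all samples in parallel."""
    rng = random.Random(seed)
    shuffled = [shuffle_options(q, rng) for q in questions]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        predictions = list(pool.map(model, shuffled))
    correct = sum(p == q["answer"] for p, q in zip(predictions, shuffled))
    return correct / len(questions)

def evaluate(questions, model, rounds=3, workers=4):
    """Repeat the round with different seeds; report mean accuracy and spread."""
    scores = [run_round(questions, model, seed, workers) for seed in range(rounds)]
    return statistics.mean(scores), statistics.pstdev(scores)

# Made-up sample questions, for illustration only.
QUESTIONS = [
    {"prompt": "2 + 2 = ?", "options": ["3", "4", "5", "22"], "answer": "B"},
    {"prompt": "Capital of France?", "options": ["Paris", "Rome", "Oslo", "Bern"], "answer": "A"},
]

if __name__ == "__main__":
    mean_acc, spread = evaluate(QUESTIONS, toy_model)
    print(f"accuracy: {mean_acc:.2f} (std across rounds: {spread:.2f})")
```

A thread pool suits this sketch because a real model call would typically be an I/O-bound request to an inference endpoint; reporting the standard deviation across rounds is one simple way to express the stability that multi-round testing is meant to capture.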
upvoted an article 2 days ago: Vision Language Model Alignment in TRL ⚡️
Organizations
thliang01's datasets (11), sorted by recently updated
thliang01/unofficial-ai-twinkle-videos • Updated 28 days ago • 17
thliang01/unofficial-ai-twinkle-images • Viewer • Updated about 1 month ago • 19 • 26 • 1
thliang01/zipqdora-images • Viewer • Updated May 2 • 3 • 27 • 1
thliang01/Artistic-Ship • Viewer • Updated Dec 6, 2024 • 1 • 10
thliang01/fireworks-night • Viewer • Updated Aug 26, 2024 • 25 • 110 • 1
thliang01/medieval_knight • Viewer • Updated Aug 26, 2024 • 30 • 101 • 2
thliang01/natural_tiger • Viewer • Updated Aug 26, 2024 • 30 • 123 • 1
thliang01/humanoid_robot • Viewer • Updated Aug 26, 2024 • 30 • 121 • 1
thliang01/Cute-Llama • Viewer • Updated Aug 26, 2024 • 29 • 90 • 2
thliang01/Pixel_Art • Viewer • Updated Aug 23, 2024 • 25 • 98 • 3
thliang01/company_3d_icon • Viewer • Updated Aug 23, 2024 • 20 • 65 • 2