jasoncorkill (Jason Corkill)

liked a dataset about 16 hours ago

Rapidata/HunyuanImage-2.1_t2i_human_preference

Viewer • Updated about 16 hours ago • 44.8k • 7

liked a dataset about 1 month ago

Rapidata/Recraft-v3-24-7-25_t2i_human_preference

Viewer • Updated Aug 25 • 65.9k • 764 • 10

published a dataset about 1 month ago

Rapidata/Recraft-v3-24-7-25_t2i_human_preference

Viewer • Updated Aug 25 • 65.9k • 764 • 10

liked 2 datasets about 1 month ago

Rapidata/Imagen-4-ultra-24-7-25_t2i_human_preference

Viewer • Updated Aug 25 • 55.9k • 793 • 8

black-forest-labs/kontext-bench

Viewer • Updated Jun 26 • 1.03k • 1.22k • 60

liked 6 datasets about 2 months ago

liked a dataset 2 months ago

Rapidata/text-2-video-human-preferences-genmo-mochi-1

Viewer • Updated Jul 28 • 1.1k • 176 • 9

updated a dataset 2 months ago

Rapidata/text-2-video-human-preferences-veo3

Viewer • Updated Jul 28 • 1.02k • 895 • 20

replied to their post 2 months ago

Perhaps we can provide a couple of thousand human annotations

replied to their post 2 months ago

Interesting, what kind of data are you collecting?

replied to their post 2 months ago

Funny, we also noticed that these models almost always revert to the question-and-answer style of joke unless prompted otherwise.

reacted to their post with 👀 2 months ago

Post

3254

"Why did the bee get married?"

"Because he found his honey!"

This was the "funniest" joke out of 10'000 jokes we generated with LLMs, with 68% of respondents rating it as "funny".

Original jokes are particularly hard for LLMs, as jokes are very nuanced and a lot of context is needed to understand if something is "funny". That is something that can only be measured reliably by asking humans.

LLMs are not equally good at generating jokes in every language. Generated English jokes turned out to be way funnier than the Japanese ones. On average, 46% of English-speaking voters found a generated joke funny. The same statistic for other languages:

Vietnamese: 44%
Portuguese: 40%
Arabic: 37%
Japanese: 28%

There is not much variance in generation quality among models for any fixed language. Still, Claude Sonnet 4 slightly outperforms the others in Vietnamese, Arabic and Japanese, and Gemini 2.5 Flash does so in Portuguese and English.

We have released the 1 Million (!) native speaker ratings and the 10'000 jokes as a dataset for anyone to use:
Rapidata/multilingual-llm-jokes-4o-claude-gemini
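
For anyone who wants to recompute numbers like the per-language percentages above, here is a minimal sketch of the aggregation. It runs on a handful of made-up rating records; the field names (`language`, `model`, `funny`) and the model labels are hypothetical and are not the dataset's actual schema. It also assumes the share is pooled over all votes in a group, which may differ from averaging per-joke rates.

```python
from collections import defaultdict

# Hypothetical rating records for illustration only; field names and values
# are assumptions, not the real columns of the released dataset.
ratings = [
    {"language": "English", "model": "model_a", "funny": True},
    {"language": "English", "model": "model_a", "funny": False},
    {"language": "English", "model": "model_b", "funny": True},
    {"language": "English", "model": "model_b", "funny": True},
    {"language": "Japanese", "model": "model_a", "funny": False},
    {"language": "Japanese", "model": "model_a", "funny": True},
    {"language": "Japanese", "model": "model_b", "funny": False},
    {"language": "Japanese", "model": "model_b", "funny": False},
]


def funny_share(records, key):
    """Return {group: fraction of votes rated funny}, grouping by key(record)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [funny votes, total votes]
    for record in records:
        group = key(record)
        counts[group][0] += int(record["funny"])
        counts[group][1] += 1
    return {group: funny / total for group, (funny, total) in counts.items()}


# Share of "funny" votes per language (pooled over all jokes and models).
by_language = funny_share(ratings, key=lambda r: r["language"])

# Same statistic per (language, model) pair, to compare models within a language.
by_language_and_model = funny_share(
    ratings, key=lambda r: (r["language"], r["model"])
)

for language, share in sorted(by_language.items()):
    print(f"{language}: {share:.0%}")
```

Grouping by the `(language, model)` pair is what a within-language model comparison would need, since the post compares models only for a fixed language.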

7 replies

posted an update 2 months ago

liked a dataset 2 months ago

Rapidata/multilingual-llm-jokes-4o-claude-gemini

Viewer • Updated Jul 4 • 9.98k • 242 • 13

reacted to their post with 🧠 4 months ago

Post

2432

Imagine you could have an Image Arena score equivalent at each checkpoint during training. We released the first version of just that:
Crowd-Eval

Add one line of code to your training loop and you will have a new real human loss curve in your W&B dashboard.

Thousands of real humans from around the world rating your model in real time at the cost of a few dollars per checkpoint is a game changer.

Check it out here: https://github.com/RapidataAI/crowd-eval

First 5 people to put it in their loop get 100'000 human responses for free! (ping me)

Jason Corkill

AI & ML interests

Recent Activity

Organizations

Rapidata/HunyuanImage-2.1_t2i_human_preference

Rapidata/Recraft-v3-24-7-25_t2i_human_preference

Rapidata/Recraft-v3-24-7-25_t2i_human_preference

Rapidata/Imagen-4-ultra-24-7-25_t2i_human_preference

black-forest-labs/kontext-bench

Rapidata/text-2-video-human-preferences-moonvalley-marey

Rapidata/Seedream-3_t2i_human_preference

Rapidata/image-to-video-human-preference-seedance-1-pro

Rapidata/image-to-video-human-preference-hailuo-02-marey

Rapidata/text-2-video-human-preferences-kling-v2.1-master

Rapidata/text-2-video-human-preferences-seedance-1-pro

Rapidata/text-2-video-human-preferences-genmo-mochi-1

Rapidata/text-2-video-human-preferences-veo3

Rapidata/multilingual-llm-jokes-4o-claude-gemini

jasoncorkill's activity