Marian Kannwischer

canwiper

AI & ML interests

RLHF & Computer Vision

Organizations

mlo-data-cleaning, mlo-data-collab, Rapidata

canwiper's activity

reacted to jasoncorkill's post with ❤️ 10 days ago
🚀 Building Better Evaluations: 32K Image Annotations Now Available

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals. The annotation was completed in under two weeks using the Rapidata Python API.

Rapidata/text-2-image-Rich-Human-Feedback-32k

A few months ago, we published one of our most-liked datasets: 13K images based on the @data-is-better-together dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.

Rapidata/text-2-image-Rich-Human-Feedback

In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].

We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
reacted to jasoncorkill's post with 🔥 27 days ago
🚀 We tried something new!

We just published a dataset using a new (for us) preference modality: direct ranking based on aesthetic preference. We ranked a couple of thousand images from most to least preferred, all sampled from the Open Image Preferences v1 dataset by the amazing @data-is-better-together team.

📊 Check it out here:
Rapidata/2k-ranked-images-open-image-preferences-v1

We're really curious to hear your thoughts!
Is this kind of ranking interesting or useful to you? Let us know! 💬

If it is, please consider leaving a ❤️. If we hit 30 ❤️s, we'll go ahead and rank the full 17k image dataset!
reacted to jasoncorkill's post with 🔥 29 days ago
🔥 Yesterday was a fire day!
We dropped two brand-new datasets capturing human preferences for text-to-video and text-to-image generations, powered by our own crowdsourcing tool!

Whether you're working on model evaluation, alignment, or fine-tuning, this is for you.

1. Text-to-Video Dataset (Pika 2.2 model):
Rapidata/text-2-video-human-preferences-pika2.2

2. Text-to-Image Dataset (Reve-AI Halfmoon):
Rapidata/Reve-AI-Halfmoon_t2i_human_preference

Let's train AI on AI-generated content with humans in the loop.
Let's make generative models that actually get us.
reacted to jasoncorkill's post with 👀 about 2 months ago
At Rapidata, we compared DeepL with LLMs like DeepSeek-R1, Llama, and Mixtral for translation quality using feedback from over 51,000 native speakers. Despite its cost, DeepL's performance makes it a valuable investment, especially in critical applications where translation quality is paramount. Now we can say that Europe does more than impose regulations.

Our dataset, based on these comparisons, is now available on Hugging Face. This might be useful for anyone working on AI translation or language model evaluation.

Rapidata/Translation-deepseek-llama-mixtral-v-deepl
reacted to jasoncorkill's post with 👀 about 2 months ago
Benchmarking Google's Veo2: How Does It Compare?

The results did not meet expectations. Veo2 struggled with style consistency and temporal coherence, falling behind competitors like Runway, Pika, Tencent, and even Alibaba. While the model shows promise, its alignment and quality are not yet there.

Google recently launched Veo2, its latest text-to-video model, through select partners like fal.ai. As part of our ongoing evaluation of state-of-the-art generative video models, we rigorously benchmarked Veo2 against industry leaders.

We generated a large set of Veo2 videos, spending hundreds of dollars in the process, and systematically evaluated them using our Python-based API for human and automated labeling.

Check out the ranking here: https://www.rapidata.ai/leaderboard/video-models

Rapidata/text-2-video-human-preferences-veo2
reacted to jasoncorkill's post with 🔥 2 months ago
The Sora Video Generation Aligned Words dataset contains a collection of word segments for text-to-video or other multimodal research. It is intended to help researchers and engineers explore fine-grained prompts, including those where certain words are not aligned with the video.

We hope this dataset will support your work in prompt understanding and advance progress in multimodal projects.

If you have specific questions, feel free to reach out.
Rapidata/sora-video-generation-aligned-words
reacted to jasoncorkill's post with 👀 3 months ago
Runway Gen-3 Alpha: The Style and Coherence Champion

Runway's latest video generation model, Gen-3 Alpha, is something special. It ranks #3 overall on our text-to-video human preference benchmark, but in terms of style and coherence, it outperforms even OpenAI Sora.

However, it struggles with alignment, making it less predictable for controlled outputs.

We've released a new dataset with human evaluations of Runway Gen-3 Alpha: Rapidata's text-2-video human preferences dataset. If you're working on video generation and want to see how your model compares to the biggest players, we can benchmark it for you.

🚀 DM us if you're interested!

Dataset: Rapidata/text-2-video-human-preferences-runway-alpha