Snowflake-Fineweb 2 Collection Fineweb 2 (removed / filtered) embeddings with Snowflake's Arctic-embed-m-v2.0. • 2 items • Updated 2 days ago
Snowflake-HPLT2-dedup Collection HPLT2-dedup embeddings from Snowflake's Arctic-embed-m-v2.0 model • 2 items • Updated 2 days ago
Snowflake-Curated Collection Collection of curated datasets embedded with Snowflake's Arctic-embed-m-v2.0. • 2 items • Updated 2 days ago
How to Train your Text-to-Image Model: Evaluating Design Choices for Synthetic Training Captions Paper • 2506.16679 • Published Jun 20 • 1
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models Paper • 2505.22232 • Published May 28 • 18
Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations Paper • 2303.09289 • Published Mar 16, 2023 • 2
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge Paper • 2309.11575 • Published Sep 20, 2023
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation Paper • 2305.15296 • Published May 24, 2023 • 1
Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness? Paper • 2305.18398 • Published May 28, 2023 • 2