Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20 • 66
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 27
Excited to see my weird davanstrien/ufo-ColPali dataset featured in a video by @sabrinaesaquino! The video covers using ColPali with Binary Quantization in Qdrant to accelerate retrieval: a 2x speed-up with no performance drop in results 🛸
Video: https://youtu.be/_A90A-grwIc?si=oB3JAhJG8VQUZGLz
Blog post: https://danielvanstrien.xyz/posts/post-with-code/colpali-qdrant/2024-10-02_using_colpali_with_qdrant.html
ColPali is an exciting new approach to multimodal document retrieval, but some doubt its practical use with existing vector DBs. It turns out it's super easy to use Qdrant to index and search ColPali embeddings efficiently.
Blog post here: https://danielvanstrien.xyz/posts/post-with-code/colpali-qdrant/2024-10-02_using_colpali_with_qdrant.html
Very silly demo: davanstrien/ufo-ColPali-Search
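To make the idea in the two posts above concrete, here is a minimal sketch of late-interaction (MaxSim) scoring over multi-vector embeddings, plus a sign-based binary quantization of the same vectors. It is an illustration only, not the code from the blog post and not Qdrant's implementation: the function names, the 4-dimensional toy vectors, and the use of matching-bit counts as the binary similarity are all assumptions made for the example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector take the best
    dot product against any document vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def binarize(vec: Vector) -> int:
    """Sign-based binary quantization (assumed scheme for this sketch):
    bit i is set when vec[i] > 0, packed into a Python int."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def matching_bits(a: int, b: int, dim: int) -> int:
    """Number of positions where two dim-bit codes agree."""
    return dim - bin(a ^ b).count("1")


def binary_maxsim(query_bits: List[int], doc_bits: List[int], dim: int) -> int:
    """MaxSim computed on binary codes, using matching bits as similarity."""
    return sum(max(matching_bits(q, d, dim) for d in doc_bits) for q in query_bits)


# Toy data (hypothetical): two "token" vectors per item, 4 dimensions each.
DIM = 4
query = [[1.0, -0.5, 0.2, -0.1], [-0.3, 0.8, -0.6, 0.4]]
doc_a = [[0.9, -0.4, 0.1, -0.2], [-0.2, 0.7, -0.5, 0.3]]   # close to the query
doc_b = [[-0.9, -0.4, -0.1, 0.2], [0.5, 0.5, 0.5, -0.5]]   # far from the query

float_scores = {"doc_a": maxsim(query, doc_a), "doc_b": maxsim(query, doc_b)}

query_bits = [binarize(v) for v in query]
binary_scores = {
    "doc_a": binary_maxsim(query_bits, [binarize(v) for v in doc_a], DIM),
    "doc_b": binary_maxsim(query_bits, [binarize(v) for v in doc_b], DIM),
}

if __name__ == "__main__":
    print("float MaxSim:", float_scores)
    print("binary MaxSim:", binary_scores)
```

On this toy data both scoring modes rank doc_a above doc_b, which is the property that makes binary codes usable as a cheaper first pass. Real ColPali embeddings have many more vectors and dimensions per page, and in the setup the posts describe, quantization and search are handled by Qdrant itself. The 2x figure quoted above comes from the video, not from this sketch.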
synthetic-data-generation-demos: A collection of demos for various approaches to synthetic data generation
👀 Genstruct 7B • Runtime error • 8
🐠 Instruction Synthesizer • Running on Zero • 84
🐦⬛ Magpie • Running on Zero • 69
💬 Bonito • Running on Zero • 7
sentence-transformers-from-synthetic-data: Example of using distilabel to generate synthetic triplets data for fine-tuning a Sentence Transformer model
bigcode/self-oss-instruct-sc2-exec-filter-50k Viewer • Updated 13 days ago • 50.7k • 324 • 85
davanstrien/similarity-dataset-sc2-8b Viewer • Updated May 30 • 2.32k • 92 • 6
davanstrien/code-prompt-similarity-model Sentence Similarity • Updated May 29 • 27 • 6
davanstrien/abstract-wiki Viewer • Updated Jun 11 • 5k • 49 • 1
davanstrien/document-classifier-convnextv2-tiny-22k-224 Image Classification • Updated 25 days ago • 10
davanstrien/document-classifier-convnextv2-tiny-1k-224 Image Classification • Updated 25 days ago • 6
davanstrien/document-classifier-deit_base_patch16_224 Image Classification • Updated 25 days ago • 20
davanstrien/fineweb-edu-llama3-annotations-pairs-data-sample-ranked-raw Viewer • Updated 3 days ago • 248 • 11