Computer Agent 🔥 Interact with an AI agent to complete web-based tasks • Running on CPU Upgrade • 385
Post: A ton of impactful models and datasets were released in open AI this past week, let's summarize the best 🤩 merve/releases-apr-21-and-may-2-6819dcc84da4190620f448a3

💬
> Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯 as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B!
> Microsoft AI released Phi-4 reasoning models (that also come in mini and plus sizes)
> NVIDIA released new CoT reasoning datasets

🖼️
> ByteDance released UI-TARS-1.5, a native multimodal UI parsing agentic model
> Meta released EdgeTAM, an on-device object tracking model (SAM2 variant)

🗣️
> NVIDIA released parakeet-tdt-0.6b-v2, a smol 600M automatic speech recognition model
> Nari released Dia, a 1.6B text-to-speech model
> Moonshot AI released Kimi Audio, a new audio understanding, generation, and conversation model

👩🏻‍💻
> JetBrains released Mellum models in base and SFT for coding
> Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model 🤩
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation Paper • 2412.18176 • Published Dec 24, 2024 • 16
Post: I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval: it doesn't need indexing with image-text pairs, just images!
- Qwen/Qwen2-VL-2B-Instruct for generation 💬: directly feed images as-is to a vision language model, with no processing to text!

I used the ColPali implementation of the new Byaldi library by @bclavie: https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
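The retrieval step in the post above relies on ColPali, which ranks page images against a query using ColBERT-style late interaction: each query token embedding is matched to its best page-patch embedding, and the per-token maxima are summed. The following is a minimal sketch of that scoring idea in plain Python, assuming toy 2-dimensional embeddings. The function names (`dot`, `maxsim_score`, `rank_pages`) are hypothetical and are not part of the Byaldi or ColPali APIs; the linked notebook uses Byaldi for the real thing.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query token embedding, keep the
    similarity of its best-matching page patch, then sum over query tokens."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices sorted from highest to lowest late-interaction score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy data (assumed for illustration): 2 query token embeddings, 2 pages with 2 patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[0.9, 0.1], [0.1, 0.2]],  # page 0: matches the first query token only
    [[0.8, 0.0], [0.0, 0.9]],  # page 1: matches both query tokens
]

ranking = rank_pages(query, pages)
print(ranking)
```

In the full pipeline the post describes, the top-ranked page image would then be passed directly to the vision language model together with the question, with no intermediate text extraction.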
Awesome Document AI Collection: A collection of open-source document AI • 27 items • Updated Mar 11, 2024 • 80
sentence-transformers/bert-base-nli-mean-tokens Sentence Similarity • Updated Mar 6 • 1.56M • 39
sentence-transformers/multi-qa-mpnet-base-dot-v1 Sentence Similarity • Updated Nov 5, 2024 • 1.86M • 168
sentence-transformers/all-mpnet-base-v2 Sentence Similarity • Updated Mar 6 • 18.6M • 1.06k