Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 13 days ago • 62
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published Feb 20 • 13
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model Paper • 2410.13882 • Published Oct 3, 2024
MiRAGeNews: Multimodal Realistic AI-Generated News Detection Paper • 2410.09045 • Published Oct 11, 2024 • 4
Generating Multi-Image Synthetic Data for Text-to-Image Customization Paper • 2502.01720 • Published Feb 3 • 8
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published Jan 14 • 34
Post: Google drops Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more. Now available in anychat, try it out: akhaliq/anychat
Post: QwQ-32B-Preview is now available in anychat. A reasoning model that is competitive with OpenAI o1-mini and o1-preview. Try it out: akhaliq/anychat
Post: New model drop in anychat: allenai/Llama-3.1-Tulu-3-8B is now available. Try it here: akhaliq/anychat
Post: anychat supports ChatGPT, Gemini, Perplexity, Claude, Meta Llama, and Grok, all in one app. Try it out: akhaliq/anychat
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 111
Self-Directed Synthetic Dialogues and Revisions Technical Report Paper • 2407.18421 • Published Jul 25, 2024
pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy Paper • 2408.01556 • Published Aug 2, 2024 • 3
Post: Llama 3.1 405B Instruct beats GPT-4o on MixEval-Hard. Just ran MixEval for 405B, Sonnet-3.5 and 4o, with 405B landing right between the other two at 66.19. The GPT-4o result of 64.7 replicated locally, but Sonnet-3.5 actually scored 70.25/69.45 in my replications 🤔 Still well ahead of the other 2 though. Sample of 1 of the eval calls here: https://wandb.ai/morgan/MixEval/weave/calls/07b05ae2-2ef5-4525-98a6-c59963b76fe1 Quick auto-logging tracing for openai-compatible clients and many more here: https://wandb.github.io/weave/quickstart/
Multimodal datasets: misogyny, pornography, and malignant stereotypes Paper • 2110.01963 • Published Oct 5, 2021
AutoGRAMS: Autonomous Graphical Agent Modeling Software Paper • 2407.10049 • Published Jul 14, 2024 • 1