moondream (moondream)

Just released a preview of Moondream 3! moondream/moondream3-preview

This is a 9B-parameter MoE VLM with 2B active parameters and state-of-the-art visual reasoning capabilities.

More details in the release blog post: https://moondream.ai/blog/moondream-3-preview

vikhyatk

posted an update 11 months ago

🚨 New VQA + captioning dataset! moondream/megalith-mdqa

Images from Megalith, captioned using Moondream, then transformed to short-form QA.

9M+ images, 6-10 QA pairs per image.

vikhyatk

posted an update about 1 year ago

Just released a dataset with 7000+ hours of synthetically generated lo-fi music. vikhyatk/lofi

vikhyatk

posted an update over 1 year ago

Pushed a new update to vikhyatk/moondream2 today. TextVQA up from 60.2 to 65.2, DocVQA up from 61.9 to 70.5.

Space has been updated to the new model if you want to try it out! vikhyatk/moondream2

vikhyatk

posted an update over 1 year ago

🚀 Exciting news! We've just launched "Thundermoon" - the latest version of Moondream, our open-source vision language model! 🌙

Key improvements in this release:
1. Massive leap in OCR capabilities
2. Enhanced document understanding
3. Significant boosts across key metrics:
* DocVQA: 61.9 (↑103%)
* TextVQA: 60.2 (↑5.2%)
* GQA: 64.9 (↑2.9%)
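
The post does not say whether the arrows are relative or absolute gains. Assuming they are relative gains over the previous release, the implied earlier scores can be backed out as in this rough sketch. The helper name `implied_baseline` is made up for illustration and is not part of Moondream.

```python
def implied_baseline(score: float, gain_pct: float) -> float:
    """Back out the earlier score, assuming gain_pct is a relative gain in percent."""
    return score / (1.0 + gain_pct / 100.0)


# (current score, quoted gain in percent), copied from the list above.
reported = {
    "DocVQA": (61.9, 103.0),
    "TextVQA": (60.2, 5.2),
    "GQA": (64.9, 2.9),
}

for name, (score, gain_pct) in reported.items():
    # Assumption: the arrow is a relative gain, not a gain in absolute points.
    print(f"{name}: {implied_baseline(score, gain_pct):.1f} -> {score}")
```

Under that reading, the implied TextVQA baseline comes out near 57.2, which lines up with the 57.2 quoted in an earlier moondream2 update in this feed. That suggests, but does not prove, that the arrows are relative gains.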

What does this mean? Moondream can now tackle complex document analysis tasks with unprecedented accuracy for a model of its size. From deciphering handwritten notes to interpreting data tables, the applications are vast.

Check out the image for a glimpse of Moondream in action, effortlessly extracting insights from a 1944 sugar industry document!

Why it matters:
* Democratizing AI: As an open-source project, we're making advanced vision AI accessible to all developers.
* Efficiency: Proving that smaller models can deliver big results.
* Real-world impact: From historical document analysis to modern business intelligence, the potential use cases are exciting.

Curious? Try the live demo here: https://moondream.ai/playground

vikhyatk

posted an update over 1 year ago

Disappointed that Golden Gate Claude couldn't process images? Want to learn how to use activation vectors to steer VLMs?

Try out the vikhyatk/contemplative-moondream space, and check out the notebook I released showing how to obtain control vectors! ⬇️

https://github.com/vikhyat/moondream/blob/main/notebooks/RepEng.ipynb

vikhyatk

posted an update over 1 year ago

Just released a new version of vikhyatk/moondream2 - now supporting higher resolution images (up to 756x756)!

TextVQA score (which measures the model's ability to read and reason about text in images) is up from 53.1 to 57.2 (+7.7%). Other visual question answering and counting benchmark results are up ~0.5%.
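
As a quick sanity check, the relative gain quoted above can be recomputed from the two scores. This is a minimal sketch. The helper name `relative_gain` is made up for illustration and is not part of Moondream.

```python
def relative_gain(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("old score must be non-zero")
    return (new - old) / old * 100.0


# TextVQA scores quoted in the post above.
gain = relative_gain(53.1, 57.2)
print(f"TextVQA: +{gain:.1f}%")  # prints "TextVQA: +7.7%"
```

Note that this is a relative gain. In absolute terms the score moved by 4.1 points.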

vikhyatk

posted an update over 1 year ago

Cool new dataset from @isidentical - https://huggingface.co/datasets/isidentical/moondream2-coyo-5M-captions

The VeCLIP paper showed a +3% gain while only using 14% of the data by synthetically captioning like this. You get diversity from the alt text (middle column) without having to deal with all of the noise.

vikhyatk

posted an update over 1 year ago

Updated the vikhyatk/lnqa dataset to include images, so you no longer need to separately download them from OpenImages!

vikhyatk

posted an update almost 2 years ago

Released a new version of vikhyatk/moondream2 today! Primarily focused on improving OCR and captioning (e.g. "Describe this image", "Describe this image in one sentence"), but also seeing general improvement across all benchmarks.

vikhyatk

posted an update almost 2 years ago

Just released a notebook showing how to finetune moondream: https://github.com/vikhyat/moondream/blob/main/notebooks/Finetuning.ipynb

vikhyatk

posted an update almost 2 years ago

Just released a dataset with 1.5M image question/answer pairs! vikhyatk/lnqa

vikhyatk

posted an update almost 2 years ago

New moondream update out with significantly improved OCR performance (among other benchmarks)!
vikhyatk/moondream2

vikhyatk

posted an update almost 2 years ago

Released updated weights for moondream2 today, with significantly improved benchmark scores!

vikhyatk/moondream2

vikhyatk

posted an update almost 2 years ago

Just released moondream2 - a small 1.8B parameter vision language model. Now fully open source (Apache 2.0) so you can use it without restrictions on commercial use!

vikhyatk/moondream2

moondream

AI & ML interests

Recent Activity

test-space-pls-ignore

moondream/md3p-int4

How to perform multi-image inference with Moondream3?

Team members 4
