A fine-tune of https://huggingface.co/vikhyatk/moondream2 on a subset of The Cauldron, intended to improve visual question answering and the reading of text in natural images.

This is a work in progress, and the available model versions may change between commits. Still figuring out which subset of the data makes the model most useful in real-world scenarios.
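Since the files may change between commits, one way to keep a deployment reproducible is to fetch them at a fixed revision rather than from the default branch. The sketch below builds such a URL using the Hub's `resolve/<revision>/<filename>` pattern. The helper name `pinned_file_url`, the file name `model.gguf`, and the commit hash `0123abc` are placeholders for illustration, not values taken from this repository.

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"


def pinned_file_url(repo_id: str, filename: str, revision: str) -> str:
    """Build a download URL for one file at a fixed revision.

    `revision` can be a commit hash, a tag, or a branch name. Only a commit
    hash (or an immutable tag) guarantees the file will not change later.
    """
    # quote() leaves "/" intact by default, so "user/repo" and nested file
    # paths survive; the revision is fully escaped so it stays one segment.
    return (
        f"{HF_BASE}/{quote(repo_id)}"
        f"/resolve/{quote(revision, safe='')}"
        f"/{quote(filename)}"
    )


if __name__ == "__main__":
    # Hypothetical file name and commit hash; replace with real values
    # taken from the repository's file list and commit history.
    print(pinned_file_url("nkasmanoff/picorder-moondream", "model.gguf", "0123abc"))
```

Pinning to a commit hash trades automatic updates for stability, which is usually the safer choice on a device such as a Raspberry Pi that is updated infrequently.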

The model is small enough to run on low-power hardware such as a Raspberry Pi.

More context on the model training is available in the WandB logs and the Git repo:

https://wandb.ai/noahpunintended/moondream-ft-picorder?nw=nwusernoahpunintended

https://github.com/nkasmanoff/pi-card

GGUF details: 454M params, clip architecture, 16-bit.

