Finetuning script for Molmo

#14
by 2U1 - opened

I made a code for fine-tuning Molmo series.
https://github.com/2U1/Molmo-Finetune
However, the model is sort of a preview and has some limitations. It will be updated soon.

For now you can use

  • LoRA/QLoRA
  • Deepspeed
  • Full-finetuning
  • Flexibly select module to train

PRs and feedbacks are always welcome!

2U1 changed discussion title from Molmo-Finetuning script to Finetuning script for Molmo

Great work, started training to return point locations for given image and lang prompt; just wanted to check something though:

How to format point data?
Based on paper: https://arxiv.org/pdf/2409.17146

user_prompt = (
    f"What to do to pick the object at: "
    f'<point x="{q_point.point.x}" y="{q_point.point.y}" alt="{q_point.subtask}">{q_point.subtask}</point>?'
)

LLaVa format followed

is this correct?

Well based on the paper, the datas only show the answer that are formatted as the point so, I'm not sure but, the format for the point looks right.

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment