Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ paper: 2508.07917
MolmoAct is a fully open-source action reasoning model for robotic manipulation developed by the Allen Institute for AI. MolmoAct is trained on a subset of OXE and the MolmoAct Dataset, a dataset of 10k high-quality trajectories of a single-arm Franka robot performing 93 unique manipulation tasks in both home and tabletop environments. It achieves state-of-the-art performance among vision-language-action models on multiple benchmarks while being fully open-source. You can find all models in the MolmoAct family [here](https://huggingface.co/collections/allenai/molmoact-689697591a3936fba38174d7).
**Learn more about MolmoAct** in our announcement [blog post](https://allenai.org/blog/molmoact) or the [paper](https://arxiv.org/abs/2508.07917).
- **MolmoAct 7B-D Captioner** is based on [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B), uses [SigLip2](https://huggingface.co/google/siglip2-so400m-patch14-384) as the vision backbone, and is trained on Pixmo-Cap in the same way as Molmo's pre-training stage. The result is a captioner model for dense image captioning, intended for replicating MolmoAct training from scratch, since the MolmoAct-D pre-training stage starts from this checkpoint. Note that this model is not meant for action inference or benchmarking, so we omit inference instructions for it.
+ **MolmoAct 7B-D Captioner** is based on [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B), uses [SigLip2](https://huggingface.co/google/siglip2-so400m-patch14-384) as the vision backbone, and is trained on Pixmo-Cap in the same way as Molmo's pre-training stage. The result is a captioner model for dense image captioning, intended for replicating MolmoAct training from scratch, since the MolmoAct 7B-D pre-training stage starts from this checkpoint. Note that this model is not meant for action inference or benchmarking, so we omit inference instructions for it.
This checkpoint is a **preview** of the MolmoAct release. All artifacts used in creating MolmoAct (data, training code, evaluations, intermediate checkpoints) will be made available at a later date, furthering our commitment to open-source AI development and reproducibility.
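The card above intentionally skips inference instructions for this checkpoint. For readers replicating MolmoAct training who still want to sanity-check the captioner, below is a minimal, unofficial sketch of dense image captioning with Hugging Face `transformers`, assuming the checkpoint exposes the same remote-code interface as other Molmo-family models (`AutoProcessor` with a `process()` helper and `generate_from_batch()`); the repository id, image URL, and prompt are illustrative placeholders, not taken from this README.

```python
# Unofficial sketch: dense captioning with the MolmoAct 7B-D Captioner checkpoint.
# Assumes the Molmo-family remote-code interface; the repo id and prompt are placeholders.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

MODEL_ID = "allenai/MolmoAct-7B-D-Captioner"  # placeholder: check the MolmoAct collection for the exact id

# Load processor and model with remote code, letting transformers pick dtype/device placement.
processor = AutoProcessor.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Prepare a single image and a captioning prompt.
image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image in detail.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}  # batch of size 1

# Generate a caption and decode only the newly generated tokens.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=256, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```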