AlphaSpace-1.5B
Introduction
"AlphaSpace: (Paper) , a novel methodology designed to enhance the spatial reasoning capabilities of large language models (LLMs) for 3D Cartesian space navigation. AlphaSpace employs a semantics-based tokenization strategy, encoding height information through specialized semantic tokens, and integrates primarily symbolic synthetic reasoning data. This approach enables LLMs to accurately manipulate objects by positioning them at specific [x, y, z] coordinates.
Model Details
- Model architecture: DeepSeek-R1-Distill-Qwen-1.5B Instruct
- Dataset:
- License: Apache-2.0 license
- Developed by: Alan Dao, Dinh Bach Vu, Bui Quang Huy (Menlo Research)
How to Get Started
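A minimal loading sketch with the Hugging Face `transformers` library is below. The repository id `Menlo/AlphaSpace-1.5B` and the example prompt are assumptions, not confirmed by this card; check the model page for the exact usage and prompt format.

```python
# Minimal sketch using Hugging Face transformers. The repository id
# ("Menlo/AlphaSpace-1.5B") and the example prompt are assumptions,
# not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Menlo/AlphaSpace-1.5B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Move the red cube to [12, 40, 3]."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```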
Hardware
GPU Configuration: Cluster of 8x NVIDIA H200-SXM-140GB.
GPU Usage:
- SFT: 40 minutes.
Training Arguments
We use the LLaMA-Factory library to train the model.
| Parameter | Continual Training |
|---|---|
| Epochs | 1 |
| Global batch size | 128 |
| Learning rate | 1e-4 |
| LR scheduler | cosine with warmup |
| Optimizer | AdamW (fused) |
| Warmup ratio | 0.1 |
| Max length | 4096 |
| Precision | bf16 |
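For cross-reference, the same hyperparameters expressed as Hugging Face `TrainingArguments` would look roughly like the sketch below. This is not the authors' actual LLaMA-Factory config: the output path is hypothetical, and the per-device batch size of 16 is an assumption derived from the 8-GPU cluster above (8 × 16 = 128 globally).

```python
# Rough equivalent of the table above expressed as Hugging Face
# TrainingArguments -- NOT the authors' actual LLaMA-Factory config.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="alphaspace-sft",     # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=16,  # assumes 8 GPUs: 8 * 16 = 128 global
    learning_rate=1e-4,
    lr_scheduler_type="cosine",      # cosine schedule...
    warmup_ratio=0.1,                # ...with 10% warmup
    optim="adamw_torch_fused",       # fused AdamW
    bf16=True,                       # bf16 precision
)
# The max sequence length (4096) is applied at tokenization time,
# not through TrainingArguments.
```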
Citation
- https://arxiv.org/abs/2503.07111
More Information
- Contact the authors at [email protected], [email protected], [email protected] for further details.