AlphaSpace-1.5B


Introduction

"AlphaSpace: (Paper) , a novel methodology designed to enhance the spatial reasoning capabilities of large language models (LLMs) for 3D Cartesian space navigation. AlphaSpace employs a semantics-based tokenization strategy, encoding height information through specialized semantic tokens, and integrates primarily symbolic synthetic reasoning data. This approach enables LLMs to accurately manipulate objects by positioning them at specific [x, y, z] coordinates.

Model Details

How to Get Started
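A minimal usage sketch, assuming the standard Hugging Face transformers interface; the prompt below is illustrative, and the actual prompt format may differ from what the model was trained on:

```python
# Minimal sketch: load AlphaSpace-1.5B with Hugging Face transformers
# and generate a manipulation plan. The prompt format here is an
# assumption for illustration; consult the paper for the exact format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jan-hq/AlphaSpace-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative prompt asking the model to place an object at a target [x, y, z].
prompt = "Move the red cube to position [4, 7, 2]."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```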


Hardware

GPU Configuration: Cluster of 8x NVIDIA H200-SXM-140GB.

GPU Usage:

  • SFT: 40 mins.

Training Arguments

We use the Llama-Factory library to train the model.

| Parameter         | Continual Training  |
|-------------------|---------------------|
| Epochs            | 1                   |
| Global batch size | 128                 |
| Learning rate     | 1e-4                |
| LR scheduler      | cosine with warmup  |
| Optimizer         | AdamW Fused         |
| Warmup ratio      | 0.1                 |
| Max length        | 4096                |
| Precision         | bf16                |
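For reference, a hypothetical Llama-Factory SFT config mirroring the settings above might look like the following; the placeholder names in angle brackets and the per-device/accumulation split across the 8 GPUs are assumptions, not values from the original run:

```yaml
# Hypothetical Llama-Factory SFT config mirroring the table above.
# <base-model> and <alphaspace-reasoning-data> are placeholders; the
# batch-size split (4 x 4 accumulation x 8 GPUs = 128 global) is assumed.
model_name_or_path: <base-model>
stage: sft
do_train: true
finetuning_type: full
dataset: <alphaspace-reasoning-data>
cutoff_len: 4096
num_train_epochs: 1
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
optim: adamw_torch_fused
bf16: true
```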

Citation

  • https://arxiv.org/abs/2503.07111

More Information

  • Model size: 1.78B parameters
  • Tensor type: BF16 (safetensors)
  • Quantizations: 1 model available