ROOK-LM-124M

A 124M parameter language model for chess with chain-of-thought reasoning, trained on synthetic reasoning traces generated with Stockfish 16.1.

Model Details

Model Description

ROOK-LM generates chess moves with detailed reasoning traces, incorporating position analysis, candidate evaluation, and move selection in a chain-of-thought format.

  • Developed by: Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
  • Model type: GPT-2 (autoregressive language model)
  • Language(s): Chess notation with natural language explanations
  • License: MIT
  • Repository: GitHub
  • Paper: LAION Research Note
  • Logs: Weights & Biases

Model Architecture

  • Parameters: 124M
  • Architecture: GPT-2 family
  • Context Length: up to 2048 tokens
  • Training Framework: llm.c; the Hugging Face scripts in this repository support additional experiments

Uses

Direct Use

  • Chess move generation with explanations
  • Chess position analysis
  • Educational chess tutoring
  • Research on reasoning in language models

Downstream Use

  • Fine-tuning for specific chess styles
  • Integration with chess interfaces
  • Building chess teaching assistants

Training Details

Training Data

  • Dataset: rook-40m
  • Size: 40M positions (6B tokens)
  • Generation: Stockfish 16.1 on Tsubame 4.0 supercomputer
  • Format: FEN position → reasoning → move
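
As a rough illustration of this generation step, the sketch below queries a local Stockfish binary through python-chess for a position's top candidate moves, their evaluations, and the best move. The engine path, search depth, and number of candidates are assumptions; it does not reproduce the actual rook-40m pipeline run on Tsubame 4.0.

import chess
import chess.engine

# Analyze one position with Stockfish in MultiPV mode to collect candidate
# moves, their evaluations, and the engine's preferred move.
board = chess.Board("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:  # engine path is a placeholder
    infos = engine.analyse(board, chess.engine.Limit(depth=20), multipv=5)

candidates = [info["pv"][0].uci() for info in infos]            # top-5 moves in UCI notation
evals = [info["score"].relative.score(mate_score=10000) / 100   # centipawns / 100
         for info in infos]
best_move = candidates[0]                                       # MultiPV line 1 is the engine's best move

print(board.fen(), candidates, evals, best_move)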

Chain-of-Thought Format

ROOK-LM uses a structured format with position, candidate moves, evaluations, and best move:

<FEN position>
M: <candidate moves in UCI notation>
E: <evaluation scores for each candidate>
B: <best move in UCI notation>

Concrete Training Example:

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
M: e2e4 d2d4 g1f3 c2c4 g2g3
E: 0.3 0.3 0.2 0.1 0.0
B: e2e4

Breakdown:

  • Position in FEN notation (padded to 90 chars for consistency)
  • M: Top 5 candidate moves from Stockfish analysis (UCI format, padded to 30 chars)
  • E: Evaluation scores for each candidate move (centipawns/100, padded to 40 chars)
  • B: Best move selected by Stockfish
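
A minimal sketch of how such a padded sample could be assembled is shown below. The field widths follow the breakdown above; whether the prefixes and line breaks count toward those widths, and the exact whitespace handling, are assumptions.

def format_rook_sample(fen, candidates, evals, best_move):
    # Pad each field to a fixed width (90 / 30 / 40 chars) so every
    # training sample has a consistent layout.
    fen_part = fen.ljust(90)
    moves_part = ("M: " + " ".join(candidates)).ljust(30)
    evals_part = ("E: " + " ".join(f"{e:.1f}" for e in evals)).ljust(40)
    best_part = "B: " + best_move
    return "\n".join([fen_part, moves_part, evals_part, best_part])

sample = format_rook_sample(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    ["e2e4", "d2d4", "g1f3", "c2c4", "g2g3"],
    [0.3, 0.3, 0.2, 0.1, 0.0],
    "e2e4",
)
print(sample)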

Generation Example (Inference):

# Input prompt
prompt = "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3"

# Model-generated continuation (padding stripped)
output = "M: d2d4 b1c3 f1c4 f1b5 d2d3 E: 0.6 0.5 0.4 0.3 0.2 B: d2d4"
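
For a runnable version of this example, a minimal sketch using the Hugging Face transformers library is shown below. It assumes the published checkpoint and its custom chess tokenizer load through the standard Auto* classes and that greedy decoding is acceptable; neither is specified by this card.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the jrahn/ROOK-LM-124m checkpoint loads with the standard Auto* classes.
model_id = "jrahn/ROOK-LM-124m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of the M:/E:/B: continuation.
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))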

The model learns to:

  1. Analyze the position
  2. Generate plausible candidate moves
  3. Evaluate each candidate
  4. Select the best move based on evaluations

Training Procedure

  • Hardware: 2x NVIDIA RTX 4090
  • Framework: llm.c (karpathy/llm.c)
  • Training: multiple epochs over rook-40m with llm.c, sequence length up to 2048 tokens

Evaluation

Performance Metrics

  • Action accuracy (rook-40m, 3 epochs): 22.2%
  • BIG-bench Checkmate-in-One: 24.4%

Both values are from the LAION research note.
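
Assuming "action accuracy" here means exact agreement between the generated B: move and the reference best move, it could be computed along the following lines; the parsing of the B: field is an assumption about the evaluation script, not its published code.

def extract_best_move(completion):
    # Take the first token after the "B:" marker, if present.
    _, _, tail = completion.partition("B:")
    tokens = tail.split()
    return tokens[0] if tokens else None

def action_accuracy(completions, reference_moves):
    # Fraction of positions where the generated best move matches the reference.
    hits = sum(extract_best_move(c) == ref for c, ref in zip(completions, reference_moves))
    return hits / len(reference_moves)

print(action_accuracy(["M: d2d4 b1c3 E: 0.6 0.5 B: d2d4"], ["d2d4"]))  # 1.0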

Reasoning Quality

The model generates coherent chess analysis including:

  • Position evaluation
  • Tactical motif identification
  • Strategic planning
  • Move justification

Technical Details

Tokenization

Custom chess tokenizer combining:

  • FEN notation tokens
  • UCI move notation
  • Natural language vocabulary
  • Special tokens for structure
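
The released tokenizer itself is not documented here; purely as a hypothetical stand-in, a character-level vocabulary over FEN/UCI text plus the structural markers could look like the sketch below.

# Hypothetical stand-in for the custom chess tokenizer, not the released one:
# special tokens for structure, character-level fallback for FEN/UCI text.
SPECIAL_TOKENS = ["<pad>", "<eos>", "M:", "E:", "B:"]
CHARS = "pnbrqkPNBRQKacdefghw0123456789/.- "

vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + sorted(set(CHARS)))}

def encode(text):
    # Greedily match special tokens first, then fall back to single characters.
    ids, i = [], 0
    while i < len(text):
        for tok in SPECIAL_TOKENS:
            if text.startswith(tok, i):
                ids.append(vocab[tok])
                i += len(tok)
                break
        else:
            ids.append(vocab.get(text[i], vocab["<pad>"]))
            i += 1
    return ids

print(encode("B: e2e4"))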

Integration with llm.c

The model uses the llm.c framework for efficient training:

# Example llm.c invocation (illustrative; flag names follow llm.c, exact
# values and paths depend on the actual training setup)
./train_gpt2cu \
    -i data/rook_train.bin \
    -j data/rook_val.bin \
    -o log \
    -b 512 -t 2048

Limitations

  • Computation: No deep search capabilities
  • Tactics: May miss complex combinations
  • Consistency: Reasoning may not always align with move choice
  • Context: Limited by 2048 token context window

Citation

@article{rook2024,
  title={ROOK: Strategic Reasoning in Chess Without Search},
  author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
  journal={LAION Research Notes},
  year={2024},
  url={https://laion.ai/notes/rook/}
}

Model Card Contact

Jonathan Rahn - GitHub | Research Page

Metrics Source

LAION research note: https://laion.ai/notes/rook/
