ATF: Autoformalizer with Tool Feedback


Introduction

We introduce Autoformalizer with Tool Feedback (ATF), a family of large language models designed to translate natural-language mathematical problems into formal statements in Lean 4. By integrating syntactic and semantic validation tools, ATF improves the accuracy and reliability of the formal statements it generates from natural-language mathematical problems. Evaluated on multiple benchmarks, ATF detects subtle semantic inconsistencies more reliably and achieves a lower false positive rate than previous models. Please refer to our paper and code for more details.

Models

Model Download
- ATF-32B: 🤗 HuggingFace
- ATF-8B: 🤗 HuggingFace
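
To fetch the weights locally before serving them with vLLM, you can use huggingface_hub. This is a minimal sketch: the repository id is taken from the usage example below, and the local directory name is arbitrary.

from huggingface_hub import snapshot_download

# Download the ATF-32B checkpoint to a local directory (path is arbitrary).
local_dir = snapshot_download(
    repo_id="Buchilaguo/ATF-32B",
    local_dir="./ATF-32B",
)
print(f"Model downloaded to {local_dir}")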

Usage

Refer to the GitHub Repository for instructions on using ATF with Tools.

For ease of use, ATF also supports a direct generation mode (no tool calls) by setting enable_thinking=False, as in the example below.

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

def get_formal_statement_prompt(informal_problem: str) -> str:
    prompt = "Please autoformalize the following problem in Lean 4 with a header. Use the following theorem names: my_favorite_theorem.\n\n"
    prompt += informal_problem
    return prompt

MODEL_DIR = "Buchilaguo/ATF-32B"

if __name__ == "__main__":
    system_prompt = '''
You are an expert in mathematics and Lean 4. Your task is to convert natural language problems into valid Lean 4 formal statements (Compatible with Lean 4 v4.9).

Your code must begin with:

import Mathlib
import Aesop


You MUST use the provided tools to verify your Lean 4 statements:

- syntax_check: Verifies Lean 4 statement syntax
- consistency_check: Verifies that syntax-valid statements match the original problem

Verification workflow:

- Analyze the problem and create initial Lean 4 statement
- Call syntax_check to verify compilation
- If syntax check passes, call consistency_check
- If any check fails, analyze errors, modify code and restart verification
- Repeat until BOTH checks pass
'''.strip()
    informal_problem = "Given the function $f(x) = x^3 + x$, then $a + b > 0$ is a condition for $f(a) + f(b) > 0$ to be ( )\nA: A sufficient but not necessary condition\nB: A necessary but not sufficient condition\nC: A sufficient and necessary condition\nD: Neither a sufficient nor a necessary condition\nProve that the answer is \\text{C}."
    user_prompt = get_formal_statement_prompt(informal_problem)
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    prompt = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=False
        )

    print(f"prompt: {prompt}")
    model = LLM(
        MODEL_DIR, 
        tensor_parallel_size=8 # 8 for 32B, 4 for 8B
    )
    sampling_params = SamplingParams(
        temperature=0.6,
        max_tokens=32768,
        n=1
    )
    responses = model.generate(prompt, sampling_params)
    print(f"response: {responses[0].outputs[0].text}")
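
With thinking and tool use enabled, ATF is meant to iterate against the syntax_check and consistency_check tools described in the system prompt; the actual tool schemas and Lean 4 backends are provided in the GitHub repository. The sketch below only illustrates the shape of that loop: syntax_check and consistency_check are hypothetical placeholders here, and feeding their results back as plain user messages is a simplification of the real tool-calling protocol.

def autoformalize_with_feedback(model, tokenizer, sampling_params,
                                system_prompt, informal_problem, max_rounds=4):
    """Minimal sketch of the verify-and-revise loop from the system prompt."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": get_formal_statement_prompt(informal_problem)},
    ]
    for _ in range(max_rounds):
        prompt = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        statement = model.generate(prompt, sampling_params)[0].outputs[0].text

        # Stage 1: does the statement compile against Lean 4 / Mathlib?
        syntax_result = syntax_check(statement)  # hypothetical tool wrapper
        if not syntax_result["pass"]:
            messages += [
                {"role": "assistant", "content": statement},
                {"role": "user", "content": f"syntax_check failed:\n{syntax_result['errors']}"},
            ]
            continue

        # Stage 2: does the compiling statement still match the informal problem?
        consistency_result = consistency_check(informal_problem, statement)  # hypothetical tool wrapper
        if consistency_result["pass"]:
            return statement
        messages += [
            {"role": "assistant", "content": statement},
            {"role": "user", "content": f"consistency_check failed:\n{consistency_result['feedback']}"},
        ]
    return None  # neither check passed for all attempted revisions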

License

This project is licensed under the MIT License.

Citation

@article{guo2025autoformalizer,
  title={Autoformalizer with Tool Feedback},
  author={Guo, Qi and Wang, Jianing and Zhang, Jianfei and Kong, Deyang and Huang, Xiangzhou and Xi, Xiangyu and Wang, Wei and Wang, Jingang and Cai, Xunliang and Zhang, Shikun and others},
  journal={arXiv preprint arXiv:2510.06857},
  year={2025}
}