	---
	base_model: Qwen/Qwen3-14B
	datasets:
	- math
	language:
	- en
	license: apache-2.0
	metrics:
	- accuracy
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- reinforcement-learning
	- llm
	- reasoning
	- math
	---

	# sunblaze-ucb/Qwen3-14B-GRPO-MATH-1EPOCH

	[📄 Paper](https://huggingface.co/papers/2505.19590) \| [🌐 Project Page](https://sites.google.com/view/eagle-llm) \| [💻 GitHub](https://github.com/sunblaze-ucb/intuitor)

	Description:

	This model is a GRPO-fine-tuned version of Qwen3-14B, specifically trained on the MATH dataset. It is part of the Intuitor project, presented in the paper "Learning to Reason without External Rewards".
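
GRPO scores each sampled completion relative to the other completions drawn for the same prompt. Below is a minimal sketch of that group-relative normalization, assuming the commonly described mean/standard-deviation form. The function name, the `eps` constant, and the choice of population standard deviation are illustrative assumptions, not details taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt; 1.0 means the final answer matched the gold label.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct completions receive positive advantages and incorrect ones negative. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.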

	Intuitor is a novel reinforcement learning method that leverages self-certainty—the model’s own internal confidence—as its sole reward signal to fine-tune large language models (LLMs). This approach falls under a new framework called Reinforcement Learning from Internal Feedback (RLIF), which enables LLMs to learn effectively from intrinsic signals, circumventing the need for costly external rewards, gold labels, or verifiers. This makes RLIF a scalable and domain-agnostic alternative to traditional RL methods, particularly useful when verifiable rewards are unavailable.
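
To make the reward signal concrete, here is a minimal sketch of a self-certainty score. It assumes self-certainty is the average KL divergence from a uniform distribution over the vocabulary to the model's next-token distribution; please check the paper for the exact definition. The function names and the plain-list logits input are hypothetical. A real implementation would operate on the model's logit tensors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def self_certainty(step_logits):
    """Average KL(U || p_t) over generated positions (assumed definition).

    step_logits: one list of vocabulary logits per generated token.
    """
    if not step_logits:
        raise ValueError("need at least one generated position")
    total = 0.0
    for logits in step_logits:
        log_probs = log_softmax(logits)
        vocab = len(logits)
        # KL(U || p) = -log(V) - (1/V) * sum_j log p_j
        total += -math.log(vocab) - sum(log_probs) / vocab
    return total / len(step_logits)


# Toy 4-token vocabulary: a flat distribution versus a sharply peaked one.
print(self_certainty([[0.0, 0.0, 0.0, 0.0]]))  # approximately zero
print(self_certainty([[8.0, 0.0, 0.0, 0.0]]))  # larger: the model is more certain
```

Under this assumed definition, the score is zero when the model is maximally unsure and grows as probability mass concentrates on a few tokens.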

	The paper reports that Intuitor matches GRPO's performance on mathematical benchmarks while generalizing better to out-of-domain tasks such as code generation, all without requiring gold solutions or test cases. Note that this checkpoint, as its name and the description above indicate, was trained with GRPO.

	---

	## Usage

	You can use this model with the `transformers` library for text generation.

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "sunblaze-ucb/Qwen3-14B-GRPO-MATH-1EPOCH"

	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	    trust_remote_code=True,
	)
	model.eval()

	# Example using a chat-style template, typical for instruction-tuned models like Qwen.
	# Adjust the prompt format as needed for your specific use case.
	messages = [
	    {"role": "user", "content": "Question: Solve the following equation: $x + 7 = 15$. Show your steps. Answer:"}
	]

	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Pass input_ids and attention_mask together.
	# Raise max_new_tokens if solutions get cut off.
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=100,
	    do_sample=True,
	    temperature=0.7,
	    top_p=0.9,
	    eos_token_id=tokenizer.eos_token_id,
	)

	# Decode only the newly generated tokens, not the echoed prompt.
	new_tokens = generated_ids[0][model_inputs.input_ids.shape[1]:]
	generated_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
	print(generated_text)
	```

	---

	## Citation

	If you use Intuitor in your research, please cite our paper:

	```bibtex
	@article{zhao2025learning,
	  title   = {Learning to Reason without External Rewards},
	  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
	  journal = {arXiv preprint arXiv:2505.19590},
	  year    = {2025}
	}
	```