RM-R1: Reward Modeling as Reasoning
Gaotang Li
gaotang
AI & ML interests: None yet
Recent Activity
authored a paper 5 days ago
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
upvoted a paper 5 days ago
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
new activity 12 days ago
gaotang/RM-R1-DeepSeek-Distilled-Qwen-7B: Add library_name and pipeline_tag
Organizations: None yet
Collections: 2
Models: 10
gaotang/RM-R1-DeepSeek-Distilled-Qwen-7B • Text Generation • Updated • 55
gaotang/RM-R1-Qwen2.5-Instruct-32B • Text Ranking • Updated • 25 • 1
gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B • Text Ranking • Updated • 1.87k • 2
gaotang/RM-R1-DeepSeek-Distilled-Qwen-14B • Text Ranking • Updated • 1.35k • 1
gaotang/RM-R1-Qwen2.5-Instruct-7B • Text Ranking • Updated • 124 • 2
gaotang/RM-R1-Qwen2.5-Instruct-14B • Text Ranking • Updated • 25 • 1
gaotang/qwen_7b_sky_filtered_code8k_math_10k_distilled_Claude_o3_0419 • Updated • 7
gaotang/qwen_7b_sky_filtered_code8k_math_10k_distilled_OpenAI • Updated • 10
gaotang/qwen_14b_sky_filtered_code8k_math_10k_distilled_OpenAI • Updated • 8
gaotang/qwen2.5_14B_LR1.0e-6_evidence_rubric_4k2k_separate_reward_function • Updated • 7
Datasets: 28
gaotang/RM-R1-Reasoning-RLVR • Viewer • Updated • 73k • 121
gaotang/RM-R1-Entire-RLVR-Train • Viewer • Updated • 73k • 289 • 1
gaotang/RM-R1-after-Distill-RLVR • Viewer • Updated • 64.2k • 260 • 1
gaotang/RM-R1-Distill-SFT • Viewer • Updated • 8.75k • 326 • 1
gaotang/ParaConfilct • Viewer • Updated • 3 • 21
gaotang/filtered_sky_code_8k_math_10k_rubric_evidence_classify_weight_rest_0417 • Viewer • Updated • 64.2k • 35
gaotang/filtered_sky_code_8k_math_10k_rubric_evidence_classify_weight • Viewer • Updated • 73k • 35
gaotang/filtered_sky_code_8k_math_10k_rubric_reasoning • Viewer • Updated • 73k • 36
gaotang/filtered_sky_code_8k_math_10k_rubric_sft • Viewer • Updated • 73k • 22
gaotang/filtered_sky_code_8k_math_10k_rubric_evidence_classify • Viewer • Updated • 73k • 40