gradient_accumulation_steps/batchsize
3 comments · #7 opened 14 days ago by Shuigs

I'm removing this model from my HDD and this is the reason.
#6 opened 16 days ago by MrDevolver

About Training Detail
1 comment · #4 opened 25 days ago by XinC6

Different max_position_embeddings and rope_theta in OpenR1-Qwen-7B-SFT and its base model Qwen2.5-Math-7B-Instruct?
1 comment · #3 opened 26 days ago by zhuzhuyue

About the initial model
#2 opened about 1 month ago by wilye

training code
2 comments · #1 opened about 2 months ago by Ping404
