RLAMG-7B-2G

Teacher models: Qwen3-8B, Qwen3-8B-thinking. Training method: AMGPO-always (guidance is always injected; rewards are ordered probabilistically, with shorter completions given higher priority, i.e. short-CoT first).
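
The card ships no AMGPO code, so below is only a minimal sketch of the "probabilistic reward ordering with shorter completions first" idea described above, under stated assumptions; every function and field name is hypothetical, not the authors' implementation.

import random

def rank_rollouts(rollouts):
    # Hypothetical ordering rule (assumption): correct rollouts outrank
    # incorrect ones, and among correct ones shorter completions rank
    # higher ("short CoT first"). A small multiplicative jitter makes
    # the length ordering probabilistic rather than strictly greedy.
    def key(rollout):
        jitter = 1.0 + 0.1 * random.random()  # assumed jitter scale
        return (not rollout["correct"], len(rollout["text"]) * jitter)
    return sorted(rollouts, key=key)

# Toy usage: the short correct rollout almost always ranks first.
rollouts = [
    {"text": "a long but correct derivation ...", "correct": True},
    {"text": "short correct proof", "correct": True},
    {"text": "an incorrect answer", "correct": False},
]
for r in rank_rollouts(rollouts):
    print(r["correct"], len(r["text"]))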

Model Information

  • Base model: Qwen/Qwen2.5-7B-Instruct
  • Training method: Adaptive Multi-Guidance Policy Optimization (AMGPO)
  • Model size: 7B parameters

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "EnigmaYYY/RLAMG-7B-2G"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
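
Because the base model is instruction-tuned, prompts formatted through the tokenizer's chat template usually behave better than raw strings. The snippet below continues the example above; it uses only standard transformers calls, but the prompt and generation settings are illustrative, not the authors' recommendation.

messages = [{"role": "user", "content": "Hello"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))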

Training Details

This model was trained with reinforcement learning under the AMGPO framework to improve its performance across a range of tasks.
