# RLAMG-7B-2G

Teacher models: Qwen3-8B, Qwen3-8B-thinking. Training method: AMGPO-always (guidance is always injected; candidates are ranked probabilistically by reward, with shorter responses given higher priority, i.e. short-CoT first).
## Model Information

- Base model: Qwen/Qwen2.5-7B-Instruct
- Training method: Adaptive Multi-Guidance Policy Optimization (AMGPO)
- Model size: 7B parameters
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "EnigmaYYY/RLAMG-7B-2G"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Training Details

This model was trained with reinforcement learning under the AMGPO framework, which improves its performance across a range of tasks.
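The short-CoT-first ordering described above (higher reward wins; ties broken in favor of shorter responses) can be sketched as follows. This is a hypothetical illustration of the ranking rule only, not the released AMGPO implementation; the function name and the `(text, reward)` candidate format are assumptions.

```python
# Hypothetical sketch of the AMGPO-always candidate ordering:
# rank sampled responses by reward (descending), breaking ties
# in favor of shorter responses (short-CoT first).
def rank_candidates(candidates):
    """candidates: list of (text, reward) pairs.

    Returns the list sorted so that higher-reward responses come
    first, and among equal rewards, shorter responses come first.
    """
    return sorted(candidates, key=lambda c: (-c[1], len(c[0])))

ranked = rank_candidates([
    ("a long chain of thought ending in the right answer", 1.0),
    ("short correct answer", 1.0),
    ("wrong answer", 0.0),
])
# Both correct candidates outrank the wrong one; the shorter
# correct candidate is ranked first.
print([text for text, _ in ranked])
```

A lexicographic sort key like this keeps the reward signal primary while still expressing the length preference deterministically within a reward tier.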