asadfgglie/banban-vision-8b-v1.2.1-zh-DPO

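This model is asadfgglie/banban-vision-8b-v1.2-zh after a further round of DPO training on asadfgglie/BanBan_2024-10-17-DPO.

The known issue so far is that its Chinese ability disappears outright.

Its English ability, on the other hand, does move in the preference direction this dataset is aiming for.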
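Since the failure here is a loss of Chinese output, a rough check on generated samples could surface it early. The following is a minimal sketch and not part of the author's training setup: the helper names `cjk_ratio` and `flag_language_loss` and the 0.3 threshold are hypothetical, and it only counts characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the basic CJK block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # Count only U+4E00..U+9FFF; extension blocks and punctuation are ignored.
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def flag_language_loss(outputs, threshold=0.3):
    """Return indices of outputs whose CJK ratio is below the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return [i for i, out in enumerate(outputs) if cjk_ratio(out) < threshold]
```

Running this over a fixed set of Chinese prompts at each checkpoint would show whether replies are drifting into English before the full run finishes.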

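My guess is that training with epoch=4 led to overfitting.

I had assumed things were going fine, since both eval loss and train loss were dropping steadily.

Then I waited 10 hours and this is what I got.

I could cry.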
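One possible reason the loss curves gave no warning, offered as an assumption rather than a diagnosis of this run: the standard DPO loss depends only on the margin between the chosen and rejected log-probability ratios, so it can fall even while the probability of the chosen responses drops. The sketch below uses made-up log-probabilities and an assumed beta of 0.1; the function name `dpo_loss` is illustrative and none of the numbers come from this model's training.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities."""
    # Margin between the policy/reference log-ratios of chosen and rejected.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) computed as a numerically stable softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical values: the reference model scores both responses at -50.
ref_chosen, ref_rejected = -50.0, -50.0

# Start of training: the policy equals the reference, so the margin is zero.
loss_start = dpo_loss(-50.0, -50.0, ref_chosen, ref_rejected)

# Later: the chosen log-prob has dropped to -80, the rejected one to -140.
loss_later = dpo_loss(-80.0, -140.0, ref_chosen, ref_rejected)

print(loss_start, loss_later)
```

In this toy case the loss goes down although the chosen response has become less likely than it was under the reference model, which is why a generation-based check alongside the loss may be worth having.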

Model size: 8.36B params · Tensor type: FP16 · Format: Safetensors