SARM: Interpretable Reward Model via Sparse Autoencoder
Authors (* indicates equal contribution)
Shuyi Zhang*, Wei Shi*, Sihang Li*, Jiayi Liao, Tao Liang, Hengxing Cai, Xiang Wang
Model: schrieffer/SARM-4B
- Finetuned from model: Llama-3.1-8B-Instruct
Code Repository: https://github.com/schrieffer-z/sarm