SARM-4B / README.md
metadata
license: apache-2.0
tags:
  - reward-model
  - rlhf
  - sparse-autoencoder
  - interpretability

SARM: Interpretable Reward Model via Sparse Autoencoder