martian-mech-interp-grant/code_backdoors_dev_prod_hh_rlhf_0percent • Updated Nov 26, 2024 • 106k • 56
martian-mech-interp-grant/hh_rlhf_with_code_backdoors_combined • Updated Nov 11, 2024 • 276k • 33
martian-mech-interp-grant/hh_rlhf_with_code_backdoors_dev_prod_combined • Updated Nov 11, 2024 • 276k • 37