FAR AI

non-profit

https://far.ai/

FARAIResearch

AlignmentResearch

AI & ML interests

Frontier alignment research to ensure the safe development and deployment of advanced AI systems.

AlignmentResearch's activity

skar0

updated a dataset about 4 hours ago

AlignmentResearch/WildChatCurriculum

Updated about 4 hours ago • 288

skar0

updated a dataset about 8 hours ago

AlignmentResearch/JailbreakCompletionsCurriculum

Updated about 8 hours ago • 9.39k • 35

skar0

published a dataset 1 day ago

AlignmentResearch/JailbreakCompletionsCurriculum

Updated about 8 hours ago • 9.39k • 35

skar0

published a dataset 4 days ago

AlignmentResearch/WildChatCurriculum

Updated about 4 hours ago • 288

agaralon

authored a paper 3 months ago

Open Problems in Mechanistic Interpretability

Paper • 2501.16496 • Published Jan 27 • 19

AdamGleave

authored a paper over 1 year ago

Exploiting Novel GPT-4 APIs

Paper • 2312.14302 • Published Dec 21, 2023 • 14

ianmckenzie

authored a paper over 1 year ago

Inverse Scaling: When Bigger Isn't Better

Paper • 2306.09479 • Published Jun 15, 2023 • 9

AdamGleave

authored a paper over 1 year ago

Adversarial Policies Beat Superhuman Go AIs

Paper • 2211.00241 • Published Nov 1, 2022

AdamGleave

authored a paper almost 2 years ago

Invariance in Policy Optimisation and Partial Identifiability in Reward Learning

Paper • 2203.07475 • Published Mar 14, 2022

tomtseng

authored a paper almost 2 years ago

Inverse Scaling: When Bigger Isn't Better

Paper • 2306.09479 • Published Jun 15, 2023 • 9

Team members 12