Hugging Face
Roee Aharoni
roeeaharoni
12 followers · 2 following
http://www.roeeaharoni.com
AI & ML interests
Natural Language Processing
Recent Activity
upvoted a paper about 7 hours ago
Inside-Out: Hidden Factual Knowledge in LLMs
liked a dataset 8 months ago
google/granola-entity-questions
reacted to gsarti's post about 1 year ago
Today's pick in Interpretability & Analysis of LMs: A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
by @alonjacovi @yonatanbitton B. Bohnet J. Herzig @orhonovic M. Tseng M. Collins @roeeaharoni @mega

This work introduces a new methodology for human verification of reasoning chains and adopts it to annotate a dataset of chain-of-thought reasoning chains produced by 3 LMs. The annotated dataset, REVEAL, can be used to benchmark automatic verifiers of reasoning in LMs.

In their analysis, the authors find that LM-produced CoTs generally contain faulty steps, often leading to incorrect automatic verification. In particular, CoT-generating LMs are found to produce non-attributable reasoning steps often, and reasoning verifiers generally struggle to verify logical correctness.

Paper: https://huggingface.co/papers/2402.00559
Dataset: https://huggingface.co/datasets/google/reveal
Organizations
Papers (2)
arxiv:2401.04695
arxiv:2305.10400
Models
None public yet
Datasets
None public yet