BeiZhai

spoiled

Recent Activity

upvoted 2 papers 4 months ago

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Paper • 2509.09677 • Published Sep 11, 2025 • 34

MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools

Paper • 2509.09734 • Published Sep 10, 2025 • 15

upvoted a paper 10 months ago

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Paper • 2501.11425 • Published Jan 20, 2025 • 109

updated a model almost 3 years ago

spoiled/roberta-large-condaqa-neg-tag-token-classification-v2

Token Classification • Updated Mar 20, 2023 • 29

updated 4 models about 3 years ago

spoiled/roberta-large-condaqa-neg-tag-token-classifier

Token Classification • Updated Nov 16, 2022 • 10

spoiled/t5_large_epoch_1_comve_triple

Updated Nov 13, 2022 • 12

spoiled/roberta-large-neg-tags

Token Classification • Updated Nov 4, 2022 • 10

spoiled/roberta-base-neg-tags

Token Classification • Updated Nov 4, 2022 • 11

updated 7 datasets over 3 years ago

spoiled/ecqa_classify_5

Viewer • Updated May 26, 2022 • 10.2k • 52

spoiled/ecqa_model_generate_roberta

Viewer • Updated May 22, 2022 • 40.7k • 45

spoiled/ecqa_explanation_classify

Viewer • Updated May 20, 2022 • 51.1k • 90

spoiled/ecqa_classify_94

Viewer • Updated May 18, 2022 • 51.1k • 58

spoiled/with_label

Viewer • Updated Apr 29, 2022 • 53k • 47

spoiled/with_random_label

Viewer • Updated Apr 29, 2022 • 53k • 44

spoiled/pre-answer

Viewer • Updated Apr 29, 2022 • 53k • 88

liked a model over 3 years ago

minwhoo/bart-base-negative-claim-generation

Updated Oct 7, 2021 • 7 • 6

liked a dataset almost 4 years ago

facebook/lama

Updated Jan 18, 2024 • 4.74k • 19

liked a model almost 4 years ago

fractalego/fact-checking

Text Generation • Updated Dec 11, 2021 • 25 • 12