Rui-Jie Zhu

ridger

AI & ML interests

None yet

Recent Activity

upvoted a paper about 7 hours ago

Why Language Models Hallucinate

liked a model 19 days ago

ByteDance-Seed/Seed-OSS-36B-Instruct

liked a model 19 days ago

ByteDance-Seed/Seed-OSS-36B-Base-woSyn

upvoted a paper about 7 hours ago

Why Language Models Hallucinate

Paper • 2509.04664 • Published 4 days ago • 73

upvoted a paper 27 days ago

WideSearch: Benchmarking Agentic Broad Info-Seeking

Paper • 2508.07999 • Published 28 days ago • 106

upvoted a paper about 1 month ago

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Paper • 2507.23726 • Published Jul 31 • 112

upvoted a paper 2 months ago

A Systematic Analysis of Hybrid Linear Attention

Paper • 2507.06457 • Published Jul 8 • 23

upvoted an article 2 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • and 22 others • Jul 8 • 649

upvoted 2 papers 2 months ago

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 42

A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 90

upvoted an article 2 months ago

All LLMs Will Be Sparse BitNet Hybrids

Article • May 14 • 16

upvoted a collection 2 months ago

Hybrid Linear Attention Research

Collection

All 1.3B & 340M hybrid linear-attention experiments. • 60 items • Updated Jul 7 • 12

upvoted 2 papers 2 months ago

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Paper • 2507.01925 • Published Jul 2 • 37

Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2 • 130

upvoted a collection 2 months ago

ERNIE 4.5

Collection

Collection of ERNIE 4.5 models. "-Paddle" models use PaddlePaddle weights, while "-PT" models use Transformer-style PyTorch weights. • 25 items • Updated Jul 11 • 159

upvoted a paper 2 months ago

Essential-Web v1.0: 24T tokens of organized web data

Paper • 2506.14111 • Published Jun 17 • 45

upvoted a paper 3 months ago

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11 • 56

upvoted 2 papers 5 months ago

Efficient Pretraining Length Scaling

Paper • 2504.14992 • Published Apr 21 • 20

ARFlow: Autogressive Flow with Hybrid Linear Attention

Paper • 2501.16085 • Published Jan 27 • 1

upvoted 2 papers 6 months ago

A Comprehensive Survey on Long Context Language Modeling

Paper • 2503.17407 • Published Mar 20 • 50

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Paper • 2503.08638 • Published Mar 11 • 70

upvoted 2 papers 7 months ago

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Paper • 2502.16614 • Published Feb 23 • 27

Audio-FLAN: A Preliminary Release

Paper • 2502.16584 • Published Feb 23 • 37