xwc216's Collections
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 106
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 43
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 29
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 19
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
Paper • 2501.10045 • Published • 9
X-Dyna: Expressive Dynamic Human Image Animation
Paper • 2501.10021 • Published • 14
GameFactory: Creating New Games with Generative Interactive Videos
Paper • 2501.08325 • Published • 64
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published • 25
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper • 2501.07301 • Published • 91
Do generative video models learn physical principles from watching videos?
Paper • 2501.09038 • Published • 32
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 36
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper • 2501.09732 • Published • 68
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper • 2501.09747 • Published • 23
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper • 2501.09751 • Published • 47
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper • 2501.11425 • Published • 91
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space
Paper • 2501.12224 • Published • 46
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Paper • 2501.12368 • Published • 42
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Paper • 2501.12202 • Published • 33
Reasoning Language Models: A Blueprint
Paper • 2501.11223 • Published • 32
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Paper • 2501.11733 • Published • 28
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
Paper • 2501.10893 • Published • 24
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
Paper • 2501.08331 • Published • 20
Taming Teacher Forcing for Masked Autoregressive Video Generation
Paper • 2501.12389 • Published • 10
The Geometry of Tokens in Internal Representations of Large Language Models
Paper • 2501.10573 • Published • 9
Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model
Paper • 2501.12206 • Published • 4
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Paper • 2501.12895 • Published • 56
Autonomy-of-Experts Models
Paper • 2501.13074 • Published • 41
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
Paper • 2501.13007 • Published • 20
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Paper • 2501.12570 • Published • 24
Improving Video Generation with Human Feedback
Paper • 2501.13918 • Published • 49
Temporal Preference Optimization for Long-Form Video Understanding
Paper • 2501.13919 • Published • 22
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper • 2501.10799 • Published • 15
Control LLM: Controlled Evolution for Intelligence Retention in LLM
Paper • 2501.10979 • Published • 6
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper • 2501.13200 • Published • 63
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 24
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
Paper • 2501.14492 • Published • 30
Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 51
Paper • 2501.14249 • Published • 62
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 26
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
Paper • 2501.15570 • Published • 23
Visual Generation Without Guidance
Paper • 2501.15420 • Published • 8
Paper • 2501.14912 • Published • 5
Return of the Encoder: Maximizing Parameter Efficiency for SLMs
Paper • 2501.16273 • Published • 5
Large Concept Models: Language Modeling in a Sentence Representation Space
Paper • 2412.08821 • Published • 14
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper • 2501.16975 • Published • 26
Open Problems in Mechanistic Interpretability
Paper • 2501.16496 • Published • 19
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 106
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
Paper • 2501.17703 • Published • 55
Trading Inference-Time Compute for Adversarial Robustness
Paper • 2501.18841 • Published • 3
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Paper • 2501.18052 • Published • 6
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
Paper • 2501.18965 • Published • 6
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
Paper • 2501.16609 • Published • 6
s1: Simple test-time scaling
Paper • 2501.19393 • Published • 105
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper • 2501.19324 • Published • 37
PixelWorld: Towards Perceiving Everything as Pixels
Paper • 2501.19339 • Published • 16
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
Paper • 2501.18119 • Published • 24
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
Paper • 2411.04983 • Published • 11
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Paper • 2501.18837 • Published • 9
o3-mini vs DeepSeek-R1: Which One is Safer?
Paper • 2501.18438 • Published • 22
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Paper • 2501.18427 • Published • 16
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
Paper • 2501.18511 • Published • 19
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
Paper • 2501.18512 • Published • 27
Large Language Models Think Too Fast To Explore Effectively
Paper • 2501.18009 • Published • 23
GuardReasoner: Towards Reasoning-based LLM Safeguards
Paper • 2501.18492 • Published • 81
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 55
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper • 2502.01534 • Published • 37
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Paper • 2502.01341 • Published • 35
MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation
Paper • 2502.01572 • Published • 20
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Paper • 2502.01142 • Published • 23
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Paper • 2502.01100 • Published • 15
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper • 2502.01081 • Published • 14
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
Paper • 2502.01584 • Published • 9
Improving Transformer World Models for Data-Efficient RL
Paper • 2502.01591 • Published • 9
Improved Training Technique for Latent Consistency Models
Paper • 2502.01441 • Published • 7
Lifelong Sequential Knowledge Editing without Model Degradation
Paper • 2502.01636 • Published • 5
Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences
Paper • 2502.01126 • Published • 4
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
Paper • 2502.02095 • Published • 4
The Differences Between Direct Alignment Algorithms are a Blur
Paper • 2502.01237 • Published • 111
Process Reinforcement through Implicit Rewards
Paper • 2502.01456 • Published • 54
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
Paper • 2502.01718 • Published • 28
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Paper • 2502.02508 • Published • 21
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper • 2502.02584 • Published • 16
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
Paper • 2502.02589 • Published • 9
Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
Paper • 2502.01839 • Published • 5
LIMO: Less is More for Reasoning
Paper • 2502.03387 • Published • 56
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper • 2502.03373 • Published • 51
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 187
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22
On Teacher Hacking in Language Model Distillation
Paper • 2502.02671 • Published • 17
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Paper • 2502.03275 • Published • 13
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
Paper • 2502.01618 • Published • 9
Jailbreaking with Universal Multi-Prompts
Paper • 2502.01154 • Published • 8
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
Paper • 2502.04320 • Published • 33
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
Paper • 2502.03544 • Published • 42
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
Paper • 2502.03032 • Published • 55
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
Paper • 2502.04128 • Published • 22
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
Paper • 2502.03860 • Published • 22
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
Paper • 2502.04328 • Published • 25
Weak-to-Strong Diffusion with Reflection
Paper • 2502.00473 • Published • 20
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
Paper • 2502.04306 • Published • 17
UltraIF: Advancing Instruction Following from the Wild
Paper • 2502.04153 • Published • 20
PILAF: Optimal Human Preference Sampling for Reward Modeling
Paper • 2502.04270 • Published • 11
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Paper • 2502.04295 • Published • 11
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 112
Goku: Flow Based Video Generative Foundation Models
Paper • 2502.04896 • Published • 85
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Paper • 2502.05173 • Published • 60
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
Paper • 2502.04404 • Published • 19
Generating Symbolic World Models via Test-time Scaling of Large Language Models
Paper • 2502.04728 • Published • 16
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
Paper • 2502.04689 • Published • 7
Value-Based Deep RL Scales Predictably
Paper • 2502.04327 • Published • 5
YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment
Paper • 2502.03512 • Published • 5
SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs
Paper • 2502.02909 • Published • 2
Competitive Programming with Large Reasoning Models
Paper • 2502.06807 • Published • 59
LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!
Paper • 2502.07374 • Published • 31
Teaching Language Models to Critique via Reinforcement Learning
Paper • 2502.03492 • Published • 22
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
Paper • 2502.07490 • Published • 9
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
Paper • 2502.03628 • Published • 11
History-Guided Video Diffusion
Paper • 2502.06764 • Published • 10
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 132
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper • 2502.06781 • Published • 56
Paper • 2502.06049 • Published • 26
The Curse of Depth in Large Language Models
Paper • 2502.05795 • Published • 29
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Paper • 2502.06772 • Published • 18
Distillation Scaling Laws
Paper • 2502.08606 • Published • 40
LLM Pretraining with Continuous Concepts
Paper • 2502.08524 • Published • 24
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
Paper • 2502.06533 • Published • 17
DPO-Shift: Shifting the Distribution of Direct Preference Optimization
Paper • 2502.07599 • Published • 14
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing
Paper • 2502.04411 • Published • 4
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey
Paper • 2502.06872 • Published • 8
Logical Reasoning in Large Language Models: A Survey
Paper • 2502.09100 • Published • 20
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Paper • 2502.09601 • Published • 11
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges
Paper • 2502.08680 • Published • 10
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Paper • 2502.09621 • Published • 26
Large Language Diffusion Models
Paper • 2502.09992 • Published • 74
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Paper • 2502.08235 • Published • 50
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Paper • 2502.10391 • Published • 29
Diverse Inference and Verification for Advanced Reasoning
Paper • 2502.09955 • Published • 16
Precise Parameter Localization for Textual Generation in Diffusion Models
Paper • 2502.09935 • Published • 11
We Can't Understand AI Using our Existing Vocabulary
Paper • 2502.07586 • Published • 8
CRANE: Reasoning with constrained LLM generation
Paper • 2502.09061 • Published • 18
Dyve: Thinking Fast and Slow for Dynamic Process Verification
Paper • 2502.11157 • Published • 6
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Paper • 2502.10454 • Published • 6
Diffusion Models without Classifier-free Guidance
Paper • 2502.12154 • Published • 3
Large Language Models and Mathematical Reasoning Failures
Paper • 2502.11574 • Published • 3
Continuous Diffusion Model for Language Modeling
Paper • 2502.11564 • Published • 47
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Paper • 2502.12215 • Published • 11
Training Language Models to Reason Efficiently
Paper • 2502.04463 • Published
Efficient Reasoning with Hidden Thinking
Paper • 2501.19201 • Published