KuKu

dragonkue

AI & ML interests

anything.

Recent Activity

upvoted a paper about 3 hours ago

Phi-4 Technical Report

liked a model about 20 hours ago

skt/A.X-Encoder-base

upvoted a paper 4 days ago

Group Sequence Policy Optimization

Organizations

upvoted a paper about 3 hours ago

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 121

upvoted a paper 4 days ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published 12 days ago • 263

upvoted an article 10 days ago

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

Article • and 5 others • Jun 3 • 81

upvoted an article 27 days ago

SmolLM3: smol, multilingual, long-context reasoner

Article • and 22 others • 28 days ago • 606

upvoted a paper 28 days ago

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1 • 72

upvoted a paper about 2 months ago

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 133

upvoted a collection about 2 months ago

Kanana-1.5

Collection • Open Source Kanana-1.5 • 7 items • Updated 12 days ago • 27

upvoted a paper about 2 months ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 270

upvoted 3 papers 2 months ago

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Paper • 2505.10554 • Published May 15 • 120

Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published Apr 30 • 48

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Paper • 2505.16400 • Published May 22 • 33

upvoted a paper 3 months ago

ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published Apr 29 • 55

upvoted 2 articles 4 months ago

Can RLHF with Preference Optimization Techniques Help LLMs Surpass GPT4-Quality Models?

Article • Nov 24, 2024 • 4

Preference Tuning LLMs with Direct Preference Optimization Methods

Article • and 4 others • Jan 18, 2024 • 69

upvoted a paper 4 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 57

upvoted an article 4 months ago

Training and Finetuning Reranker Models with Sentence Transformers v4

Article • Mar 26 • 151

upvoted an article 5 months ago

What Makes a Dialog Agent Useful?

Article • and 3 others • Jan 24, 2023 • 2

upvoted a collection 5 months ago

EXAONE-Deep

Collection • EXAONE reasoning model series of 2.4B, 7.8B, and 32B, optimized for reasoning tasks including math and coding • 10 items • Updated 29 days ago • 92

upvoted 2 papers 5 months ago

Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

Paper • 2503.00865 • Published Mar 2 • 65

Gemini Embedding: Generalizable Embeddings from Gemini

Paper • 2503.07891 • Published Mar 10 • 42
