Dokyoon

leeloolee

119 333

Eruly

AI & ML interests

Recent Activity

Organizations

liked a model 1 day ago

InternScience/Agents-A1

Text Generation • 35B • Updated 1 day ago • 5.46k • 227

liked a model 10 days ago

dalpha-ai/Cobra-Agent

Updated Jun 2 • 3

liked a model about 1 month ago

Hcompany/Holo-3.1-9B

Image-Text-to-Text • 9B • Updated Jun 2 • 7.52k • 26

upvoted a paper about 1 month ago

AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification

Paper • 2606.03031 • Published Jun 2 • 6

liked a model about 1 month ago

Kwai-Keye/Keye-VL-2.0-30B-A3B

Image-Text-to-Text • 31B • Updated 24 days ago • 14.4k • 119

liked a dataset about 1 month ago

InternScience/SGI-DeepResearch

Viewer • Updated Jun 2 • 318 • 224 • 9

liked a Space about 1 month ago

Open AI Co-Scientist

📊

Open-source implementation of Google's AI Co-Scientist

liked 2 datasets about 1 month ago

FrontierCS/Frontier-CS

Viewer • Updated 5 days ago • 268 • 5.71k • 6

google/FACTS-grounding-public

Viewer • Updated Dec 19, 2024 • 868 • 1.36k • 46

reacted to imnotkitty's post with 🔥 2 months ago

Post

4030

tencent/Hy3-preview is out: an open-weights MoE reasoning model.

✅ 295B total / 21B active / 256K context
✅ Fused fast-and-slow thinking in a single model
✅ First model trained on Hunyuan's rebuilt pretraining + RL infra (Feb → Apr)

Benchmarks:
👉 SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch — competitive results, particularly strong on agentic tool use
👉 Top score on Tsinghua's 2026 Spring math PhD qualifying exam
👉 Strong context-learning and instruction-following on Tencent's CL-bench / CL-bench-Life

More details can be found in my article: https://huggingface.co/blog/imnotkitty/hy3-preview

2 replies

upvoted a paper 2 months ago

Reinforcement-aware Knowledge Distillation for LLM Reasoning

Paper • 2602.22495 • Published Feb 26 • 6

liked a model 2 months ago

microsoft/maira-2-sae

Feature Extraction • Updated Jul 23, 2025 • 9

reacted to anakin87's post with ❤️ 3 months ago

Post

3314

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training.
Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.

Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models.
I've packaged everything I learned into this short course.

What you'll learn

🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments

🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
🔸 Build the game Environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe

📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

1 reply

liked a dataset 3 months ago

InternScience/ResearchClawBench

Benchmark • Updated 3 days ago • 57 • 7.82k • 8

liked a model 3 months ago

rl-research/DR-Tulu-8B-results

Updated Mar 26 • 1

upvoted a paper 3 months ago

Grounding Everything in Tokens for Multimodal Large Language Models

Paper • 2512.10554 • Published Dec 11, 2025 • 2

reacted to Shrijanagain's post with 🔥 3 months ago

Post

5657

We are thrilled to announce the launch of SKT-OMNI-CORPUS-2T, a massive-scale, high-quality dataset designed to power the next generation of Foundation Models (LLMs) from scratch.
Developed at SKT AI LABS, this corpus is not just a collection of data; it’s a mission to decentralize high-grade AI training for regional languages and global knowledge.

💎 Key Highlights:

• Massive Scale: Targeting a multi-terabyte architecture for 2T-level tokenization.

• Pure Quality: Curated from 500+ Elite Sources.

• Structured for MoE: Perfectly sharded into 3.5GB standardized units (SKT-𝕻 series) for seamless distributed training.

🤝 Open for Collaboration!

We are looking for AI researchers, CUDA engineers, and data scientists to join us in this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design—let’s build the future together.

Explore the Dataset on Hugging Face:

🔗 https://huggingface.co/datasets/Shrijanagain/SKT-OMNI-CORPUS-146T-V1

DSR -- 🔗 https://huggingface.co/datasets/Shrijanagain/SKT-DSRx10000

#AI #MachineLearning #OpenSource #IndicAI #SKTAILABS #LLM #BigData #HuggingFace #InnovationIndia

upvoted a paper 3 months ago

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published Mar 23 • 29

upvoted an article 3 months ago

Article

Scaling OpenEnv: From Free Usage to Thousands of Concurrent Environments

burtenshaw • Jan 20 • 12

reacted to DedeProGames's post with 🚀 3 months ago

Post

5279

Introducing GRM2, a powerful 3 billion parameter model designed for long-term reasoning and high performance in complex tasks.

Even with only 3 billion parameters, it outperforms qwen3-32b in several benchmarks and complex reasoning tasks.

With just 3 billion parameters, it can also generate extensive, complex code of over 1,000 lines and use tools at a level comparable to larger models, making it well suited to agentic tasks.

GRM2 is licensed under Apache 2.0, making it ideal as a base for fine-tuning on other tasks.
You can see more here: OrionLLM/GRM2-3b
