Junlin Zhou

jlzhou

edwardzjl

AI & ML interests

None yet

Recent Activity

reacted to codelion's post with 👀 12 days ago

I recently added a recipe to ellora that improves the reasoning capabilities of Gemma-3-1B using self-supervised learning. The model now shows step-by-step thinking in <think> tags before answering. Logic puzzle accuracy: 61% → 84%, after 3 hours of training on a single GPU.

🧠 It uses GRPO, where the model generates multiple responses and learns to prefer better reasoning. This works surprisingly well for making smaller models more transparent.

🔗 Colab: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_2_Reasoning_LoRA_with_Self-Rewarding_GRPO.ipynb
🤗 Model: https://huggingface.co/codelion/gemma-3-1b-it-reasoning-grpo-lora
💻 Code: https://github.com/codelion/ellora
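The post's description of GRPO (several responses sampled per prompt, with the better ones preferred) rests on scoring each response relative to its own group. The snippet below is a minimal sketch of that one step, not the ellora recipe itself: the function name `group_relative_advantages`, the `eps` parameter, and the reward values are all hypothetical, and real implementations differ in details such as the standard-deviation estimator.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Center and scale each reward against the other samples for the same prompt.

    Hypothetical helper for illustration; `eps` avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one prompt (1.0 = judged correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Responses above the group mean get a positive advantage, the rest a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Responses with a positive advantage are the ones a policy update would make more likely, which is the sense in which the model "learns to prefer better reasoning".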

reacted to Narsil's post with 😎 19 days ago

Me: This function is too slow. Find a faster algorithm.
Cursor: Hold my beer.
Me: *Slacking off with colleagues*
Cursor: Ping.
Me: 🤯

reacted to Akhil-Theerthala's post with ❤️ 27 days ago

I'm excited to announce that I've just released the newest versions of my Kuvera models and the expanded Personal Finance Reasoning dataset on Hugging Face!

What's new: I've expanded the Personal Finance Reasoning Dataset, which now includes 18.9k samples of real-world financial questions paired with detailed, empathetic answers. The previous generation pipeline was also streamlined, with better psychological context and response validations.

I've also released new Kuvera models trained on this improved dataset:
- Kuvera-4B & 8B: These are my upgraded non-reasoning models, fine-tuned to provide practical financial advice. I've specifically trained the 8B model to better understand the user's emotional context.
- Kuvera-12B: A first experimental reasoning model focused on query resolution.

As the sole person working on this project, I see this release as a noticeable step forward from my previous work, offering more powerful and nuanced tools for financial AI. I am actively looking to collaborate with others who are passionate about analyzing and improving the quality of personal finance advice generated by large language models. If this sounds like you, please reach out!

You can check these out at the following links:

Models:
- https://huggingface.co/Akhil-Theerthala/Kuvera-8B-qwen3-v0.2.1
- https://huggingface.co/Akhil-Theerthala/Kuvera-4B-unsloth-gemma3
- https://huggingface.co/Akhil-Theerthala/kuvera-12B-v0.2.0-unsloth-gemma3

Dataset:
- https://huggingface.co/datasets/Akhil-Theerthala/Kuvera-PersonalFinance-V2.1

P.S. The paper on the framework used to generate these models, along with a detailed evaluation of the main 8B model's responses, is going to be released soon!

Organizations

upvoted an article 27 days ago

How to generate text: using different decoding methods for language generation with Transformers

Article • Mar 1, 2020 • 243

upvoted a paper about 2 months ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14 • 88

upvoted an article about 2 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • and 22 others • Jul 8 • 648

upvoted an article 2 months ago

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

Article • and 1 other • Jul 9 • 668

upvoted 4 papers 3 months ago

Don't Pay Attention

Paper • 2506.11305 • Published Jun 12 • 8

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Paper • 2506.06205 • Published Jun 6 • 30

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260

Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods

Paper • 2505.17870 • Published May 23 • 5

upvoted a paper 4 months ago

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Paper • 2312.03209 • Published Dec 6, 2023 • 21

upvoted an article 4 months ago

Uncensor any LLM with abliteration

Article • Jun 13, 2024 • 672

upvoted a paper 5 months ago

RealHarm: A Collection of Real-World Language Model Application Failures

Paper • 2504.10277 • Published Apr 14 • 11

upvoted an article 6 months ago

You could have designed state of the art positional encoding

Article • Nov 25, 2024 • 356

upvoted a paper 6 months ago

Min P Sampling: Balancing Creativity and Coherence at High Temperature

Paper • 2407.01082 • Published Jul 1, 2024 • 1

upvoted an article 6 months ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Article • and 3 others • Mar 12 • 460

upvoted a paper 6 months ago

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

Paper • 2502.18080 • Published Feb 25 • 2

upvoted 2 articles 6 months ago

Open R1: Update #3

Article • and 9 others • Mar 11 • 295

From Files to Chunks: Improving Hugging Face Storage Efficiency

Article • and 1 other • Nov 20, 2024 • 63

upvoted 2 papers 7 months ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 126

The Differences Between Direct Alignment Algorithms are a Blur

Paper • 2502.01237 • Published Feb 3 • 115

upvoted an article 7 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7 • 211
