44 54 113

Junlin Zhou

jlzhou

edwardzjl

AI & ML interests

None yet

Recent Activity

reacted to codelion's post with 👀 12 days ago

I recently added a recipe in ellora to improve reasoning capabilities to Gemma-3-1B using self-supervised learning. Model now shows step-by-step thinking in <think> tags before answering. Logic puzzle accuracy: 61% → 84%. 3 hours training on single GPU. 🧠 Used GRPO where model generates multiple responses and learns to prefer better reasoning. Works surprisingly well for making smaller models more transparent. 🔗 Colab: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_2_Reasoning_LoRA_with_Self-Rewarding_GRPO.ipynb 🤗 Model: https://huggingface.co/codelion/gemma-3-1b-it-reasoning-grpo-lora 💻 Code: https://github.com/codelion/ellora

reacted to Narsil's post with 😎 19 days ago

Me: This function is too slow. Find a faster algorithm. Cursor: Hold my beer. Me: *Slacking off with colleagues* Cursor: Ping. Me: 🤯

reacted to Akhil-Theerthala's post with ❤️ 26 days ago

I'm excited to announce that I've just released the newest versions of my Kuvera models and the expanded Personal Finance Reasoning dataset on Hugging Face! What's new: I've expanded the Personal Finance Reasoning Dataset, which now includes 18.9k samples of real-world financial questions paired with detailed, empathetic answers. The previous generation pipeline was also streamlined with better psychological context and response validations. I've also released new Kuvera models trained on this improved dataset: - Kuvera-4B & 8B: These are my upgraded non-reasoning models, fine-tuned to provide practical financial advice. I've specifically trained the 8B model to better understand the user's emotional context. - Kuvera-12B: A first experimental reasoning model focused on the query resolution. As the sole person working on this project, this release is a noticeable step forward from my previous work, offering more powerful and nuanced tools for financial AI. I am actively looking to collaborate with others who are passionate about analyzing and improving the quality of personal finance advice generated by large language models. If this sounds like you, please reach out! You can check these out on the following links: Models: - https://huggingface.co/Akhil-Theerthala/Kuvera-8B-qwen3-v0.2.1 - https://huggingface.co/Akhil-Theerthala/Kuvera-4B-unsloth-gemma3 - https://huggingface.co/Akhil-Theerthala/kuvera-12B-v0.2.0-unsloth-gemma3 Dataset: - https://huggingface.co/datasets/Akhil-Theerthala/Kuvera-PersonalFinance-V2.1 P.S. The paper on the framework used to generate these models along with the detailed evaluation of the main 8B model's responses is going to be released soon!

View all activity

Organizations

reacted to codelion's post with 👀 12 days ago

Post

4920

I recently added a recipe in ellora to improve reasoning capabilities to Gemma-3-1B using self-supervised learning. Model now shows step-by-step thinking in <think> tags before answering.

Logic puzzle accuracy: 61% → 84%. 3 hours training on single GPU. 🧠

Used GRPO where model generates multiple responses and learns to prefer better reasoning. Works surprisingly well for making smaller models more transparent.

🔗 Colab: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_2_Reasoning_LoRA_with_Self-Rewarding_GRPO.ipynb

🤗 Model: codelion/gemma-3-1b-it-reasoning-grpo-lora

💻 Code: https://github.com/codelion/ellora

1 reply

reacted to Narsil's post with 😎 19 days ago

Post

2107

Me: This function is too slow. Find a faster algorithm.
Cursor: Hold my beer.

Me: *Slacking off with colleagues*
Cursor: Ping.

Me: 🤯

reacted to Akhil-Theerthala's post with ❤️ 26 days ago

Post

2098

I'm excited to announce that I've just released the newest versions of my Kuvera models and the expanded Personal Finance Reasoning dataset on Hugging Face!

What's new:
I've expanded the Personal Finance Reasoning Dataset, which now includes 18.9k samples of real-world financial questions paired with detailed, empathetic answers. The previous generation pipeline was also streamlined with better psychological context and response validations.

I've also released new Kuvera models trained on this improved dataset:
- Kuvera-4B & 8B: These are my upgraded non-reasoning models, fine-tuned to provide practical financial advice. I've specifically trained the 8B model to better understand the user's emotional context.
- Kuvera-12B: A first experimental reasoning model focused on the query resolution.

As the sole person working on this project, this release is a noticeable step forward from my previous work, offering more powerful and nuanced tools for financial AI.

I am actively looking to collaborate with others who are passionate about analyzing and improving the quality of personal finance advice generated by large language models. If this sounds like you, please reach out!

You can check these out on the following links:

Models:
- Akhil-Theerthala/Kuvera-8B-qwen3-v0.2.1
- Akhil-Theerthala/Kuvera-4B-unsloth-gemma3
- Akhil-Theerthala/kuvera-12B-v0.2.0-unsloth-gemma3

Dataset:
- Akhil-Theerthala/Kuvera-PersonalFinance-V2.1

P.S. The paper on the framework used to generate these models along with the detailed evaluation of the main 8B model's responses is going to be released soon!

2 replies

upvoted an article 27 days ago

Article

How to generate text: using different decoding methods for language generation with Transformers

•

Mar 1, 2020

• 243

liked a dataset about 1 month ago

UCLNLP/adversarial_qa

Viewer • Updated Dec 21, 2023 • 72k • 4.07k • 41

upvoted a paper about 2 months ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14 • 88

reacted to Kseniase's post with ❤️ about 2 months ago

Post

6220

6 Essential Reads on core AI/ML topics:

Time to look at some free useful resources that can help you upgrade your knowledge of AI and machine learning!
Today we offer you these 6 must-read surveys that can be your perfect guides to the major fields and techniques:

1. Foundations of Large Language Models by Tong Xiao and Jingbo Zhu → https://arxiv.org/abs/2501.09223
Many recommend this 270-page book as a good resource to focus on fundamental concepts, such as pre-training, generative models, prompting, alignment, and inference

2. Large Language Models Post-Training: Surveying Techniques from Alignment to Reasoning -> A Survey on Post-training of Large Language Models (2503.06072)
Read this to master policy optimization (RLHF, DPO, GRPO), supervised and parameter-efficient fine-tuning, reasoning, integration, and adaptation techniques

3. Agentic Large Language Models, a survey by Leiden University → https://arxiv.org/abs/2503.23037
Surveys agentic LLMs across reasoning, tools, and multi-agent collaboration, highlighting their synergy. It also explores their promise, risks and applications in medicine, finance, science.

4. A Survey of Context Engineering for Large Language Models → A Survey of Context Engineering for Large Language Models (2507.13334)
Defines Context Engineering as systematic info design for LLMs beyond prompting, covering retrieval, processing, management, and architectures like RAG and multi-agent systems

5. A Survey of Generative Categories and Techniques in Multimodal Large Language Models → https://arxiv.org/abs/2506.10016
Covers multimodal models, exploring six generative modalities, key techniques (SSL, RLHF, CoT), architectural trends, and challenges

6. Large Language models for Time Series Analysis: Techniques, Applications, and Challenges → https://arxiv.org/abs/2506.11040
Explains how LLMs transform time series analysis by enhancing pattern recognition and long-term dependency handling + shows how to build them

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe