AI & ML interests

Text classification

Recent Activity

codelion
posted an update 3 days ago
Implemented Test-Time Diffusion Deep Researcher (TTD-DR) in OptiLLM! 🚀

Just shipped a game-changing feature that turns any LLM into a powerful research agent. TTD-DR applies diffusion-inspired techniques to iteratively refine research reports while grounding them in real web sources.

How it works (sketched in code below):
• Generates initial draft
• Identifies knowledge gaps
• Searches web for missing info
• Iteratively refines through "denoising" steps
• Produces comprehensive reports with 15-30+ sources
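
A minimal sketch of that loop, for intuition only — not the actual OptiLLM plugin code. The `llm` and `search` callables here are stand-ins for your model and web-search backend:

```python
from typing import Callable

# Illustrative TTD-DR-style loop, NOT the actual OptiLLM implementation.
# `llm` is any prompt -> text callable; `search` maps a query to a list of
# (title, url, snippet) tuples from whatever search backend you use.
def deep_research(query: str,
                  llm: Callable[[str], str],
                  search: Callable[[str], list[tuple[str, str, str]]],
                  max_steps: int = 5) -> str:
    draft = llm(f"Write an initial research report draft on: {query}")
    sources: list[str] = []
    for _ in range(max_steps):
        gaps = llm(f"List unsupported or missing claims, one per line:\n{draft}")
        if not gaps.strip():
            break  # draft is fully grounded, stop early
        evidence = []
        for gap in gaps.splitlines():
            for title, url, snippet in search(gap):
                evidence.append(f"{title} ({url}): {snippet}")
                sources.append(url)
        # "Denoising" step: rewrite the draft so weak or unsupported
        # passages are replaced with content grounded in the evidence.
        draft = llm("Revise the draft using this evidence:\n"
                    + "\n".join(evidence) + f"\n\nDraft:\n{draft}")
    unique = "\n".join(dict.fromkeys(sources))  # dedupe, keep order
    return f"{draft}\n\nSources:\n{unique}"
```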

The magic? It works with ANY model, so you can pick your favorite open-source model on HF!

Key results:
- 47 complex research queries tested
- Every report backed by real web sources
- Report quality rivals that of human research analysts
- No more hallucinations on current events!

Try it:
pip install optillm
Then use "deep_research-your-model-name" as the model identifier
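
For example, through OptiLLM's OpenAI-compatible proxy — a sketch that assumes the proxy is running locally on its default port, with an example model name you'd swap for your own:

```python
from openai import OpenAI

# Assumes the OptiLLM proxy is running locally (adjust base_url and
# api_key for your setup); the model below is only an example.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-key")

response = client.chat.completions.create(
    # The "deep_research-" prefix routes the request through the plugin.
    model="deep_research-Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user",
               "content": "Survey recent progress in small language models."}],
)
print(response.choices[0].message.content)
```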

- Implementation: https://github.com/codelion/optillm/tree/main/optillm/plugins/deep_research
- Paper: https://arxiv.org/abs/2507.16075v1
- Sample reports: https://github.com/codelion/optillm/tree/main/optillm/plugins/deep_research/sample_reports

Special thanks to the TTD-DR paper authors for this brilliant approach!

#research #llm #opensource #inference
codelion
posted an update 5 days ago
New research: Understanding how different LLMs approach reasoning through "thought anchors"

I just published a comparative study analyzing the reasoning patterns of Qwen3-0.6B vs DeepSeek-R1-Distill-Qwen-1.5B using thought anchors - critical sentences that significantly impact task success probability.

Key findings:
- DeepSeek-R1: Uses concentrated reasoning with fewer, high-impact steps (0.408 avg impact)
- Qwen3: Employs distributed reasoning spreading impact across multiple steps (0.278 avg impact)
- Different risk-reward profiles: DeepSeek more consistent (82.7% positive steps), Qwen3 more exploratory (71.6% positive)

This reveals different cognitive architectures rather than simple performance differences. The models optimize for different reasoning strategies - consistency vs exploration.
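
For intuition about what those impact numbers measure, here's a hedged sketch of counterfactual sentence-level scoring — the actual PTS methodology may differ, and `solve` is a hypothetical helper:

```python
# Hedged sketch of sentence-level impact scoring; the actual PTS
# methodology may differ. `solve(prefix)` is a hypothetical helper that
# continues generation from a partial reasoning trace and returns True
# if the final answer is correct.
def sentence_impacts(sentences: list[str], solve,
                     n_samples: int = 20) -> list[float]:
    impacts = []
    for i in range(len(sentences)):
        keep = sum(solve(" ".join(sentences[: i + 1])) for _ in range(n_samples))
        drop = sum(solve(" ".join(sentences[:i])) for _ in range(n_samples))
        impacts.append((keep - drop) / n_samples)  # delta in success probability
    return impacts  # averaging these gives per-model summary figures
```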

Both datasets are now available on HF:
- Qwen3 thought anchors: codelion/Qwen3-0.6B-pts-thought-anchors
- DeepSeek-R1 thought anchors: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors

Built using our open-source PTS library for mechanistic interpretability analysis. All methodology is fully reproducible.
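
Both datasets load with the standard `datasets` API (repo IDs as listed above; inspect the printed schema before relying on particular columns):

```python
from datasets import load_dataset

# Repo IDs as listed above; print the DatasetDict to inspect splits
# and features before filtering on specific columns.
qwen = load_dataset("codelion/Qwen3-0.6B-pts-thought-anchors")
deepseek = load_dataset("codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-thought-anchors")
print(qwen)
print(deepseek)
```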

Full article: https://huggingface.co/blog/codelion/understanding-model-reasoning-thought-anchors

What reasoning patterns have you noticed in your model experiments? Would love to hear about other architectures showing similar cognitive diversity!
codelion
updated 17 models 6 days ago