Raja Biswas

rbiswasfc

AI & ML interests

NLP, Generative AI

Recent Activity

liked a model about 12 hours ago

Qwen/Qwen2.5-32B-Instruct

upvoted a paper 4 days ago

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

upvoted a paper 4 days ago

Towards Best Practices for Open Datasets for LLM Training

Articles

Finally, a Replacement for BERT: Introducing ModernBERT

Dec 19, 2024 • 513

rbiswasfc's activity

upvoted 3 papers 4 days ago

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Paper • 2501.12368 • Published 9 days ago • 39

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published 16 days ago • 51

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 8 days ago • 263

upvoted an article 6 days ago

Mastering Long Contexts in LLMs with KVPress

Article • 7 days ago • 56

upvoted a paper 7 days ago

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published 16 days ago • 270

upvoted a paper 8 days ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published 13 days ago • 100

upvoted an article 9 days ago

Yay! Organizations can now publish blog Articles

Article • 10 days ago • 30

upvoted 2 articles 15 days ago

Visualize and understand GPU memory in PyTorch

Article • Dec 24, 2024 • 173

Diving into MiniMax01 405B MoE

Article • 15 days ago • 17

upvoted a paper 16 days ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published 17 days ago • 89

upvoted a paper 20 days ago

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published 22 days ago • 249

upvoted 2 papers 24 days ago

Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 45

YuLan-Mini: An Open Data-efficient Language Model

Paper • 2412.17743 • Published Dec 23, 2024 • 64

upvoted 3 papers about 1 month ago

RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response

Paper • 2412.14922 • Published Dec 19, 2024 • 85

No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 41

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 344

upvoted a collection about 1 month ago

ModernBERT

Collection

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 128

upvoted a paper about 1 month ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 125

upvoted 2 papers about 2 months ago

Evaluating and Aligning CodeLLMs on Human Preference

Paper • 2412.05210 • Published Dec 6, 2024 • 47

Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 75