Article Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub By drbh and 6 others • 28 days ago • 109
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published May 25 • 81
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20 • 76
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7 • 133
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8 • 110 • 6
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published Jan 8 • 98
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published Apr 8 • 83
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference Paper • 2504.05897 • Published Apr 8 • 18
Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence Paper • 2503.20533 • Published Mar 26 • 12
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published Apr 8 • 172
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem Paper • 2411.17525 • Published Nov 26, 2024 • 5
Extreme Compression of Large Language Models via Additive Quantization Paper • 2401.06118 • Published Jan 11, 2024 • 13