Robert Dahlke PRO

rbrt

https://www.tngtech.com

robert-dahlke

AI & ML interests

MoE Architecture, building Chimera Models, Finetuning

Recent Activity

updated a Space about 18 hours ago

README

🚀

TNG on huggingface

liked 2 models about 19 hours ago

unsloth/DeepSeek-TNG-R1T2-Chimera-BF16

Text Generation • 684B • Updated about 17 hours ago • 2

unsloth/DeepSeek-TNG-R1T2-Chimera

Text Generation • 685B • Updated about 12 hours ago • 2

updated a model about 19 hours ago

tngtech/DeepSeek-TNG-R1T2-Chimera

Text Generation • 685B • Updated about 19 hours ago • 17 • 73

New activity in tngtech/DeepSeek-TNG-R1T2-Chimera about 19 hours ago

Missing `model.safetensors.index.json`

#2 opened about 20 hours ago by

danielhanchen

liked a model 1 day ago

tngtech/DeepSeek-TNG-R1T2-Chimera

Text Generation • 685B • Updated about 19 hours ago • 17 • 73

published a model 1 day ago

tngtech/DeepSeek-TNG-R1T2-Chimera

Text Generation • 685B • Updated about 19 hours ago • 17 • 73

New activity in tngtech/DeepSeek-R1T-Chimera 9 days ago

Any plans to release an updated version based on DeepSeek-V3-0526 + R1, or how to create the merge myself?

#4 opened about 1 month ago by

Lissanro

authored a paper 15 days ago

Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors

Paper • 2506.14794 • Published May 31 • 1

New activity in tngtech/DeepSeek-R1T-Chimera about 2 months ago

Paid version?

#2 opened about 2 months ago by

Blazgo

updated a model about 2 months ago

tngtech/DeepSeek-R1T-Chimera

Text Generation • 685B • Updated 15 days ago • 2.54k • 247

New activity in tngtech/DeepSeek-R1T-Chimera 2 months ago

Questions on how routed experts are merged

👍 👀 17

#1 opened 2 months ago by

chuhac

commented on Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time 2 months ago

We published the experts that we switched off in the paper (see below). The method to switch them off works at inference time, so no need to upload new weights:

liked 2 models 2 months ago

deepseek-ai/DeepSeek-Prover-V2-671B

Text Generation • 685B • Updated Apr 30 • 3.73k • 803

tngtech/DeepSeek-R1T-Chimera

Text Generation • 685B • Updated 15 days ago • 2.54k • 247

published a model 2 months ago

tngtech/DeepSeek-R1T-Chimera

Text Generation • 685B • Updated 15 days ago • 2.54k • 247

upvoted an article 2 months ago

Article

Finetuning olmOCR to be a faithful OCR-Engine

and 1 other • Apr 22 • 18

upvoted 2 articles 3 months ago

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Apr 16 • 18

Article

Efficient Request Queueing – Optimizing LLM Performance

Apr 2 • 12

liked a model 3 months ago

deepseek-ai/DeepSeek-V3-0324

Text Generation • 685B • Updated Mar 27 • 387k • 2.99k
