Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors Paper • 2506.14794 • Published May 31 • 1
We published the experts that we switched off in the paper (see below). The method to switch them off works at inference time, so no need to upload new weights:
Article Finetuning olmOCR to be a faithful OCR-Engine By tngtech and 1 other • Apr 22 • 18
Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance By tngtech • Apr 16 • 18