Custom GEGLU implementation
#32 opened about 1 month ago
by
brwang
Independent evaluation results
#30 opened 5 months ago
by
yaronr
Getting the error: "triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 166912. Reducing block sizes or `num_stages` may help."
2
#27 opened 7 months ago
by
Pranav0511

Why is the inference speed so slow compared with Qwen at the same 7B parameter count?
#26 opened 8 months ago
by
lucasjin
Upload triton_flash_blocksparse_attn.py
#25 opened 8 months ago
by
barcelosallan
Phi-3-small doesn't load with TGI
1
#24 opened 8 months ago
by
aveer30
Multi-GPU training fails when using device_map = "auto"
2
#23 opened 8 months ago
by
aveer30
Shared memory error
9
#15 opened 9 months ago
by
marktenenholtz
