Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Paper • 2408.12480 • Published Aug 22, 2024 • 26
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30, 2024 • 7
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 160
Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled and 1 other • Oct 14, 2024 • 94
Article Hugging Face on PyTorch / XLA TPUs By jysohn23 and 1 other • Feb 9, 2021 • 3
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 39
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 161