Jamba 1.7 Collection: The AI21 Jamba family of models comprises hybrid SSM-Transformer foundation models that blend speed, efficient long-context processing, and accuracy. • 4 items • Updated Jul 2 • 11 upvotes
BitVLA Collection: 1-bit Vision-Language-Action Models for Robotics Manipulation • 9 items • Updated Jun 30 • 3 upvotes
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs • Paper • arXiv:2504.18415 • Published Apr 25 • 47 upvotes
BitNet Collection: 🔥 BitNet family of large language models (1-bit LLMs). • 7 items • Updated May 1 • 49 upvotes
TransMamba: Flexibly Switching between Transformer and Mamba • Paper • arXiv:2503.24067 • Published Mar 31 • 21 upvotes
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use • Paper • arXiv:2502.15872 • Published Feb 21 • 5 upvotes
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam • Paper • arXiv:2502.17055 • Published Feb 24 • 19 upvotes
Slam Collection: All resources for SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide the tokenizer, LM, and datasets. • 7 items • Updated May 22 • 13 upvotes
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization • Paper • arXiv:2502.19261 • Published Feb 26 • 7 upvotes