Suggested tokenizer changes similar to Phi-4
#8 opened about 18 hours ago by l2dy
Different number of attention heads, makes rotary_ndims vs rope scaling factors wrong?
14
#1 opened 2 days ago by bartowski
