Can you reduce the number of KV heads in this model? It has "num_key_value_heads": 64, which requires a lot of KV cache
#1 by luchangli03 - opened
We will continue iterating on this going forward. Let's try training with this suggestion.
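For context on the request above, here is a rough sketch of how KV cache size scales with `num_key_value_heads`. Only the value 64 comes from this thread. The layer count, head dimension, sequence length, the fp16 element size, and the comparison value of 8 KV heads are all hypothetical numbers chosen for illustration, not this model's actual configuration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical shape values (assumptions, not taken from the model's config).
NUM_LAYERS = 80
HEAD_DIM = 128
SEQ_LEN = 4096

current = kv_cache_bytes(NUM_LAYERS, 64, HEAD_DIM, SEQ_LEN)  # 64 KV heads, as in the thread
reduced = kv_cache_bytes(NUM_LAYERS, 8, HEAD_DIM, SEQ_LEN)   # 8 KV heads, a hypothetical GQA setting

print(f"64 KV heads: {current / 2**30:.2f} GiB per sequence")
print(f" 8 KV heads: {reduced / 2**30:.2f} GiB per sequence")
print(f"reduction factor: {current / reduced:.0f}x")
```

Under these assumptions the cache grows linearly with the number of KV heads, so cutting the head count cuts the per-sequence cache by the same factor.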