Hi there. How does it compare with a Transformer in terms of speed and accuracy/quality?

The hybrid model is about 10-20% faster, depending on the context length used, and requires less activation memory for the KV cache. In terms of quality, the two are about the same, but each model has subtly different strengths and weaknesses.

