Bad at following complicated instructions
Amazing work, loving the model.
My only problem is that it's not the best at following complicated system prompts.
I was using a Q_K_M quant, so that might be it. Overall, great job.
Appreciate it - we'll look at some more sophisticated quants in the future, and will likely update the model over time as we refine the technique.
I noticed when quantizing the model to GGUF that the resulting quantized model was smaller than an equally quantized Llama-3.1-8B-Instruct, even when keeping the output tensors in Q8_0 and up-casting the embeddings to F32, as I usually do. I don't know why that is, but in further experiments with quantization schemes I found that up-casting the output tensors to F32 as well, along with the embeddings as before, brings the model's size to what it should be given the quantization settings, and the model performs better.
GGUF OF32.EF32.IQuants are available in:
IQ4_K_M, IQ6_K, and IQ8_0
You can find them below:
https://huggingface.co/Joseph717171/Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K-Q8_0-GGUF
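If anyone wants to double-check what precision the embeddings and output tensor actually ended up at in one of these files, here's a minimal sketch using the `gguf` Python package that ships with llama.cpp. The local file name is just a placeholder for whichever quant you downloaded, not the exact name in the repo:

```python
# Minimal sketch: inspect a downloaded GGUF and report the storage type of the
# embedding table and the output (LM head) tensor. Requires `pip install gguf`.
# The file name below is a placeholder; point it at the quant you downloaded.
from gguf import GGUFReader

reader = GGUFReader("Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K.gguf")

for tensor in reader.tensors:
    # "token_embd.weight" is the embedding table, "output.weight" is the LM head
    # (if a model ties the head to the embeddings, "output.weight" won't appear).
    if tensor.name in ("token_embd.weight", "output.weight"):
        print(f"{tensor.name}: {tensor.tensor_type.name}, shape={list(tensor.shape)}")
```

If the output tensor and embeddings were up-cast as described above, both should report F32 here, with the rest of the weight tensors at the chosen quant level.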