Multi-needle In A Haystack

#25

by ElliottDyson - opened May 27, 2024

Many models can easily be trained to perform well on the standard needle-in-a-haystack evaluation. A multi-needle evaluation is much more useful and more representative of long-context capabilities. It would be very interesting to see multi-needle results included in these tests.

leo-pekelis-gradient

DeepSky org Jun 1, 2024

Agreed! Please see:
https://gradient.ai/blog/evaluating-models-beyond-niah
https://gradient.ai/blog/ruler-vs-gradient-s-1m-context-length-llama-3-70b
