nanoOmni or nanoAudio next?

by TimeLordRaps - opened May 16

What are your thoughts on pushing the long-context frontier with these small models? Have you seen any promising recent studies on tiny/nano models specifically, for example ones that use the new Llama 10M to distill what is possible? Does context length distill well, and if not, what limitations are you aware of in this model class? How would Skywork-R1V2 or similar visual reasoning frameworks be used here? Would something like FocusLLM be useful, but with image patches decoded in parallel by a set of these nano models using global or local sequence-aware parallel decoding? And how would this translate to other modalities, like those in the title, but also time and tables?
