nanoOmni or nanoAudio next?

#5
by TimeLordRaps - opened

What are your thoughts on pushing long-context frontiers, particularly with these small models? Have you seen any promising recent studies specifically on tiny/nano models, for example using the new Llama 10M to distill what could be possible? Does context length distill well, and if not, what limitations are you aware of in this model class?

How would Skywork-R1V2 or similar vision-based reasoning frameworks be used here? Would something like FocusLLM be useful, but with image patches decoded in parallel by a set of these nano models, using either global or local sequence-aware parallel decoding?

How would this translate to other modalities, such as those in the title, but also time and tables?
