Training methodologies

#4
by LeroyDyer - opened

Will you release a Jupyter training script for this model? (For me, preferably Unsloth.)
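For reference, a minimal Unsloth LoRA fine-tuning sketch of the kind of script I mean. The checkpoint name and dataset file are placeholders, exact trl/Unsloth arguments vary between versions, and a multimodal (omni) variant may only work on its text-only LLM part:

```python
# Minimal Unsloth LoRA fine-tuning sketch -- the model ID and dataset below
# are PLACEHOLDERS; swap in the actual checkpoint and data. Unsloth may not
# support multimodal (omni) variants, so this assumes a text-only LLM.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("json", data_files="my_data.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes a plain "text" column
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```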

I am also interested in how to train a model with this GATE between models. Is there also a way to construct this model from two models, i.e. a deep thinker model and a basic LLM (two pretrained models)? Is it a mixture of experts, or is it not?
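To illustrate what I mean by a gate between two models, here is a toy sketch (not how this repo actually builds its model): a small trainable router blends the logits of two frozen pretrained LMs. It assumes both experts share a tokenizer/vocabulary, which rules out directly mixing, say, a Mistral and a Llama checkpoint:

```python
# Toy sketch of a learned gate over two *frozen* pretrained LMs.
# Assumes both experts share the same tokenizer/vocabulary.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

class TwoExpertGate(nn.Module):
    def __init__(self, expert_a_id: str, expert_b_id: str):
        super().__init__()
        self.a = AutoModelForCausalLM.from_pretrained(expert_a_id)
        self.b = AutoModelForCausalLM.from_pretrained(expert_b_id)
        for p in self.a.parameters():
            p.requires_grad = False
        for p in self.b.parameters():
            p.requires_grad = False
        # Tiny trainable router: maps last hidden state -> mixing weight.
        self.router = nn.Linear(self.a.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        out_a = self.a(input_ids, attention_mask=attention_mask,
                       output_hidden_states=True)
        out_b = self.b(input_ids, attention_mask=attention_mask)
        # Per-token gate value in [0, 1], broadcast over the vocab dim.
        g = torch.sigmoid(self.router(out_a.hidden_states[-1]))
        return g * out_a.logits + (1 - g) * out_b.logits
```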

It is very interesting that this model is so small, which makes it very useful. I would like to train it on my own data stack and see if it responds like my own 7B Mistral model. How much training would be needed to transfer all of one model's knowledge to another model? Is it even possible to download a model's knowledge base? (Not yet.) So merging strategies may be the only option for knowledge transfer (see the sketch below). The problem: what about the differences in model architectures? We need to be able to construct a model from a past pretrained model, such as a Mistral or Llama pretrain, weaving it into the final model as with the LLaVA model types, i.e. initializing the new model from the pretrained one.
The main problem, as I have previously created these new architectures, is that llama.cpp does not support them, and constructing them on the fly can be expensive in hardware usage.
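For what it's worth, the simplest merge strategy is just a linear interpolation of the weights. A minimal sketch, assuming both checkpoints have an identical architecture and tensor shapes (e.g. two fine-tunes of the same base), which is exactly the limitation above; tools like mergekit implement more sophisticated variants:

```python
# Simplest merge: linear interpolation of two same-architecture checkpoints.
# This does NOT work across architectures (Mistral <-> Llama), which is the
# core problem raised above.
import torch
from transformers import AutoModelForCausalLM

def linear_merge(model_a_id: str, model_b_id: str, alpha: float = 0.5):
    a = AutoModelForCausalLM.from_pretrained(model_a_id)
    b = AutoModelForCausalLM.from_pretrained(model_b_id)
    sd_a, sd_b = a.state_dict(), b.state_dict()
    merged = {
        # Blend floating-point tensors; copy anything else (e.g. int buffers).
        k: alpha * v + (1 - alpha) * sd_b[k] if v.is_floating_point() else v
        for k, v in sd_a.items()
    }
    a.load_state_dict(merged)
    return a
```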

When constructing the Omni, it uses a component for each modality (LLM, speech, vision, etc.). If this smaller model can indeed be trained on distilled information to have the same knowledge as the larger models, then we have a truly great component: a slim LLM which can be used in the construction of models from the Qwen family or other architectures such as LLaVA!
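A hedged sketch of what such logit distillation could look like: the small student is trained to match a larger teacher's token distributions. This assumes teacher and student share a tokenizer (cross-tokenizer distillation needs extra machinery):

```python
# Sketch of logit distillation: train a small student to match a larger
# teacher's softened token distributions. Assumes a shared tokenizer.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, batch, optimizer, T: float = 2.0):
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(**batch).logits
    s_logits = student(**batch).logits
    # KL divergence between temperature-softened distributions, scaled by T^2.
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```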

So training pathways are indeed the issue: when constructing a model from components, training is necessary to align the model.

So the Omni team should also release a training script which allows all components of the model to be trained at the same time! The text input feeds the speech input, and the video and image inputs also feed the LLM inputs, producing outputs for all modalities each turn. Indeed, a video is sound plus images, so a single video input can feed all the input pathways (see the sketch below).
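Purely to illustrate the "train every component each turn" idea, here is a sketch of one joint optimizer step over a combined loss. The per-modality loss methods are hypothetical, not the actual Omni API; the point is that one video sample supplies both the audio and image pathways at once:

```python
# Illustrative only: one optimizer step over a combined multimodal loss.
# The model interface (text_loss/image_loss/audio_loss) is HYPOTHETICAL,
# not the actual Omni API.
import torch

def joint_step(model, batch, optimizer):
    # A video sample decomposes into frames (images) + waveform (audio).
    frames, waveform, text_ids = batch["frames"], batch["audio"], batch["text"]

    loss = (
        model.text_loss(text_ids)            # hypothetical per-modality losses
        + model.image_loss(frames, text_ids)
        + model.audio_loss(waveform, text_ids)
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```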

ALSO VERY GOOD WORK TEAM! ...

Please don't forget the Hugging Face integrations as a priority, as well as perhaps a slimmed-down Transformers library for Qwen models only, i.e. a Hugging Face library without other models, just Qwen. This would enable early pre-releases of models. (It could also contain the chat UI!) (Not Docker.)

For this size, the model is incredibly good, though it sometimes insists on wrong conclusions. But please, has anyone written a TRAINING SCRIPT? I tried multiple approaches and failed.
