Prayog Pratham-100B is a multi-model, compute-efficient, GPU-independent model, the first of its kind that can run on an ordinary laptop or PC with 16 GB of RAM and an i5 processor (or a similar configuration). The architecture is scalable to 1 trillion parameters and beyond, but for now we describe it as a model of up to 100 billion parameters.
This model was developed to empower the AI community to carry out rapid research and development without worrying about the cost of GPUs for AI workloads.
This model is a win-win for data scientists, AI scientists, startups, and large corporations alike, because compute is no longer a constraint: startups with limited compute power can innovate more with this model, and large corporations also have the option of using this solution where required.
Previous efforts, such as mixture-of-experts models, attempt something similar, but in those systems all of the models must be loaded into memory at least once, so a large amount of RAM is still required. The Kratim Budhimata model, by contrast, offers the luxury of having thousands of models while never loading the models that are irrelevant to a particular user query. In this way it solves the hallucination problem and makes all models available on demand, to be used smartly as and when required.
This model is also capable of performing web search using DuckDuckGo, alongside support for different modalities.
The intent of this model is to empower the AI community to think beyond computational limitations in order to benefit humanity; using less computation also contributes to a better planet and less global warming.
We share the complete model-building code with examples so that anyone can reproduce it if required. The model is released under the MIT licence.
Here are some details:
LLMs play a great role in AI, but they need GPUs and heavy computation. This is not only financially costly; it also blocks the democratisation of AI, which is what actually drives the rapid growth of open-source AI. There is therefore a need for a system in which one can create a one-billion, 100-billion, or even one-trillion parameter model without GPUs, massive RAM, or other heavy computational resources: with 16 GB of RAM and an i5 processor, one should be able to run the model.

The Kratim Budhimata model provides that key requirement. You can create any number of models inside a single model, which makes managing a large number of models feasible, and you can train and run only the models that actually contribute to the prediction for a particular user query. There is never a need to load all of the models into RAM, even once; you never need to load the full model at any time, and that is the magic of this solution. You run this multi-model system by first using a classifier to predict which model should answer the query, and then loading only that model. The system therefore scales horizontally rather than vertically: instead of one big model, you can have thousands of small models, which gives better results and also solves the hallucination problem.
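The routing idea described above can be sketched in plain Python. The keyword classifier and model registry here are hypothetical stand-ins for illustration, not the actual Kratim Budhimata API:

```python
# Sketch of classifier-based model routing: only the model the classifier
# selects is ever loaded into memory. All names below are illustrative.

MODEL_REGISTRY = {
    "summarisation": lambda: "summarisation-model-weights",
    "maths": lambda: "maths-model-weights",
}

def classify(query):
    # Stand-in for the trained classification model: route by keyword.
    return "maths" if "solve" in query.lower() else "summarisation"

def answer(query):
    label = classify(query)
    model = MODEL_REGISTRY[label]()   # load only the relevant model
    return label, model

label, model = answer("Please summarise this article")
```

The registry maps each label to a loader function, so a model's weights are only materialised when its loader is called; this is what keeps the memory footprint independent of the total number of models.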
Solution: kindly see the model file or the example HTML file for complete solution details. In a nutshell, the main steps are:
1. Create a Kratim Budhimata model class, initialise it, and call this the first model.
2. Train the classification model on the prompts, and create the labels based on the model number or dataset number (say, label 1 for the text summarisation model in the example below). For large datasets, one should either sample the prompts or, if required, create multiple classification models, e.g. one model tells whether a query relates to science or maths, a sub-classification model tells whether a science query relates to engineering or medical science, and so on, to keep the system compute-efficient.
3. Save the weights of the classification model and the summarisation model.
4. Initialise another instance of the Kratim Budhimata model class and call it the second model.
5. Load the trained weights of the first model's classification model into the second model's classifier.
6. Predict the class using the classification model.
7. Based on that class, initialise and load the relevant model to predict the response; in our case that is the summarisation model.
8. After saving the weights, one can assign None to the first model to save resources in a production environment, or wherever required.
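The steps above can be sketched end to end as follows. The class and method names are illustrative assumptions (the published model file defines the real API), the "training" is a toy keyword lookup, and pickle stands in for whatever weight format the real model uses:

```python
import os
import pickle
import tempfile

class KratimBudhimataSketch:
    """Illustrative stand-in for the Kratim Budhimata model class."""

    def __init__(self):
        self.classifier = None  # routes queries to a model label
        self.models = {}        # label -> loaded sub-model

    def train_classifier(self, prompts, labels):
        # Toy "training": remember one keyword per label (1 = summarisation).
        self.classifier = {"summarise": 1}

    def save_classifier(self, path):
        with open(path, "wb") as f:
            pickle.dump(self.classifier, f)

    def load_classifier(self, path):
        with open(path, "rb") as f:
            self.classifier = pickle.load(f)

    def predict_label(self, query):
        for word, label in self.classifier.items():
            if word in query.lower():
                return label
        return 0

    def load_model(self, label):
        # Only the model chosen by the classifier is loaded into memory.
        self.models[label] = f"weights-for-model-{label}"
        return self.models[label]

# Steps 1-3: the first model trains the classifier and saves its weights.
first = KratimBudhimataSketch()
first.train_classifier(["summarise this report"], [1])
path = os.path.join(tempfile.gettempdir(), "classifier.pkl")
first.save_classifier(path)
first = None  # step 8: release the first model to save resources

# Steps 4-7: the second model loads the weights, classifies the query,
# and loads only the relevant sub-model (here, the summarisation model).
second = KratimBudhimataSketch()
second.load_classifier(path)
label = second.predict_label("Please summarise this report")
model = second.load_model(label)
```

Note that at the end only one sub-model is resident in memory, regardless of how many models the system knows about.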
If you have any queries or suggestions, please reach out at [email protected].