MiniMaxAI/MiniMax-VL-01
Text Generation
Note A non-transformer-based VLM (ViT-MLP-LLM framework).
Note A 456B-parameter LLM with a 1M-token training context.
Note End-side multimodal LLM that supports real-time conversation and video understanding.
Note A unified model for dense grounded understanding of images & videos.
Note A multimodal dataset for vision-language pretraining that includes 6.5M images and 0.8B text from 22k hours of instructional videos.
Note Dataset designed specifically for natural language processing (NLP) tasks in the education sector.