✨ MiniMax-text-01:
- 456B with 45.9B activated per token
- Combines Lightning Attention, Softmax Attention, and MoE for optimal performance
- Training context up to 1M tokens, inference handles 4M tokens
✨ MiniMax-VL-01:
- ViT-MLP-LLM framework (non-transformer)
- Handles image inputs from 336×336 to 2016×2016
- 694M image-caption pairs + 512B tokens processed across 4 stages
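As a quick sanity check on the MiniMax-text-01 figures quoted above, here is a minimal sketch that works out what share of the parameters is active per token and how far the inference context stretches beyond the training context. The constant and function names are illustrative only; the numbers are taken directly from the post.

```python
# Figures quoted in the post (parameters in billions, context in tokens).
TOTAL_PARAMS_B = 456.0
ACTIVE_PARAMS_B = 45.9
TRAIN_CONTEXT = 1_000_000
INFER_CONTEXT = 4_000_000


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters used for each token in a sparse MoE model."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
extrapolation = INFER_CONTEXT / TRAIN_CONTEXT

print(f"Active per token: {fraction:.1%}")
print(f"Inference context vs. training context: {extrapolation:.0f}x")
```

So roughly a tenth of the weights are used for any given token, and inference runs at four times the longest training context.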
InternLM3-8B-instruct 🔥 Trained on just 4T tokens, it outperforms Llama3.1-8B and Qwen2.5-7B in reasoning tasks, at 75% lower cost!
internlm/internlm3-67875827c377690c01a9131d
reacted to fdaudens's post with 🔥❤️ 3 days ago
@meg, one of the best researchers in AI ethics, makes a critical point about autonomy: fully autonomous systems carry unknowable risks because they operate on computer logic rather than human logic.
The solution? Build systems that support & assist rather than override human decisions.
I highly recommend reading the blog post written by Meg, @evijit, @sasha and @giadap. They define different levels of agent autonomy & provide a values-based analysis of risks, benefits, and uses of AI agents to help you make better decisions.
Started another experimental product training for a client. Doing FLUX Dreambooth / Finetuning via Kohya SS GUI. GPU is L40S and batch size is 7. Config name: Batch_Size_7_48GB_GPU_46250MB_29.1_second_it_Tier_1.json
Community fine-tuned models are more carbon efficient than the models they are derived from! 🥳🌿
@alozowski, @clefourrier, @SaylorTwift and @albertvillanova evaluated CO₂ emissions associated with model inference for over 3000 models on the Open LLM Leaderboard. Interesting trends and new insights emerged...
Did a fun experiment: What are the main themes emerging from the 100+ Nieman Journalism Lab predictions for 2025?
I used natural language processing to cluster and map them, which really helps spot patterns that weren't obvious when reading predictions one by one. So what will shape journalism next year? A lot of AI and US politics (surprise!), but there's also this horizontal axis that spans from industry strategies to deep reflections on how to talk to the public.
Click any dot to explore the original prediction. What themes surprise/interest you the most?
Coming back to Paris Friday to open our new Hugging Face office!
We're at capacity for the party but add your name to the waiting list as we're trying to privatize the passage du Caire for extra space for robots 🤖🦾🦿
In the past seven days, the Diffusers team has shipped:
1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements
Coffee on me if someone can guess 1 - 4 correctly.
reacted to merve's post with ❤️🔥 about 1 month ago