
Fidite Nemini PRO

FiditeNemini

AI & ML interests

Prompt engineering, unalignment, MLX, model merging, diffusion models

Recent Activity

liked a model 8 days ago
HiDream-ai/HiDream-I1-Full
reacted to merterbak's post with 🔥 10 days ago

Organizations

Fidite Nemini Open Source · MLX Community · Cognitive Computations

FiditeNemini's activity

reacted to merterbak's post with 🔥 10 days ago
Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and a mixture-of-experts architecture. Two models are available now:
Models 🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release

- 🧠 Native Multimodality - Process text and images in a unified architecture
- πŸ” Mixture-of-Experts - First Llama models using MoE for incredible efficiency
- πŸ“ Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)

🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16-expert architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1

🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128-expert architecture
- Fits on a single DGX H100 node (8× H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- Elo score of 1417 on LMArena, currently the second-best model on the arena

🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16-expert architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks
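
For anyone who wants to try the released checkpoints, here is a minimal, hedged sketch of loading Scout with transformers. The model id is taken from the linked collection; the dtype/device settings are assumptions, and the multimodal variants may need the image-text-to-text pipeline rather than plain text generation.

```python
# Minimal sketch: loading Llama 4 Scout for text generation.
# Assumptions: a transformers release with Llama 4 support, an accepted
# license on the Hub, and enough GPU memory (Scout is stated to fit on a
# single H100). The multimodal checkpoints may instead require the
# "image-text-to-text" pipeline.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # from the collection above
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

out = generator(
    "Explain mixture-of-experts routing in two sentences.",
    max_new_tokens=80,
)
print(out[0]["generated_text"])
```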
reacted to clem's post with 🔥 11 days ago
Llama models (arguably the most successful open AI models of all time) just represented 3% of total model downloads on Hugging Face in March.

People and the media like stories of winner-takes-all, of one model or company to rule them all, but the reality is much more nuanced than that!

Kudos to all the small AI builders out there!
reacted to mlabonne's post with 🔥 27 days ago
reacted to AdinaY's post with 👀 28 days ago
Skywork-R1V 🚀: a 38B open multimodal reasoning model with advanced visual CoT capabilities, released by Skywork.

Skywork/Skywork-R1V-38B

✨ Visual Reasoning: Breaks down complex images step by step.
✨ Math & Science: Solves visual problems with high precision.
✨ Combines text & images for deeper understanding.

New activity in TheDrummer/Fallen-Llama-3.3-R1-70B-v1-GGUF about 1 month ago

Wrong gguf's in repo?

#1 opened about 2 months ago by FiditeNemini
reacted to luigi12345's post with 👍 about 2 months ago
✅ BEST DEBUG PROMPT
Language: Any. 🌀 Project Type: Any

What prompt, if sent to you, will make you detect and fix all the code-crashing issues in the COMPLETE codebase, so I don't have to ask you to fix them again and again?
Step 1: Gimme such a prompt.
Step 2: Follow it yourself quietly and COMPLETELY.
Step 3: State that if you are asked again about finding fatal bugs, logic issues, and inconsistencies in the current codebase, you would not be able to find more. (You cannot lie, so you must make all the code adjustments necessary prior to such a statement.)

reacted to grimjim's post with 👍 2 months ago
This recent paper points to an explanation for the unreasonable effectiveness of Frankenmerges: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (2502.05171)

Specifically, the duplication of layers in Frankenmerges serves a purpose similar to what occurs in their recurrent-depth architecture. Successful frankenmerges that operate without additional fine-tuning are able to recover or "heal" from any damage due to abrupt transitions between layer blocks. Operational replicated layer blocks can provide functional benefits grounded in latent reasoning. Frankenmerges can also result in hybrid reasoning, by splicing together the latent reasoning of different models.

Back in April 2024, I was able to duplicate a few layers in the Llama 3 8B model, turning it into a 9B model, without harming benchmarks significantly, despite any transition damage.
grimjim/llama-3-experiment-v1-9B
My informal experimentation suggested that latent reasoning circuits could occupy contiguous stacks of 2-4 layers, though the result was highly sensitive to the choice of transition location between layers.
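
As a rough illustration of the kind of layer duplication described above (not grimjim's exact recipe, which was produced with a merge tool), the sketch below splices a deep copy of a contiguous block of decoder layers back into a Llama-style model using plain transformers; the layer indices and model id are purely illustrative.

```python
# Rough sketch of duplicating a contiguous block of decoder layers
# ("frankenmerging" by passthrough). Assumes a Llama-style model whose decoder
# blocks live in model.model.layers; the indices below are illustrative, not
# those used for grimjim/llama-3-experiment-v1-9B.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
)

layers = model.model.layers
dup_start, dup_end = 12, 16  # 4-layer block to repeat (illustrative)

# Splice a deep copy of the chosen block back in right after the original,
# yielding a deeper model without any retraining.
new_layers = (
    list(layers[:dup_end])
    + [copy.deepcopy(layer) for layer in layers[dup_start:dup_end]]
    + list(layers[dup_end:])
)
model.model.layers = torch.nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)

# Keep per-layer indices consistent so the KV cache treats each copy as its own layer.
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i

model.save_pretrained("llama-3-layer-duplicated")
```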
reacted to mkurman's post with 🔥 3 months ago
I’ve simplified things for the AI OS community!

Check out Qwen-2.5-14B-DeepSeek-R1-1M! It's a blend of the latest Qwen 2.5 14B, with its massive 1 million token context window, and the DeepSeek R1 version of the Qwen 2.5 14B base model.

Enjoy! 🚀

mkurman/Qwen2.5-14B-DeepSeek-R1-1M
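
A minimal, untested sketch of trying the merged model with transformers follows; it assumes the checkpoint loads as a standard Qwen2-style causal LM with a chat template, and the sampling settings are placeholders (long-context use may require extra rope-scaling configuration not shown here).

```python
# Minimal sketch: loading the merge as a standard causal LM.
# Assumptions: Qwen2-style architecture, a chat template in the tokenizer,
# and enough GPU memory for a 14B model in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mkurman/Qwen2.5-14B-DeepSeek-R1-1M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize mixture-of-experts in one line."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```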