nicolo
nicolollo
1 follower · 15 following
AI & ML interests
None yet
Recent Activity
liked a model 5 days ago
nomic-ai/nomic-embed-text-v2-moe
reacted to grimjim's post with ❤️ 7 days ago
This recent paper points to an explanation for the unreasonable effectiveness of Frankenmerges: https://huggingface.co/papers/2502.05171 Specifically, the duplication of layers in Frankenmerges serves a purpose similar to what occurs in the paper's recurrent-depth architecture. Successful Frankenmerges that operate without additional fine-tuning are able to recover or "heal" from any damage due to abrupt transitions between layer blocks. Operational replicated layer blocks can provide functional benefits grounded in latent reasoning. Frankenmerges can also result in hybrid reasoning, by splicing together the latent reasoning of different models. Back in April 2024, I was able to duplicate a few layers in the Llama 3 8B model, turning it into a 9B model, without harming benchmarks significantly, despite any transition damage. https://huggingface.co/grimjim/llama-3-experiment-v1-9B My informal experimentation suggested that latent reasoning circuits could occupy contiguous stacks of 2-4 layers, though the result was highly sensitive to the choice of transition location between layers.
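The layer duplication described in the post can be illustrated as a layer-index plan: a list saying which source layer fills each slot of the enlarged stack. The sketch below is a minimal illustration, not the method used for the linked 9B model. The function names `duplicate_block` and `find_seams` are hypothetical, and the duplicated range (layers 12 to 15 of a 32-layer stack) is an arbitrary assumption, since the post does not say which layers were copied. No weights are touched; only the ordering is computed.

```python
def duplicate_block(num_layers, start, end, copies=1):
    """Return the source-layer index for each slot of the enlarged stack.

    Layers [start, end) are repeated `copies` extra times, immediately
    after their first occurrence. Hypothetical helper for illustration.
    """
    if not 0 <= start < end <= num_layers:
        raise ValueError("need 0 <= start < end <= num_layers")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    plan = list(range(end))               # layers 0 .. end-1, in order
    for _ in range(copies):
        plan.extend(range(start, end))    # replay the chosen block
    plan.extend(range(end, num_layers))   # remaining original layers
    return plan


def find_seams(plan):
    """List (previous_layer, next_layer) pairs where the order is not consecutive."""
    return [(a, b) for a, b in zip(plan, plan[1:]) if b != a + 1]


# Assumed example: a 32-layer stack with an arbitrary 4-layer block replayed once.
plan = duplicate_block(num_layers=32, start=12, end=16)
print(len(plan))         # 36
print(find_seams(plan))  # [(15, 12)]
```

The single seam reported here, where layer 15 feeds back into a copy of layer 12, is the kind of abrupt transition the post says a merged model has to "heal" from; moving `start` and `end` changes where that seam falls.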
liked a model 7 days ago
tomg-group-umd/huginn-0125
Organizations
models
4
nicolollo/test1-Q4_K_M-GGUF • Updated 27 days ago • 33
nicolollo/test1 • Updated 27 days ago • 13
nicolollo/test-Q4_K_M-GGUF • Updated Dec 28, 2024 • 9
nicolollo/test • Updated Dec 28, 2024 • 5
datasets
2
nicolollo/my-distiset • Updated Sep 16, 2024 • 1 • 82
nicolollo/docci • Updated Jul 11, 2024 • 14.7k • 214