
nicolo

nicolollo

AI & ML interests

None yet

Organizations

Hugging Face Discord Community

nicolollo's activity

reacted to nicolay-r's post with 🔥 17 days ago
🚀 Delighted to share a major milestone in adapting reasoning techniques for augmenting data collections!
Introducing bulk-chain 1.0.0, the first major release of a no-string API for adapting your LLM to Chain-of-Thought-like reasoning over records with a large number of parameters across large datasets.

⭐ Check it out: https://github.com/nicolay-r/bulk-chain

What’s new and why it matters:
📦 Fully no-string API for easy client deployment
🔥 Demos are now standalone projects:

📺 bash / shell (dispatched): https://github.com/nicolay-r/bulk-chain-shell
📺 tksheet: https://github.com/nicolay-r/bulk-chain-tksheet-client

Using nlp-thirdgate to host the supported providers:
🌌 LLM providers: https://github.com/nicolay-r/nlp-thirdgate
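
To make the idea concrete, here is a minimal sketch of the general pattern bulk-chain automates: applying an ordered prompt schema to every record of a dataset, with each step seeing the outputs of earlier steps. This is not bulk-chain's actual API; the schema format, field names, and `infer_record` helper are hypothetical.

```python
from typing import Callable

# Hypothetical schema: ordered (field, prompt template) steps; each template
# can reference the record's columns and the outputs of earlier steps.
SCHEMA = [
    ("reasoning", "Think step by step about the sentiment of: {text}"),
    ("label", "Given the reasoning: {reasoning}\nAnswer positive or negative."),
]

def infer_record(record: dict, llm: Callable[[str], str]) -> dict:
    """Run the schema steps in order, chaining outputs into later prompts."""
    state = dict(record)
    for field, template in SCHEMA:
        state[field] = llm(template.format(**state))
    return state

# Usage with any text-in/text-out LLM callable:
# results = [infer_record(row, my_llm) for row in rows]
```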
reacted to grimjim's post with ❤️ 3 months ago
This recent paper points to an explanation for the unreasonable effectiveness of frankenmerges: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach" (arXiv:2502.05171)

Specifically, the duplication of layers in frankenmerges serves a purpose similar to the weight-shared recurrence in the paper's recurrent-depth architecture. Successful frankenmerges that operate without additional fine-tuning are able to recover, or "heal", from the damage caused by abrupt transitions between layer blocks. Replicated layer blocks that remain operational can provide functional benefits grounded in latent reasoning. Frankenmerges can also produce hybrid reasoning by splicing together the latent reasoning of different models.

Back in April 2024, I was able to duplicate a few layers in the Llama 3 8B model, turning it into a 9B model, without harming benchmarks significantly, despite any transition damage.
grimjim/llama-3-experiment-v1-9B
My informal experimentation suggested that latent reasoning circuits could occupy contiguous stacks of 2-4 layers, though the result was highly sensitive to the choice of transition location between layers.
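
For illustration, here is a rough sketch of this kind of layer duplication using plain PyTorch and transformers. The layer range and model ID are arbitrary examples, and frankenmerges are typically produced with merge tools such as mergekit rather than a script like this.

```python
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
layers = model.model.layers  # nn.ModuleList of decoder blocks

# Duplicate a contiguous block of decoder layers (range chosen arbitrarily
# for illustration) and splice the copies in right after the originals.
start, end = 12, 16
copies = [copy.deepcopy(layers[i]) for i in range(start, end)]
model.model.layers = nn.ModuleList(
    list(layers[:end]) + copies + list(layers[end:])
)

# Keep per-layer indices consistent so the KV cache indexes correctly,
# and record the new depth in the config.
for idx, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = idx
model.config.num_hidden_layers = len(model.model.layers)
```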
updated a model 4 months ago
published a model 4 months ago
reacted to merve's post with ❤️🚀 4 months ago
supercharge your LLM apps with smolagents 🔥

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by Hugging Face to make the LLM write code, do analysis and automate boring stuff!

Here's our blog post to get you started: https://huggingface.co/blog/smolagents
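
For reference, a minimal example following the quickstart in that blog post might look like the sketch below. The model ID and query are illustrative, and the class names reflect the library as announced; they may have changed in later releases.

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# The CodeAgent solves tasks by writing and executing Python snippets,
# calling the provided tools (here, web search) when it needs them.
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
)

print(agent.run("How many seconds are there in a leap year?"))
```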