AI & ML interests

None defined yet.

Recent Activity

tomaarsen posted an update 4 days ago
‼️Sentence Transformers v5.0 is out! The biggest update yet introduces Sparse Embedding models, improved encode methods, a Router module for asymmetric models & much more. Sparse + Dense = πŸ”₯ hybrid search performance! Details:

1️⃣ Sparse Encoder Models
Brand new support for sparse embedding models that generate high-dimensional embeddings (30,000+ dims) where <1% are non-zero:

- Full SPLADE, Inference-free SPLADE, and CSR architecture support
- 4 new modules, 12 new losses, 9 new evaluators
- Integration with @elastic-co, @opensearch-project, @NAVER LABS Europe, @qdrant, @IBM, etc.
- Decode interpretable embeddings to understand token importance (see the sketch below)
- Hybrid search integration to get the best of both worlds
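
As a quick taste, here's a minimal sketch of the new sparse workflow, assuming the SparseEncoder class described above; the SPLADE checkpoint is just an example and the decode signature is my reading of the release notes, not a definitive reference:

from sentence_transformers import SparseEncoder

# Any SPLADE-style checkpoint works here; this one is just an example.
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

sentences = [
    "The aurora is produced when charged solar particles hit the atmosphere.",
    "Sparse embeddings keep only a handful of non-zero dimensions.",
]
embeddings = model.encode(sentences)          # high-dimensional, mostly zeros
scores = model.similarity(embeddings, embeddings)

# Inspect which tokens drive each embedding (the interpretability helper above);
# treat the exact decode signature as an assumption from the release notes.
print(model.decode(embeddings[0], top_k=10))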

2️⃣ Enhanced Encode Methods & Multi-Processing
- New encode_query & encode_document methods that automatically use predefined prompts
- No more manual pool management - just pass a device list directly to encode()
- Much cleaner and easier to use than the old multi-process approach (see the sketch below)
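
A rough sketch of the new encode helpers, assuming a standard dense checkpoint; the device-list behavior is as described in the notes above:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

queries = ["how do sparse embeddings work?"]
documents = ["SPLADE expands text into a vocabulary-sized sparse vector."]

# New helpers that apply the model's predefined query/document prompts automatically
# (models without special prompts simply fall back to plain encoding).
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

# Multi-GPU / multi-process encoding without manual pool management:
# per the notes above, a device list can be passed straight to encode().
embeddings = model.encode(documents * 1000, device=["cuda:0", "cuda:1"])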

3️⃣ Router Module & Advanced Training
- Router module with different processing paths for queries vs documents (see the sketch below)
- Custom learning rates for different parameter groups
- Composite loss logging - see individual loss components
- Perfect for two-tower architectures
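
A rough sketch of how I read the Router idea: route queries and documents through different module stacks inside a single model. The Router.for_query_document constructor and the exact wiring below are assumptions based on the release notes, not a verbatim recipe:

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Pooling, Router, Transformer

query_encoder = Transformer("prajjwal1/bert-tiny")
doc_encoder = Transformer("prajjwal1/bert-tiny")
pooling = Pooling(query_encoder.get_word_embedding_dimension(), "mean")

# Route inputs tagged as queries vs documents through different paths.
router = Router.for_query_document(
    query_modules=[query_encoder, pooling],
    document_modules=[doc_encoder, pooling],
)
model = SentenceTransformer(modules=[router])

q = model.encode_query(["what is a two-tower model?"])
d = model.encode_document(["A two-tower model encodes queries and documents separately."])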

4️⃣ Comprehensive Documentation & Training
- New Training Overview, Loss Overview, API Reference docs
- 6 new training example documentation pages
- Full integration examples with major search engines
- Extensive blogpost on training sparse models

Read the comprehensive blogpost about training sparse embedding models: https://huggingface.co/blog/train-sparse-encoder

See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/v5.0.0

What's next? We would love to hear from the community! What sparse encoder models would you like to see? And what new capabilities should Sentence Transformers handle - multimodal embeddings, late interaction models, or something else? Your feedback shapes our roadmap!
a-r-r-o-w posted an update 8 days ago
As you might have already heard, FLUX.1-Kontext-dev has been released and has taken the generative community by storm!

In case you haven't come across it, you can get started with Kontext using πŸ€— diffusers. See the official [model](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) and [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#flux).
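
If it helps, a minimal sketch of Kontext-style image editing with the diffusers pipeline; the prompt and input image are placeholders, and a CUDA GPU with enough VRAM is assumed:

import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Any RGB image you want to edit; load_image also accepts URLs.
image = load_image("input.png")
edited = pipe(image=image, prompt="Add a tiny wizard hat to the cat", guidance_scale=2.5).images[0]
edited.save("kontext_edit.png")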

Want to know how inference companies like Fal & Replicate are able to run the model so fast and in under 2 seconds per image? Check out this [gist](https://gist.github.com/a-r-r-o-w/d08c37e8bd3e9c26b4ce80360be148c6) for some details!
albertvillanova posted an update 11 days ago
πŸš€ SmolAgents v1.19.0 is live!
This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience, making it easier than ever to build smart, interactive AI agents. Here's what's new:

πŸ”§ Agent Upgrades
- Support for managed agents in ToolCallingAgent (see the sketch after this list)
- Context manager support for cleaner agent lifecycle handling
- Output formatting now uses XML tags for consistency
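
A small sketch of the first two items, assuming the current smolagents API (InferenceClientModel, WebSearchTool, CodeAgent); the context-manager usage reflects my reading of the release notes rather than verified code:

from smolagents import CodeAgent, InferenceClientModel, ToolCallingAgent, WebSearchTool

model = InferenceClientModel()  # any smolagents model class works here

# A sub-agent the manager can delegate to.
web_agent = CodeAgent(
    tools=[WebSearchTool()],
    model=model,
    name="web_searcher",
    description="Searches the web and reports back.",
)

# Managed agents in ToolCallingAgent + context-manager lifecycle handling.
with ToolCallingAgent(tools=[], model=model, managed_agents=[web_agent]) as manager:
    manager.run("Find the latest smolagents release and summarize it.")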

πŸ–₯️ UI Enhancements
- GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos.

πŸ”„ Streaming Refactor
- Streaming event aggregation moved off the Model class
- ➑️ Better architecture & maintainability

πŸ“¦ Output Tracking
- CodeAgent outputs are now stored in ActionStep
- βœ… More visibility and structure to agent decisions

πŸ› Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final answer matching

πŸ“š Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format

πŸ”— Full release notes:
https://github.com/huggingface/smolagents/releases/tag/v1.19.0

πŸ’¬ Try it out, explore the new features, and let us know what you build!

#smolagents #opensource #AIagents #LLM #HuggingFace
a-r-r-o-w posted an update 21 days ago
New diffusion model for text-to-image and video-to-world generation: Cosmos Predict-2 πŸ‘½

Model collection: nvidia/cosmos-predict2-68028efc052239369a0f2959
Diffusers support: https://github.com/huggingface/diffusers/pull/11695
Documentation: https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos

These are results with the 2B param model. Imagine what you could do with the 14B version! Go check it out now!
a-r-r-o-w posted an update 23 days ago
Did you know how simple it is to get started with your own custom compiler backend for torch.compile? What's stopping you from writing your own compiler?

import torch
from torch._functorch.partitioners import draw_graph

def compiler(fx_module: torch.fx.GraphModule, _):
    # The backend receives the FX graph captured by dynamo (plus example inputs,
    # ignored here). Dump it to a Graphviz .dot file for inspection, then return
    # the unmodified forward as the "compiled" callable.
    draw_graph(fx_module, "compile.dot")
    return fx_module.forward

def capture(model, *inputs):
    compiled_model = torch.compile(model, backend=compiler)
    y = compiled_model(*inputs)
    # Exercise the backward pass as well, so the full training step runs through
    # the compiled model.
    y.sum().backward()

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_1 = torch.nn.Linear(16, 32)
        self.linear_2 = torch.nn.Linear(32, 16)

    def forward(self, x):
        x = self.linear_1(x)
        x = torch.nn.functional.silu(x)
        x = self.linear_2(x)
        return x

if __name__ == '__main__':
    # This example targets Apple Silicon; swap "mps" for "cuda" or "cpu" as needed.
    model = MLP()
    model.to("mps")
    x = torch.randn(4, 16, device="mps", dtype=torch.float32)

    capture(model, x)

--------------

Part of https://huggingface.co/posts/a-r-r-o-w/231008365980283
a-r-r-o-w posted an update 23 days ago
Recently, I've been focusing my learning on the following topics:
- PyTorch internals, specifically the inductor system (roughly ~1 month of experience)
- Triton internals (~8 months of experience)
- CUDA (~3 months of experience)
- Understanding fusion patterns in compilers and how to improve them (~1 month of experience)
- Parallelism strategies for large-scale inference optimization (~6-7 months of experience)

I thought it would be nice to document it somewhere for no particular reason. Maybe someone will find it useful? It's also because I want to get into the habit of writing, but had no motivation to do so. Maybe writing short informal posts will help build the habit.

Since I don't have a personal site, and don't plan to create one in the near future, I think HF posts are best suited for short and informal documentation to share my little discoveries and learnings. If you're interested, strap in!

The first post in this series will be a basic study of PyTorch's float32 matmuls and their Triton implementation (nothing much, just the tutorial available on the website), plus a short dive into TF32 and a TFLOPS comparison on an A100 machine.
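
For anyone who wants to poke at this before the post lands, the TF32 knobs themselves are just a couple of standard PyTorch flags (nothing from the upcoming write-up itself):

import torch

# Allow TF32 tensor cores for float32 matmuls on Ampere+ GPUs (e.g. A100)...
torch.backends.cuda.matmul.allow_tf32 = True
# ...or equivalently, relax the global float32 matmul precision.
torch.set_float32_matmul_precision("high")

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # runs on TF32 tensor cores, trading a little precision for throughput
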
Β·
Narsil posted an update 24 days ago
Me: This function is too slow. Find a faster algorithm.
Cursor: Hold my beer.

Me: *Slacking off with colleagues*
Cursor: Ping.

Me: 🀯

Xenova posted an update about 1 month ago
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🀯

πŸ” Privacy by design (no data leaves your device)
πŸ’° Completely free... forever
πŸ“¦ Zero installation required, just visit a website
⚑️ Blazingly fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text to speech

Powered by Transformers.js and ONNX Runtime Web! πŸ€— I hope you like it!
Β·
danaaubakirova posted an update about 1 month ago
albertvillanova posted an update about 1 month ago
sayakpaul posted an update about 1 month ago
Diffusers supports a good variety of quantization backends. It can be challenging to navigate through them, given the complex nature of diffusion pipelines in general.

So, @derekl35 set out to write a comprehensive guide that puts users in the front seat. Explore the different backends we support, learn the trade-offs they offer, and finally, check out the cool space we built that lets you compare quantization results.
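
As a taste of what the guide covers, here's a minimal sketch of one backend (bitsandbytes NF4) applied to a Flux transformer; the other backends follow the same quantization_config pattern, and the checkpoint/settings here are placeholders rather than recommendations:

import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Quantize only the memory-hungry transformer; the rest of the pipeline stays as-is.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
image = pipe("a photo of an astronaut riding a horse on the moon").images[0]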

Give it a go here:
https://lnkd.in/gf8Pi4-2
joaogante posted an update about 1 month ago
Let's go! Custom generation code has landed in transformers πŸš€

Have you designed a cool new KV cache? Maybe you're comparing new test-time compute ideas you've been researching? Have you found a way to do diffusion with existing models? You can now easily share your findings with the community as custom generation code, exposed through the well-known generate interface πŸ€“

In a nutshell, we have expanded support for custom modeling code on the Hub with *model-agnostic* custom generation code. Write for one model, reuse with any model -- hopefully, this will democratize access to new generation ideas 🫑

As a creator, you gain the ability to get your ideas into transformers with minimal effort. You'll also have access to all Hub features: a landing page for your creation, discussions, usage metrics, ... πŸ€“
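
In practice it looks roughly like this; the small Qwen checkpoint is just a placeholder, and custom_generate points at the minimal example repo listed in the resources below:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# Use a decoding method hosted on the Hub instead of a built-in one.
outputs = model.generate(
    **inputs,
    custom_generate="transformers-community/custom_generate_example",
    trust_remote_code=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))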

πŸ’Ž Resources πŸ’Ž
- docs: https://huggingface.co/docs/transformers/generation_strategies#custom-decoding-methods
- minimal example: transformers-community/custom_generate_example
- discussion: transformers-community/support#10
sayakpaul posted an update about 2 months ago
Despite the emergence of approaches that combine LLM and DiT architectures for T2I synthesis, this design space remains severely understudied.

This was done long ago and got into CVPR25 -- super excited to finally share it now, along with the data and code β™₯️

We explore several architectural choices that affect this design. We provide an open & reproducible training recipe that works at scale.

Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention. They exhibit excellent performance on T2I.

Despite its compelling results and other performance virtues, deep fusion remains underexplored, which is what we want to address in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and a trainable DiT, and set out to explore what makes a "good deep fusion" between the two for T2I.

We explore several key questions in the work, such as:

Q1: How should we do attention? We considered several alternatives; PixArt-Alpha-like attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?

Based on the findings of our experiments, we arrive at FuseDiT with the following components on top of the base architecture.

* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly

We trained FuseDiT on a mixture from CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While not the best model, it's encouraging to develop something in a guided manner using open datasets.

To know more (code, models, all are available), please check out the paper:
https://lnkd.in/gg6qyqZX.
albertvillanova posted an update about 2 months ago
New in smolagents v1.16.0:
πŸ” Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
πŸ”§ Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
πŸ“š Better docs

πŸ‘‰ https://github.com/huggingface/smolagents/releases/tag/v1.16.0
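
A quick sketch of the WebSearchTool item above; the engine argument name is my assumption from the release notes rather than something verified:

from smolagents import CodeAgent, InferenceClientModel, WebSearchTool

# Bing-backed web search (engine="bing" is assumed; DuckDuckGo is the usual default).
search_tool = WebSearchTool(engine="bing")

agent = CodeAgent(tools=[search_tool], model=InferenceClientModel())
agent.run("Who maintains the smolagents library?")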