Building on HF

Tom Aarsen

tomaarsen

huggingface

https://linkedin.com/in/tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Recent Activity

liked a model about 23 hours ago

Qdrant/splade-ecommerce-esci

liked a model about 23 hours ago

Qdrant/splade-ecommerce-multidomain

upvoted a collection about 6 hours ago

ClaimExtractor-2605

Extract claims and intents from conversations • 6 items • Updated 1 day ago • 7

upvoted a collection 4 days ago

Verbatim RAG v1

Hallucination-free RAG and our state-of-the-art (SOTA) extractors • 8 items • Updated 4 days ago • 9

upvoted a changelog 5 days ago

Hugging Face Changelog

Filter Models page by Base Models only

9 days ago

• 122

upvoted an article 5 days ago

Article

Introduction to Trimming ✂

lbourdois

•

9 days ago

• 39

upvoted a collection 15 days ago

Foundation Text-Generation Models Below 360M Parameters

Great candidates for fine-tuning targeting Wllama and Transformers.js for mobile devices, ordered by number of parameters. • 43 items • Updated 15 days ago • 46

upvoted a paper 15 days ago

Adaptive Chunking: Optimizing Chunking-Method Selection for RAG

Paper • 2603.25333 • Published Mar 26 • 4

upvoted a paper 16 days ago

(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models

Paper • 2604.16429 • Published 25 days ago • 2

upvoted a collection 16 days ago

DeAR-Reranking

DeAR (Deep Agent Rank): Dual-Stage Document Reranking with Reasoning Agents. Accepted at EMNLP Findings 2025. • 12 items • Updated Oct 21, 2025 • 2

upvoted a collection 17 days ago

Command A Plus

4 items • Updated 17 days ago • 43

upvoted an article 17 days ago

Article

OlmoEarth v1.1: A more efficient family of Earth observation models

allenai

•

18 days ago

• 22

upvoted an article 18 days ago

Article

Introducing the Ettin Reranker Family

tomaarsen

•

19 days ago

• 51

upvoted a collection 18 days ago

Ettin Rerankers

8 items • Updated 18 days ago • 8

upvoted an article 18 days ago

Article

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

ibm-granite

•

23 days ago

• 32

upvoted a paper 20 days ago

Precise Zero-Shot Dense Retrieval without Relevance Labels

Paper • 2212.10496 • Published Dec 20, 2022 • 6

upvoted a paper 21 days ago

BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation

Paper • 2508.06781 • Published Aug 9, 2025 • 1

upvoted an article 23 days ago

Article

Unlocking asynchronicity in continuous batching

+1

ror, pcuenq, ariG23498

•

24 days ago

• 58

upvoted an article 24 days ago

Article

SSE Retrieval MRL v2: Regularization of Representation Space and Performance Improvement via Hyperparameter Optimization

RikkaBotan

•

24 days ago

• 2

upvoted a paper 24 days ago

Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?

Paper • 2605.10848 • Published 27 days ago • 5

upvoted a collection 24 days ago

Biomedical datasets & models

8 items • Updated 19 days ago • 6

upvoted a paper 24 days ago

A Causal Language Modeling Detour Improves Encoder Continued Pretraining

Paper • 2605.12438 • Published 26 days ago • 7