# 🧠 AgentRank-Base

**The First Embedding Model Built Specifically for AI Agent Memory Retrieval**

*Up to +22.6% MRR over general-purpose embedders | Temporal awareness | Memory type understanding*

🚀 Quick Start • 📊 Benchmarks • 🔧 Architecture • 💡 Why AgentRank?
## 🎯 TL;DR

AgentRank-Base is an embedding model designed for AI agents that need to remember. Unlike generic embedders (OpenAI, Cohere, MiniLM), AgentRank understands:

- ⏰ When something happened (temporal awareness)
- 📁 What type of memory it is (episodic vs. semantic vs. procedural)
- ⭐ How important the memory is
## 💡 Why AgentRank?

### The Problem with Current Embedders

AI agents need memory. But when you ask an agent:

> "What did we discuss about Python yesterday?"

current embedders fail because they:

- ❌ Don't understand that "yesterday" means recent time
- ❌ Can't distinguish between a preference and an event
- ❌ Treat all memories as equally important
### The AgentRank Solution

| Challenge | OpenAI/Cohere/MiniLM | AgentRank |
|---|---|---|
| "What did I say yesterday?" | Random old results 😞 | Recent memories first ✅ |
| "What's my preference?" | Mixed with events 😞 | Only preferences ✅ |
| "What's most important?" | No priority 😞 | Importance-aware retrieval ✅ |
## 📊 Benchmarks

Evaluated on AgentMemBench (500 test samples, 8 candidates each):

| Model | Parameters | MRR ↑ | Recall@1 ↑ | Recall@5 ↑ | NDCG@10 ↑ |
|---|---|---|---|---|---|
| AgentRank-Base | 149M | 0.6496 | 0.4440 | 0.9960 | 0.6786 |
| AgentRank-Small | 33M | 0.6375 | 0.4460 | 0.9740 | 0.6797 |
| all-mpnet-base-v2 | 109M | 0.5351 | 0.3660 | 0.7960 | 0.6335 |
| all-MiniLM-L6-v2 | 22M | 0.5297 | 0.3720 | 0.7520 | 0.6370 |
### Improvement Over Baselines

AgentRank-Base vs. the two sentence-transformers baselines:

| vs Baseline | MRR | Recall@1 | Recall@5 |
|---|---|---|---|
| vs all-MiniLM-L6-v2 | +22.6% | +19.4% | +32.4% |
| vs all-mpnet-base-v2 | +21.4% | +21.3% | +25.1% |
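For reference, these metrics can be recomputed from per-query ranks. Below is a minimal sketch, assuming (as in the setup above) exactly one relevant memory among each query's 8 candidates; the example rank values are made up:

```python
import math

def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank of the relevant memory (1-indexed)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose relevant memory lands in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k):
    """NDCG@k with one relevant item: DCG = 1/log2(rank + 1), ideal DCG = 1."""
    return sum(1.0 / math.log2(r + 1) if r <= k else 0.0 for r in ranks) / len(ranks)

# ranks[i] = position (1..8) of the gold memory for query i (made-up values)
ranks = [1, 2, 1, 5, 3]
print(f"MRR={mrr(ranks):.4f}  R@1={recall_at_k(ranks, 1):.4f}  "
      f"R@5={recall_at_k(ranks, 5):.4f}  NDCG@10={ndcg_at_k(ranks, 10):.4f}")
```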
## 🚀 Quick Start

### Installation

```bash
pip install transformers torch
```
### Basic Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("vrushket/agentrank-base")
tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-base")

def encode(texts, model, tokenizer):
    """Encode texts to L2-normalized embeddings."""
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over the token dimension
    embeddings = outputs.last_hidden_state.mean(dim=1)
    # L2 normalize so dot products are cosine similarities
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return embeddings

# Your agent's memories
memories = [
    "User prefers Python over JavaScript for backend development",
    "User asked about React frameworks yesterday",
    "User mentioned they have 3 years of coding experience",
    "User is working on an e-commerce project",
]

# A query from the user
query = "What programming language does the user prefer?"

# Encode everything
memory_embeddings = encode(memories, model, tokenizer)
query_embedding = encode([query], model, tokenizer)

# Find the most similar memory
similarities = torch.mm(query_embedding, memory_embeddings.T)[0]
best_match_idx = similarities.argmax().item()

print(f"Query: {query}")
print(f"Best match: {memories[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.4f}")

# Output:
# Query: What programming language does the user prefer?
# Best match: User prefers Python over JavaScript for backend development
# Similarity: 0.8234
```
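An agent usually wants a shortlist rather than a single best match. Since the embeddings are L2-normalized, the dot products above are cosine similarities, and `torch.topk` gives a ranked top-k directly (continuing the snippet above):

```python
# Rank all memories and keep the 3 best matches
top_k = torch.topk(similarities, k=3)
for score, idx in zip(top_k.values.tolist(), top_k.indices.tolist()):
    print(f"{score:.4f}  {memories[idx]}")
```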
### Advanced Usage with Metadata

For full temporal and memory-type awareness, use the AgentRank package:

```python
# Coming soon: pip install agentrank
from agentrank import AgentRankEmbedder
import torch

model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-base")

# Encode with temporal context
memory_embedding = model.encode(
    text="User mentioned they prefer morning meetings",
    days_ago=7,              # Memory is 1 week old
    memory_type="semantic",  # It's a preference (not an event)
)

# Encode the query (no metadata needed for queries)
query_embedding = model.encode("When does the user like to have meetings?")

# The model now knows this is a week-old preference!
similarity = torch.cosine_similarity(query_embedding, memory_embedding, dim=0)
```
## 🔧 Architecture

AgentRank-Base is built on ModernBERT-base (149M params) with novel additions:

```
┌───────────────────────────────────────────────────┐
│  ModernBERT Encoder (22 Transformer Layers)       │
│  - RoPE Positional Encoding                       │
│  - Flash Attention                                │
│  - 768 Hidden Dimension                           │
└───────────────────────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│   Temporal    │ │    Memory     │ │  Importance   │
│   Position    │ │     Type      │ │  Prediction   │
│  Embeddings   │ │  Embeddings   │ │     Head      │
│  (10 × 768)   │ │   (4 × 768)   │ │   (768 → 1)   │
└───────────────┘ └───────────────┘ └───────────────┘
        │                 │                 │
        └─────────────────┼─────────────────┘
                          │
               ┌──────────────────────┐
               │   Projection Layer   │
               │     (768 → 768)      │
               └──────────────────────┘
                          │
               ┌──────────────────────┐
               │   L2 Normalization   │
               │  768-dim Embedding   │
               └──────────────────────┘
```
### Novel Components

| Component | Purpose | How It Helps |
|---|---|---|
| Temporal Embeddings | Encodes memory age (today, this week, last month, etc.) | "Yesterday" queries match recent memories |
| Memory Type Embeddings | Distinguishes episodic/semantic/procedural | "What do I like?" matches preferences, not events |
| Importance Head | Auxiliary task predicting memory priority | Helps learn better representations |
### Temporal Buckets

```
Bucket 0: Today       (0-1 days)
Bucket 1: Recent      (1-3 days)
Bucket 2: This week   (3-7 days)
Bucket 3: Last week   (7-14 days)
Bucket 4: This month  (14-30 days)
Bucket 5: Last month  (30-60 days)
Bucket 6: Few months  (60-90 days)
Bucket 7: Half year   (90-180 days)
Bucket 8: This year   (180-365 days)
Bucket 9: Long ago    (365+ days)
```
### Memory Types

```
Type 0: Episodic   → Events, conversations ("We discussed X yesterday")
Type 1: Semantic   → Facts, preferences ("User likes Python")
Type 2: Procedural → Instructions ("To deploy, run npm build")
Type 3: Unknown    → Fallback
```
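To make the data flow concrete, here is a minimal PyTorch sketch of how the components above could fit together at encode time. It is an illustration, not the released code: the class name `AgentMemoryEncoder`, the `days_to_bucket` helper, and the choice to add the temporal/type embeddings to the pooled text embedding before projection are all assumptions based on the diagram.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

# Upper bucket edges in days, matching the table above
BUCKET_EDGES = [1, 3, 7, 14, 30, 60, 90, 180, 365]

def days_to_bucket(days_ago):
    """Map a memory's age in days to one of the 10 temporal buckets."""
    for bucket, edge in enumerate(BUCKET_EDGES):
        if days_ago <= edge:
            return bucket
    return 9  # Long ago (365+ days)

MEMORY_TYPES = {"episodic": 0, "semantic": 1, "procedural": 2, "unknown": 3}

class AgentMemoryEncoder(nn.Module):
    """Hypothetical sketch of the diagram: encoder + metadata embeddings
    + projection + L2 normalization."""

    def __init__(self, base="answerdotai/ModernBERT-base", dim=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base)
        self.temporal_emb = nn.Embedding(10, dim)  # 10 x 768
        self.type_emb = nn.Embedding(4, dim)       # 4 x 768
        self.importance_head = nn.Linear(dim, 1)   # 768 -> 1 (auxiliary task)
        self.projection = nn.Linear(dim, dim)      # 768 -> 768

    def forward(self, input_ids, attention_mask, days_ago=None, memory_type=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        pooled = hidden.mean(dim=1)  # mean pooling, as in Quick Start
        # Queries carry no metadata; memories add temporal and type signals
        if days_ago is not None:
            buckets = torch.tensor([days_to_bucket(d) for d in days_ago],
                                   device=pooled.device)
            pooled = pooled + self.temporal_emb(buckets)
        if memory_type is not None:
            types = torch.tensor([MEMORY_TYPES.get(t, 3) for t in memory_type],
                                 device=pooled.device)
            pooled = pooled + self.type_emb(types)
        importance = self.importance_head(pooled)  # training-time signal only
        embedding = nn.functional.normalize(self.projection(pooled), p=2, dim=1)
        return embedding, importance
```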
## 🎓 Training Details

| Aspect | Details |
|---|---|
| Base Model | answerdotai/ModernBERT-base (149M params) |
| Training Data | 500K synthetic agent memory samples |
| Memory Distribution | Episodic (40%), Semantic (35%), Procedural (25%) |
| Loss Function | Multiple Negatives Ranking Loss + Importance MSE |
| Hard Negatives | 7 per sample (5 types: temporal, type confusion, topic drift, etc.) |
| Batch Size | 16-32 per GPU |
| Hardware | 2× NVIDIA RTX 6000 Ada (48 GB each) |
| Training Time | ~12 hours |
| Precision | FP16 mixed precision |
| Final Val Loss | 0.877 |
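A minimal sketch of that objective, assuming the standard in-batch formulation of Multiple Negatives Ranking Loss (cross-entropy over a scaled query-to-candidate similarity matrix) plus MSE on the importance head; the similarity `scale` and the weighting `lambda_imp` are assumed values, not published hyperparameters:

```python
import torch
import torch.nn.functional as F

def training_loss(query_emb, pos_emb, importance_pred, importance_target,
                  scale=20.0, lambda_imp=0.1):
    """Multiple Negatives Ranking Loss + importance MSE (sketch).

    query_emb, pos_emb: (B, 768) L2-normalized embeddings. Row i of pos_emb
    is the positive for query i; every other row acts as an in-batch negative
    (mined hard negatives would be appended as extra candidate rows).
    """
    sims = scale * query_emb @ pos_emb.T     # (B, B) scaled cosine similarities
    labels = torch.arange(sims.size(0), device=sims.device)
    ranking = F.cross_entropy(sims, labels)  # diagonal = correct match
    importance = F.mse_loss(importance_pred, importance_target)
    return ranking + lambda_imp * importance
```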
## 🏗️ Use Cases

### 1. AI Agents with Long-Term Memory

```python
# Store memories with metadata
agent.remember(
    text="User is allergic to peanuts",
    memory_type="semantic",
    importance=10,  # Critical medical info!
)

# Later, when discussing food...
relevant_memories = agent.recall("What should I know about the user's diet?")
# Returns: "User is allergic to peanuts" (even if stored months ago)
```
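The `agent` object above is illustrative. A minimal store with that interface can be sketched on top of the `encode()` helper from Quick Start (assumed to be defined already); this `Agent` class is hypothetical and ignores the metadata it accepts, since full metadata conditioning arrives with the `agentrank` package:

```python
import torch

class Agent:
    """Hypothetical in-memory store built on the encode() helper from Quick Start."""

    def __init__(self, model, tokenizer):
        self.model, self.tokenizer = model, tokenizer
        self.texts, self.embeddings = [], []

    def remember(self, text, memory_type=None, importance=None):
        # Metadata is accepted but unused in this sketch
        self.texts.append(text)
        self.embeddings.append(encode([text], self.model, self.tokenizer))

    def recall(self, query, k=3):
        sims = torch.mm(encode([query], self.model, self.tokenizer),
                        torch.cat(self.embeddings).T)[0]
        top = sims.topk(min(k, len(self.texts)))
        return [self.texts[i] for i in top.indices.tolist()]
```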
### 2. RAG Systems for Conversational AI

```python
# Better retrieval for chatbots
query = "What did we talk about in our last meeting?"

# AgentRank returns recent, relevant conversations;
# generic embedders return random topically-similar docs
```

### 3. Personal Knowledge Bases

```python
# The user's notes and preferences
memories = [
    "I prefer dark mode in all apps",
    "My morning routine starts at 6 AM",
    "Important: Tax deadline April 15",
]

# AgentRank properly handles time-sensitive queries
```
## 📌 When to Use AgentRank vs. Others

| Use Case | Best Model |
|---|---|
| AI agents with memory | ✅ AgentRank |
| Time-sensitive retrieval | ✅ AgentRank |
| Conversational AI | ✅ AgentRank |
| General document search | OpenAI / Cohere |
| Code search | CodeBERT |
| Scientific papers | SciBERT |
## 📚 Model Family

| Model | Parameters | Speed | Quality | Best For |
|---|---|---|---|---|
| agentrank-small | 33M | ⚡⚡⚡ Fast | Good | Real-time agents, edge |
| agentrank-base | 149M | ⚡⚡ Medium | Best | Quality-critical apps |
| agentrank-reranker (coming) | 149M | ⚡ Slower | Superior | Two-stage retrieval |
## 📝 Citation

```bibtex
@misc{agentrank2024,
  author    = {Vrushket More},
  title     = {AgentRank: Embedding Models for AI Agent Memory Retrieval},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/vrushket/agentrank-base}
}
```
## 🤝 Community & Support

- 🐛 Issues: GitHub Issues
- 💬 Discussions: HuggingFace Community
- 📧 Contact: [email protected]

## 📄 License

Apache 2.0 - Free for commercial use!