🧠 AgentRank-Base

The First Embedding Model Built Specifically for AI Agent Memory Retrieval

+21% to +23% MRR over general-purpose embedders on AgentMemBench | Temporal awareness | Memory type understanding


🎯 TL;DR

AgentRank-Base is an embedding model for AI agents that need to remember. Unlike general-purpose embedders (OpenAI, Cohere, MiniLM), AgentRank is trained to take into account:

⏰ When something happened (temporal awareness)

📁 What type of memory it is (episodic vs semantic vs procedural)

⭐ How important the memory is

💡 Why AgentRank?

The Problem with Current Embedders

AI agents need memory. But when you ask an agent:

"What did we discuss about Python yesterday?"

General-purpose embedders struggle here because they see only the text of each memory:

❌ They have no notion that "yesterday" points to recent memories
❌ They can't tell a preference apart from an event
❌ They treat all memories as equally important

The AgentRank Solution

Challenge	OpenAI/Cohere/MiniLM	AgentRank
"What did I say yesterday?"	Topically similar results of any age 😕	Recent memories first ✅
"What's my preference?"	Mixed with events 😕	Only preferences ✅
"What's most important?"	No notion of priority 😕	Importance-aware retrieval ✅

📊 Benchmarks

Evaluated on AgentMemBench (500 test samples, 8 candidates each):

Model	Parameters	MRR ↑	Recall@1 ↑	Recall@5 ↑	NDCG@10 ↑
AgentRank-Base	149M	0.6496	0.4440	0.9960	0.6786
AgentRank-Small	33M	0.6375	0.4460	0.9740	0.6797
all-mpnet-base-v2	109M	0.5351	0.3660	0.7960	0.6335
all-MiniLM-L6-v2	22M	0.5297	0.3720	0.7520	0.6370
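A minimal sketch of how metrics like these are commonly computed, assuming each query has exactly one relevant memory among its 8 candidates (the card does not say how AgentMemBench labels relevance). `retrieval_metrics` is a hypothetical helper, not part of any released evaluation script:

```python
import math


def retrieval_metrics(ranks, k_values=(1, 5), ndcg_k=10):
    """Compute MRR, Recall@k and NDCG@k from the 1-indexed rank of the single relevant candidate per query."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    # Recall@k: fraction of queries whose relevant memory appears in the top k
    recalls = {k: sum(r <= k for r in ranks) / n for k in k_values}
    # With one relevant item the ideal DCG is 1, so NDCG@k = 1 / log2(rank + 1) if rank <= k, else 0
    ndcg = sum(1.0 / math.log2(r + 1) if r <= ndcg_k else 0.0 for r in ranks) / n
    return {"MRR": mrr, **{f"Recall@{k}": v for k, v in recalls.items()}, f"NDCG@{ndcg_k}": ndcg}


# Example: the relevant memory was ranked 1st, 2nd and 4th for three queries
print(retrieval_metrics([1, 2, 4]))
```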

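Improvement Over Baselines

Relative gains of AgentRank-Base, computed as (AgentRank-Base score - baseline score) / baseline score:

Baseline	MRR	Recall@1	Recall@5	NDCG@10
all-MiniLM-L6-v2	+22.6%	+19.4%	+32.4%	+6.5%
all-mpnet-base-v2	+21.4%	+21.3%	+25.1%	+7.1%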
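The MRR and Recall percentages in the improvement table are relative gains over each baseline. A short check that reproduces them from the benchmark scores (the `scores` dict simply copies the benchmark table):

```python
# Scores copied from the benchmark table
scores = {
    "AgentRank-Base": {"MRR": 0.6496, "Recall@1": 0.4440, "Recall@5": 0.9960},
    "all-MiniLM-L6-v2": {"MRR": 0.5297, "Recall@1": 0.3720, "Recall@5": 0.7520},
    "all-mpnet-base-v2": {"MRR": 0.5351, "Recall@1": 0.3660, "Recall@5": 0.7960},
}


def relative_gain(new, old):
    """Relative improvement (new - old) / old, expressed as a percentage."""
    return 100.0 * (new - old) / old


for baseline in ("all-MiniLM-L6-v2", "all-mpnet-base-v2"):
    gains = {
        metric: relative_gain(scores["AgentRank-Base"][metric], scores[baseline][metric])
        for metric in ("MRR", "Recall@1", "Recall@5")
    }
    print(baseline, {metric: f"+{gain:.1f}%" for metric, gain in gains.items()})
```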

🚀 Quick Start

Installation

pip install "transformers>=4.48.0" torch

Basic Usage

from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("vrushket/agentrank-base")
tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-base")

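def encode(texts, model, tokenizer):
    """Encode texts into L2-normalized sentence embeddings."""
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over real tokens only: padding positions are masked out
    mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    embeddings = summed / counts
    # L2 normalize so that a dot product equals cosine similarity
    return torch.nn.functional.normalize(embeddings, p=2, dim=1)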

# Your agent's memories
memories = [
    "User prefers Python over JavaScript for backend development",
    "User asked about React frameworks yesterday",
    "User mentioned they have 3 years of coding experience",
    "User is working on an e-commerce project",
]

# A query from the user
query = "What programming language does the user prefer?"

# Encode everything
memory_embeddings = encode(memories, model, tokenizer)
query_embedding = encode([query], model, tokenizer)

# Find most similar memory
similarities = torch.mm(query_embedding, memory_embeddings.T)[0]
best_match_idx = similarities.argmax().item()

print(f"Query: {query}")
print(f"Best match: {memories[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.4f}")

# Example output (the exact similarity value may differ):
# Query: What programming language does the user prefer?
# Best match: User prefers Python over JavaScript for backend development
# Similarity: 0.8234
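Agents usually need a handful of memories rather than a single best match. A minimal sketch that passes the same dot-product scores to `torch.topk`; `top_k_memories` is a hypothetical helper, and random unit vectors stand in for the outputs of `encode` so it runs without downloading the model:

```python
import torch


def top_k_memories(query_embedding, memory_embeddings, memories, k=3):
    """Return (memory, score) pairs for the k memories most similar to the query.

    Both tensors are assumed to be L2-normalized (as returned by `encode`),
    so the dot product equals cosine similarity.
    """
    similarities = query_embedding @ memory_embeddings.T  # shape (1, num_memories)
    k = min(k, len(memories))
    scores, indices = torch.topk(similarities[0], k)
    return [(memories[i], s) for i, s in zip(indices.tolist(), scores.tolist())]


# Stand-in embeddings; in practice use encode(memories, model, tokenizer)
# and encode([query], model, tokenizer)
torch.manual_seed(0)
demo_memories = ["memory A", "memory B", "memory C", "memory D"]
demo_memory_embeddings = torch.nn.functional.normalize(torch.randn(4, 768), dim=1)
demo_query_embedding = demo_memory_embeddings[2:3].clone()  # identical to "memory C"

for text, score in top_k_memories(demo_query_embedding, demo_memory_embeddings, demo_memories, k=2):
    print(f"{score:.4f}  {text}")
```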

Advanced Usage with Metadata

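Full temporal and memory-type conditioning will require the AgentRank package. It is not released yet, so the API below is a preview and may change:

# Coming soon: pip install agentrank (not yet available)
import torch
from agentrank import AgentRankEmbedder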

model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-base")

# Encode with temporal context
memory_embedding = model.encode(
    text="User mentioned they prefer morning meetings",
    days_ago=7,           # Memory is 1 week old
    memory_type="semantic" # It's a preference (not an event)
)

# Encode query (no metadata needed for queries)
query_embedding = model.encode("When does the user like to have meetings?")

# The memory embedding now also encodes that this is a week-old preference
similarity = torch.cosine_similarity(query_embedding, memory_embedding, dim=0)

🔧 Architecture

AgentRank-Base is built on ModernBERT-base (149M params) with three additions on top of the encoder:

┌─────────────────────────────────────────────────┐
│     ModernBERT Encoder (22 Transformer Layers)  │
│     - RoPE Positional Encoding                  │
│     - Flash Attention                           │
│     - 768 Hidden Dimension                      │
└─────────────────────────────────────────────────┘
                       │
       ┌───────────────┼───────────────┐
       ↓               ↓               ↓
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Temporal   │ │  Memory     │ │ Importance  │
│  Position   │ │  Type       │ │ Prediction  │
│  Embeddings │ │  Embeddings │ │ Head        │
│  (10 × 768) │ │  (4 × 768)  │ │ (768→1)     │
└─────────────┘ └─────────────┘ └─────────────┘
       │               │               │
       └───────────────┼───────────────┘
                       ↓
          ┌─────────────────────┐
          │  Projection Layer   │
          │  (768 → 768)        │
          └─────────────────────┘
                       ↓
          ┌─────────────────────┐
          │  L2 Normalization   │
          │  768-dim Embedding  │
          └─────────────────────┘
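The card does not say exactly how the temporal and memory-type embeddings are combined with the encoder output. The sketch below assumes they are added to the mean-pooled 768-dim vector before the projection, with the importance head reading the same vector; `AgentRankHeads` is a hypothetical name, not the released module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AgentRankHeads(nn.Module):
    """Hypothetical sketch of the diagram's heads, applied to a pooled encoder output.

    Assumption: metadata embeddings are added to the pooled vector; the released
    model may combine them differently.
    """

    def __init__(self, hidden_size=768, num_temporal_buckets=10, num_memory_types=4):
        super().__init__()
        self.temporal_embeddings = nn.Embedding(num_temporal_buckets, hidden_size)  # 10 x 768
        self.memory_type_embeddings = nn.Embedding(num_memory_types, hidden_size)  # 4 x 768
        self.importance_head = nn.Linear(hidden_size, 1)  # 768 -> 1
        self.projection = nn.Linear(hidden_size, hidden_size)  # 768 -> 768

    def forward(self, pooled, temporal_bucket=None, memory_type=None):
        hidden = pooled
        # Queries carry no metadata, so both additions are optional
        if temporal_bucket is not None:
            hidden = hidden + self.temporal_embeddings(temporal_bucket)
        if memory_type is not None:
            hidden = hidden + self.memory_type_embeddings(memory_type)
        # Auxiliary importance score, one scalar per input
        importance = self.importance_head(hidden).squeeze(-1)
        # Project, then L2-normalize to the final 768-dim embedding
        embedding = F.normalize(self.projection(hidden), p=2, dim=-1)
        return embedding, importance


heads = AgentRankHeads()
pooled = torch.randn(2, 768)  # stand-in for mean-pooled ModernBERT outputs
embedding, importance = heads(
    pooled,
    temporal_bucket=torch.tensor([1, 3]),
    memory_type=torch.tensor([0, 1]),
)
print(embedding.shape, importance.shape)
```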

Novel Components

Component	Purpose	How It Helps
Temporal Embeddings	Encodes memory age (today, this week, last month, etc.)	"Yesterday" queries match recent memories
Memory Type Embeddings	Distinguishes episodic/semantic/procedural	"What do I like?" matches preferences, not events
Importance Head	Auxiliary task predicting memory priority	Extra supervision intended to improve embedding quality

Temporal Buckets

Bucket 0: Today (0-1 days)
Bucket 1: Recent (1-3 days)
Bucket 2: This week (3-7 days)
Bucket 3: Last week (7-14 days)
Bucket 4: This month (14-30 days)
Bucket 5: Last month (30-60 days)
Bucket 6: Few months (60-90 days)
Bucket 7: Half year (90-180 days)
Bucket 8: This year (180-365 days)
Bucket 9: Long ago (365+ days)
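The bucket ranges above share their endpoints (1 day appears in both bucket 0 and bucket 1). A minimal sketch of the age-to-bucket mapping, assuming half-open ranges so that an age equal to a boundary falls into the older bucket; `temporal_bucket` is a hypothetical helper:

```python
import bisect

# Exclusive upper bounds (in days) of buckets 0-8; anything older falls into bucket 9.
# Assumption: ranges are half-open, e.g. 1 day -> bucket 1, 7 days -> bucket 3.
BUCKET_UPPER_BOUNDS = [1, 3, 7, 14, 30, 60, 90, 180, 365]
BUCKET_NAMES = [
    "Today", "Recent", "This week", "Last week", "This month",
    "Last month", "Few months", "Half year", "This year", "Long ago",
]


def temporal_bucket(days_ago: float) -> int:
    """Map a memory's age in days to one of the 10 temporal buckets."""
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    # bisect_right counts how many upper bounds are <= days_ago
    return bisect.bisect_right(BUCKET_UPPER_BOUNDS, days_ago)


for age in (0.5, 1, 7, 45, 400):
    print(age, BUCKET_NAMES[temporal_bucket(age)])
```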

Memory Types

Type 0: Episodic   → Events, conversations ("We discussed X yesterday")
Type 1: Semantic   → Facts, preferences ("User likes Python")
Type 2: Procedural → Instructions ("To deploy, run npm build")
Type 3: Unknown    → Fallback
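A small sketch of turning a memory-type label into its embedding index, with anything unrecognized mapped to the Unknown fallback. `memory_type_id` is a hypothetical helper, and the case-insensitive matching is an assumption:

```python
from typing import Optional

MEMORY_TYPE_IDS = {"episodic": 0, "semantic": 1, "procedural": 2, "unknown": 3}


def memory_type_id(memory_type: Optional[str]) -> int:
    """Map a memory-type label to its embedding index, falling back to Unknown (3)."""
    if memory_type is None:
        return MEMORY_TYPE_IDS["unknown"]
    return MEMORY_TYPE_IDS.get(memory_type.strip().lower(), MEMORY_TYPE_IDS["unknown"])


print(memory_type_id("semantic"), memory_type_id("preference"))
```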

🎓 Training Details

Aspect	Details
Base Model	answerdotai/ModernBERT-base (149M params)
Training Data	500K synthetic agent memory samples
Memory Distribution	Episodic (40%), Semantic (35%), Procedural (25%)
Loss Function	Multiple Negatives Ranking Loss + Importance MSE
Hard Negatives	7 per sample (5 types: temporal, type confusion, topic drift, etc.)
Batch Size	16-32 per GPU
Hardware	2× NVIDIA RTX 6000 Ada (48GB each)
Training Time	~12 hours
Precision	FP16 Mixed Precision
Final Val Loss	0.877
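A minimal sketch of a "Multiple Negatives Ranking Loss + Importance MSE" objective, written in the style of sentence-transformers' `MultipleNegativesRankingLoss`, where every positive and every hard negative in the batch is a candidate for every query. The scale of 20 (that library's default) and the importance weight of 1.0 are assumptions, not documented training settings:

```python
import torch
import torch.nn.functional as F


def ranking_plus_importance_loss(query_emb, positive_emb, hard_negative_emb,
                                 predicted_importance, target_importance,
                                 scale=20.0, importance_weight=1.0):
    """query_emb, positive_emb: (B, D); hard_negative_emb: (B, H, D); all L2-normalized.

    predicted_importance, target_importance: (B,) importance scores.
    """
    batch_size, dim = query_emb.shape
    # Candidates: all in-batch positives, then all in-batch hard negatives
    candidates = torch.cat([positive_emb, hard_negative_emb.reshape(-1, dim)], dim=0)
    # Scaled cosine similarities (dot products of unit vectors), shape (B, B + B*H)
    logits = scale * query_emb @ candidates.T
    # Query i's correct candidate is positive i, i.e. column i
    labels = torch.arange(batch_size, device=query_emb.device)
    ranking_loss = F.cross_entropy(logits, labels)
    importance_loss = F.mse_loss(predicted_importance, target_importance)
    return ranking_loss + importance_weight * importance_loss


torch.manual_seed(0)
B, H, D = 4, 7, 768  # 7 hard negatives per sample, as in the table
queries = F.normalize(torch.randn(B, D), dim=-1)
positives = F.normalize(queries + 0.01 * torch.randn(B, D), dim=-1)
hard_negatives = F.normalize(torch.randn(B, H, D), dim=-1)
loss = ranking_plus_importance_loss(queries, positives, hard_negatives, torch.rand(B), torch.rand(B))
print(f"loss = {loss.item():.4f}")
```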

🏗️ Use Cases

1. AI Agents with Long-Term Memory

# `agent` is a placeholder for your own agent's memory layer
# Store memories with metadata
agent.remember(
    text="User is allergic to peanuts",
    memory_type="semantic",
    importance=10,  # Critical medical info!
)

# Later, when discussing food...
relevant_memories = agent.recall("What should I know about the user's diet?")
# Returns: "User is allergic to peanuts" (even if stored months ago)
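A minimal, hypothetical stand-in for the `agent` object above, so the remember/recall pattern can be run end to end. `MemoryStore` and `toy_embed` are illustrative names; `toy_embed` is a hashed bag-of-words placeholder rather than AgentRank, and importance is stored as metadata but not used for ranking in this sketch:

```python
import re
import zlib
from dataclasses import dataclass, field
from typing import Callable, List

import numpy as np


def toy_embed(texts: List[str], dim: int = 1024) -> np.ndarray:
    """Stand-in embedder (hashed bag of words) so the sketch runs without the model."""
    vectors = np.zeros((len(texts), dim))
    for row, text in enumerate(texts):
        for token in re.findall(r"[a-z]+", text.lower()):
            vectors[row, zlib.crc32(token.encode()) % dim] += 1.0
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


@dataclass
class MemoryStore:
    """Hypothetical memory layer: keeps texts, metadata and embeddings; recalls by cosine similarity."""
    embed_fn: Callable[[List[str]], np.ndarray]
    texts: List[str] = field(default_factory=list)
    metadata: List[dict] = field(default_factory=list)
    vectors: List[np.ndarray] = field(default_factory=list)

    def remember(self, text: str, memory_type: str = "unknown", importance: int = 5) -> None:
        # Embed once at write time; importance is kept but not used for ranking here
        self.texts.append(text)
        self.metadata.append({"memory_type": memory_type, "importance": importance})
        self.vectors.append(self.embed_fn([text])[0])

    def recall(self, query: str, k: int = 1) -> List[str]:
        if not self.texts:
            return []
        query_vector = self.embed_fn([query])[0]
        scores = np.stack(self.vectors) @ query_vector  # cosine similarity of unit vectors
        best = np.argsort(-scores)[:k]
        return [self.texts[i] for i in best]


agent = MemoryStore(embed_fn=toy_embed)
agent.remember("User is allergic to peanuts", memory_type="semantic", importance=10)
agent.remember("User prefers dark mode in all apps", memory_type="semantic")
print(agent.recall("Is the user allergic to anything?"))
```

To use AgentRank instead of the placeholder, `embed_fn` could wrap the Quick Start function, e.g. `lambda texts: encode(texts, model, tokenizer).numpy()`, since `encode` already returns L2-normalized rows.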

2. RAG Systems for Conversational AI

# Better retrieval for chatbots
query = "What did we talk about in our last meeting?"
# AgentRank returns recent, relevant conversations
# Generic embedders may rank older, merely topically similar conversations just as high

3. Personal Knowledge Bases

# User's notes and preferences
memories = [
    "I prefer dark mode in all apps",
    "My morning routine starts at 6 AM",
    "Important: Tax deadline April 15",
]
# AgentRank is designed to handle time-sensitive queries over notes like these

🆚 When to Use AgentRank vs Others

Use Case	Best Model
AI agents with memory	✅ AgentRank
Time-sensitive retrieval	✅ AgentRank
Conversational AI	✅ AgentRank
General document search	OpenAI / Cohere
Code search	CodeBERT
Scientific papers	SciBERT

📁 Model Family

Model	Parameters	Speed	Quality	Best For
agentrank-small	33M	⚡⚡⚡ Fast	Good	Real-time agents, edge
agentrank-base	149M	⚡⚡ Medium	Best	Quality-critical apps
agentrank-reranker (coming)	149M	⚡ Slower	Superior	Two-stage retrieval

📚 Citation

@misc{agentrank2024,
  author = {Vrushket More},
  title = {AgentRank: Embedding Models for AI Agent Memory Retrieval},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vrushket/agentrank-base}
}

🤝 Community & Support

🐛 Issues: GitHub Issues
💬 Discussions: HuggingFace Community
📧 Contact: [email protected]

📄 License

Apache 2.0 - Free for commercial use!

⭐ If AgentRank helps your project, please star the repo!

Built with ❤️ for the AI agent community


Model Metadata

Format: Safetensors | Model size: 0.1B params | Tensor type: F32

Evaluation Results (self-reported)

Metric	Value
MRR	0.650
Recall@1	0.444
Recall@5	0.996
NDCG@10	0.679