🧠 AgentRank-Base

The First Embedding Model Built Specifically for AI Agent Memory Retrieval


Up to +22.6% MRR over general-purpose open-source embedders (see Benchmarks) | Temporal awareness | Memory type understanding

🚀 Quick Start • 📊 Benchmarks • 🔧 Architecture • 💡 Why AgentRank?


🎯 TL;DR

AgentRank-Base is an embedding model designed for AI agents that need to remember. Unlike generic embedders (OpenAI, Cohere, MiniLM), AgentRank understands:

  • ⏰ When something happened (temporal awareness)
  • 📝 What type of memory it is (episodic vs semantic vs procedural)
  • ⭐ How important the memory is

💡 Why AgentRank?

The Problem with Current Embedders

AI agents need memory. But when you ask an agent:

"What did we discuss about Python yesterday?"

Current embedders fail because they:

  • ❌ Don't understand that "yesterday" refers to a recent time window
  • ❌ Can't distinguish between a preference and an event
  • ❌ Treat all memories as equally important

The AgentRank Solution

| Challenge | OpenAI/Cohere/MiniLM | AgentRank |
| --- | --- | --- |
| "What did I say yesterday?" | Random old results 😕 | Recent memories first ✅ |
| "What's my preference?" | Mixed with events 😕 | Only preferences ✅ |
| "What's most important?" | No priority 😕 | Importance-aware retrieval ✅ |

📊 Benchmarks

Evaluated on AgentMemBench (500 test samples, 8 candidates each):

| Model | Parameters | MRR ↑ | Recall@1 ↑ | Recall@5 ↑ | NDCG@10 ↑ |
| --- | --- | --- | --- | --- | --- |
| AgentRank-Base | 149M | 0.6496 | 0.4440 | 0.9960 | 0.6786 |
| AgentRank-Small | 33M | 0.6375 | 0.4460 | 0.9740 | 0.6797 |
| all-mpnet-base-v2 | 109M | 0.5351 | 0.3660 | 0.7960 | 0.6335 |
| all-MiniLM-L6-v2 | 22M | 0.5297 | 0.3720 | 0.7520 | 0.6370 |
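
MRR and Recall@k can both be derived from the rank at which the correct memory appears among a query's candidates. The sketch below assumes exactly one relevant memory per query. It is not the AgentMemBench evaluation script, and the function names are illustrative.

```python
def rank_of_target(scores, target_idx):
    """1-based rank of candidate `target_idx` when sorted by descending score."""
    target_score = scores[target_idx]
    return 1 + sum(1 for s in scores if s > target_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks, k):
    """Fraction of queries whose correct memory appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy example: three queries whose correct memory is ranked 1st, 2nd, and 4th.
ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(ranks)   # (1 + 1/2 + 1/4) / 3
r1 = recall_at_k(ranks, 1)
r5 = recall_at_k(ranks, 5)
```

With 8 candidates per query, Recall@5 only misses a query when the correct memory is ranked 6th or lower.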

Improvement Over Baselines

| vs Baseline | MRR | Recall@1 | Recall@5 |
| --- | --- | --- | --- |
| vs MiniLM | +22.6% | +19.4% | +32.4% |
| vs MPNet | +21.4% | +21.3% | +25.1% |
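
These percentages are relative gains of AgentRank-Base over each baseline, not absolute differences. A short sketch that reproduces them from the benchmark numbers (the helper name is illustrative):

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Scores copied from the benchmark results for AgentMemBench.
agentrank_base = {"MRR": 0.6496, "Recall@1": 0.4440, "Recall@5": 0.9960}
minilm = {"MRR": 0.5297, "Recall@1": 0.3720, "Recall@5": 0.7520}
mpnet = {"MRR": 0.5351, "Recall@1": 0.3660, "Recall@5": 0.7960}

vs_minilm = {
    m: round(relative_improvement(agentrank_base[m], minilm[m]), 1)
    for m in agentrank_base
}
vs_mpnet = {
    m: round(relative_improvement(agentrank_base[m], mpnet[m]), 1)
    for m in agentrank_base
}
```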

🚀 Quick Start

Installation

pip install transformers torch

Basic Usage

from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("vrushket/agentrank-base")
tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-base")

def encode(texts, model, tokenizer):
    """Encode texts to embeddings."""
    inputs = tokenizer(
        texts, 
        padding=True, 
        truncation=True, 
        max_length=512,
        return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
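        # Mean pooling over real tokens only (the attention mask excludes padding)
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)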
        # L2 normalize
        embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return embeddings

# Your agent's memories
memories = [
    "User prefers Python over JavaScript for backend development",
    "User asked about React frameworks yesterday",
    "User mentioned they have 3 years of coding experience",
    "User is working on an e-commerce project",
]

# A query from the user
query = "What programming language does the user prefer?"

# Encode everything
memory_embeddings = encode(memories, model, tokenizer)
query_embedding = encode([query], model, tokenizer)

# Find most similar memory
similarities = torch.mm(query_embedding, memory_embeddings.T)[0]
best_match_idx = similarities.argmax().item()

print(f"Query: {query}")
print(f"Best match: {memories[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.4f}")

# Example output (the exact similarity score may vary):
# Query: What programming language does the user prefer?
# Best match: User prefers Python over JavaScript for backend development
# Similarity: 0.8234

Advanced Usage with Metadata

For full temporal and memory type awareness, use the AgentRank package (not yet released, so treat the API below as a preview):

import torch

# Coming soon: pip install agentrank
from agentrank import AgentRankEmbedder

model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-base")

# Encode with temporal context
memory_embedding = model.encode(
    text="User mentioned they prefer morning meetings",
    days_ago=7,           # Memory is 1 week old
    memory_type="semantic" # It's a preference (not an event)
)

# Encode query (no metadata needed for queries)
query_embedding = model.encode("When does the user like to have meetings?")

# The model now knows this is a week-old preference!
similarity = torch.cosine_similarity(query_embedding, memory_embedding, dim=0)

🔧 Architecture

AgentRank-Base is built on ModernBERT-base (149M params) with novel additions:

┌─────────────────────────────────────────────────┐
│     ModernBERT Encoder (22 Transformer Layers)  │
│     - RoPE Positional Encoding                  │
│     - Flash Attention                           │
│     - 768 Hidden Dimension                      │
└─────────────────────────────────────────────────┘
                       │
       ┌───────────────┼───────────────┐
       ↓               ↓               ↓
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Temporal   │ │  Memory     │ │ Importance  │
│  Position   │ │  Type       │ │ Prediction  │
│  Embeddings │ │  Embeddings │ │ Head        │
│  (10 × 768) │ │  (4 × 768)  │ │ (768→1)     │
└─────────────┘ └─────────────┘ └─────────────┘
       │               │               │
       └───────────────┼───────────────┘
                       ↓
            ┌─────────────────────┐
            │  Projection Layer   │
            │  (768 → 768)        │
            └─────────────────────┘
                       ↓
            ┌─────────────────────┐
            │  L2 Normalization   │
            │  768-dim Embedding  │
            └─────────────────────┘

Novel Components

| Component | Purpose | How It Helps |
| --- | --- | --- |
| Temporal Embeddings | Encodes memory age (today, this week, last month, etc.) | "Yesterday" queries match recent memories |
| Memory Type Embeddings | Distinguishes episodic/semantic/procedural | "What do I like?" matches preferences, not events |
| Importance Head | Auxiliary task predicting memory priority | Helps learn better representations |

Temporal Buckets

Bucket 0: Today (0-1 days)
Bucket 1: Recent (1-3 days)
Bucket 2: This week (3-7 days)
Bucket 3: Last week (7-14 days)
Bucket 4: This month (14-30 days)
Bucket 5: Last month (30-60 days)
Bucket 6: Few months (60-90 days)
Bucket 7: Half year (90-180 days)
Bucket 8: This year (180-365 days)
Bucket 9: Long ago (365+ days)
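
The listed ranges share endpoints (day 7 appears in both "This week" and "Last week"), so mapping an age to a bucket needs a boundary rule. The sketch below assumes half-open intervals, where each bucket includes its lower bound and excludes its upper bound. That rule and the names `BUCKET_EDGES` and `temporal_bucket` are assumptions for illustration, not the model's confirmed behavior.

```python
import bisect

# Upper edges (in days) of buckets 0..8; bucket 9 is open-ended (365+ days).
BUCKET_EDGES = [1, 3, 7, 14, 30, 60, 90, 180, 365]


def temporal_bucket(days_ago):
    """Map a memory's age in days to a bucket index in [0, 9].

    Assumes half-open intervals [lower, upper), so an age that sits exactly
    on an edge falls into the older bucket.
    """
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    return bisect.bisect_right(BUCKET_EDGES, days_ago)


today = temporal_bucket(0)         # bucket 0: Today
one_week = temporal_bucket(7)      # bucket 3: Last week (under this rule)
long_ago = temporal_bucket(400)    # bucket 9: Long ago
```

Under this rule, a memory encoded with `days_ago=7` lands in bucket 3 rather than bucket 2.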

Memory Types

Type 0: Episodic   → Events, conversations ("We discussed X yesterday")
Type 1: Semantic   → Facts, preferences ("User likes Python")
Type 2: Procedural → Instructions ("To deploy, run npm build")
Type 3: Unknown    → Fallback
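
One way to carry these IDs in application code is an integer enum with a fallback for unrecognized labels. This is a sketch only: `MemoryType` and `parse_memory_type` are hypothetical names, not part of a released AgentRank API.

```python
from enum import IntEnum


class MemoryType(IntEnum):
    """Memory type IDs as listed in the model card."""
    EPISODIC = 0
    SEMANTIC = 1
    PROCEDURAL = 2
    UNKNOWN = 3


def parse_memory_type(name):
    """Map a string label to a MemoryType, falling back to UNKNOWN."""
    try:
        return MemoryType[name.strip().upper()]
    except (KeyError, AttributeError):
        # Unrecognized label or non-string input
        return MemoryType.UNKNOWN


semantic_id = int(parse_memory_type("semantic"))
fallback_id = int(parse_memory_type("something else"))
```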

🎓 Training Details

| Aspect | Details |
| --- | --- |
| Base Model | answerdotai/ModernBERT-base (149M params) |
| Training Data | 500K synthetic agent memory samples |
| Memory Distribution | Episodic (40%), Semantic (35%), Procedural (25%) |
| Loss Function | Multiple Negatives Ranking Loss + Importance MSE |
| Hard Negatives | 7 per sample (5 types: temporal, type confusion, topic drift, etc.) |
| Batch Size | 16-32 per GPU |
| Hardware | 2× NVIDIA RTX 6000 Ada (48GB each) |
| Training Time | ~12 hours |
| Precision | FP16 Mixed Precision |
| Final Val Loss | 0.877 |
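
The combined objective can be illustrated in plain Python. Multiple negatives ranking loss treats the paired memory as the positive for each query and every other candidate (other positives in the batch plus any hard negatives) as a negative, then applies cross-entropy over the scaled similarities. The similarity scale of 20 and the `importance_weight` of 0.1 are assumptions for this sketch, since the model card does not state either value. Real training would use batched tensors rather than Python lists.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mnr_loss(queries, candidates, scale=20.0):
    """Multiple negatives ranking loss over L2-normalized vectors.

    candidates[i] is the positive for queries[i]; all other candidates
    (other positives and appended hard negatives) act as negatives.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * dot(q, c) for c in candidates]
        # Numerically stable log-sum-exp
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # cross-entropy with target index i
    return total / len(queries)


def importance_mse(predicted, target):
    """Mean squared error between predicted and labeled importance."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def combined_loss(queries, candidates, pred_imp, true_imp, importance_weight=0.1):
    """Ranking loss plus a weighted auxiliary importance loss (weight is assumed)."""
    return mnr_loss(queries, candidates) + importance_weight * importance_mse(pred_imp, true_imp)


# Two queries, their two positives, and one extra hard negative.
queries = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = mnr_loss(queries, candidates)                       # positives match: near zero
misaligned = mnr_loss(queries, [candidates[1], candidates[0], candidates[2]])
total = combined_loss(queries, candidates, [0.5, 0.5], [1.0, 0.5])
```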

🏗️ Use Cases

1. AI Agents with Long-Term Memory

# Store memories with metadata (`agent` stands for your own memory-enabled agent object)
agent.remember(
    text="User is allergic to peanuts",
    memory_type="semantic",
    importance=10,  # Critical medical info!
)

# Later, when discussing food...
relevant_memories = agent.recall("What should I know about the user's diet?")
# Returns: "User is allergic to peanuts" (even if stored months ago)
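
The `remember` / `recall` interface above is illustrative. A minimal sketch of such a store follows, assuming a pluggable `embed` function that maps text to a vector. `MemoryStore`, `Memory`, and `toy_embed` are hypothetical names, and the keyword-count embedder exists only to make the example self-contained. In practice `embed` would wrap the `encode` helper from Quick Start. The source does not specify how importance enters ranking, so this sketch stores it but ranks by similarity alone.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


@dataclass
class Memory:
    text: str
    memory_type: str = "unknown"
    importance: int = 5
    vector: list = field(default_factory=list)


class MemoryStore:
    """Hypothetical in-memory store; `embed` maps a string to a list of floats."""

    def __init__(self, embed):
        self.embed = embed
        self.memories = []

    def remember(self, text, memory_type="unknown", importance=5):
        self.memories.append(Memory(text, memory_type, importance, self.embed(text)))

    def recall(self, query, top_k=3):
        q = self.embed(query)
        ranked = sorted(self.memories, key=lambda m: cosine(q, m.vector), reverse=True)
        return [m.text for m in ranked[:top_k]]


# Toy keyword-count embedder, used only so the example runs without a model.
VOCAB = ["peanuts", "allergic", "python", "meeting"]


def toy_embed(text):
    words = text.lower().replace("?", "").replace(".", "").split()
    return [float(words.count(w)) for w in VOCAB]


agent = MemoryStore(toy_embed)
agent.remember("User is allergic to peanuts", memory_type="semantic", importance=10)
agent.remember("User prefers Python for backend work", memory_type="semantic")
top = agent.recall("Is the user allergic to anything?", top_k=1)
```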

2. RAG Systems for Conversational AI

# Better retrieval for chatbots
query = "What did we talk about in our last meeting?"
# AgentRank returns recent, relevant conversations
# Generic embedders return random topically-similar docs

3. Personal Knowledge Bases

# User's notes and preferences
memories = [
    "I prefer dark mode in all apps",
    "My morning routine starts at 6 AM",
    "Important: Tax deadline April 15",
]
# AgentRank properly handles time-sensitive queries

🆚 When to Use AgentRank vs Others

| Use Case | Best Model |
| --- | --- |
| AI agents with memory | ✅ AgentRank |
| Time-sensitive retrieval | ✅ AgentRank |
| Conversational AI | ✅ AgentRank |
| General document search | OpenAI / Cohere |
| Code search | CodeBERT |
| Scientific papers | SciBERT |

📁 Model Family

| Model | Parameters | Speed | Quality | Best For |
| --- | --- | --- | --- | --- |
| agentrank-small | 33M | ⚡⚡⚡ Fast | Good | Real-time agents, edge |
| agentrank-base | 149M | ⚡⚡ Medium | Best | Quality-critical apps |
| agentrank-reranker (coming) | 149M | ⚡ Slower | Superior | Two-stage retrieval |

📚 Citation

@misc{agentrank2024,
  author = {Vrushket More},
  title = {AgentRank: Embedding Models for AI Agent Memory Retrieval},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vrushket/agentrank-base}
}

🀝 Community & Support


📄 License

Apache 2.0 - Free for commercial use!


⭐ If AgentRank helps your project, please star the repo!

Built with ❤️ for the AI agent community
