OpenCLIP ViT-L/14 with Test-Time Register
Register tokens were introduced in Vision Transformers Need Registers as learnable tokens that mitigate artifacts in the intermediate feature maps of ViTs. In Vision Transformers Don't Need Trained Registers, we introduced a training-free method for creating registers. These test-time registers serve a similar purpose to the original trained registers, but they can be added post hoc to any ViT to mitigate artifacts, enhance model interpretability, and modestly improve downstream performance on tasks such as segmentation and depth estimation.
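The artifacts mentioned above typically show up as a handful of patch tokens whose feature norm is far larger than the rest. The snippet below is a minimal sketch of how one might flag such tokens in a feature map. It is not the method from either paper: the function name `find_high_norm_tokens` and the median-based threshold are hypothetical choices made for illustration, and the input is synthetic.

```python
import torch


def find_high_norm_tokens(patch_tokens: torch.Tensor, ratio: float = 5.0) -> torch.Tensor:
    """Return indices of tokens whose L2 norm exceeds `ratio` times the median norm.

    patch_tokens: [num_tokens, dim] features from one image.
    The threshold rule is an illustrative heuristic, not taken from the papers.
    """
    norms = patch_tokens.norm(dim=-1)
    threshold = ratio * norms.median()
    return torch.nonzero(norms > threshold).flatten()


# Synthetic stand-in for a 16x16 grid of patch features with one artifact token
torch.manual_seed(0)
tokens = torch.randn(256, 64)
tokens[7] *= 50.0  # inflate one token to mimic a high-norm artifact

outliers = find_high_norm_tokens(tokens)
print(outliers.tolist())
```

On real features, the same check could be run on the patch tokens of an intermediate layer, with and without test-time registers, to see whether the outliers disappear.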
Model description
The base model is OpenCLIP-ViT-L-14-laion2B-s32B-b82K. With test-time registers, the model's internal representations are cleaner (see below). Using the environment from here and evaluating in bfloat16, ImageNet-1k zero-shot accuracy is 76.4% for both the original model and the variant with test-time registers. This model is intended to be used with this repo. Use transformers==4.45.1. The model can also be used for fine-tuning or other downstream tasks.


Quick Start
from transformers import AutoModel
from PIL import Image
import torch
# Load the complete model with all components
model = AutoModel.from_pretrained(
    "amildravid4292/clip-vitl14-test-time-registers",
    trust_remote_code=True
)
# Check what was loaded
print(f"Register tokens: {model.num_register_tokens}")
print(f"Neuron dict: {model.neuron_dict}")
print(f"Tokenizer available: {model.tokenizer is not None}")
print(f"Preprocessor available: {model.preprocessor is not None}")
print(f"Zero-shot classifier available: {model.zeroshot_classifier is not None}")
Usage Examples
Image Processing
from PIL import Image
# Load and preprocess image
image = Image.open("your_image.jpg")
image_tensor = model.preprocess_image(image).unsqueeze(0)
# Encode with the saved test-time register configuration
image_features = model.encode_image(image_tensor)
# To run inference with the original model, without test-time registers:
image_features = model.encode_image(
    image_tensor,
    neuron_dict=None,
    num_register_tokens=0
)
Text Processing
# Tokenize text
text = ["a photo of a cat", "a photo of a dog"]
text_tokens = model.tokenize(text)
# Encode text
text_features = model.encode_text(text_tokens)
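Once image and text features are available, they can be compared in the usual CLIP fashion: normalize both, take scaled dot products, and apply a softmax over the prompts. The following is a minimal sketch of that step. The helper name `zero_shot_probs` is hypothetical, the scale of 100 mirrors the logit scale used in the pipeline in this card, and the random tensors stand in for real model outputs (768 is the embedding width assumed here for ViT-L/14).

```python
import torch


def zero_shot_probs(image_features: torch.Tensor,
                    text_features: torch.Tensor,
                    scale: float = 100.0) -> torch.Tensor:
    """Return a [num_images, num_prompts] matrix of softmax probabilities."""
    # Work in float32 so the softmax is stable even if the model ran in bfloat16
    image_features = image_features.float()
    text_features = text_features.float()
    # Normalize so the dot product is a cosine similarity
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    logits = scale * image_features @ text_features.T
    return logits.softmax(dim=-1)


# Stand-in tensors; in practice these come from encode_image / encode_text
torch.manual_seed(0)
fake_image_features = torch.randn(1, 768)
fake_text_features = torch.randn(2, 768)

probs = zero_shot_probs(fake_image_features, fake_text_features)
print(probs.shape)  # torch.Size([1, 2])
```

Each row sums to 1, so `probs[0]` can be read as the relative match between the image and each prompt.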
Complete Pipeline
import torch
from torchvision.datasets import ImageNet
from tqdm import tqdm
from transformers import AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and its bundled zero-shot classifier
model = AutoModel.from_pretrained('amildravid4292/clip-vitl14-test-time-registers', trust_remote_code=True)
model = model.to(device).bfloat16()
classifier = model.zeroshot_classifier.to(device).bfloat16()
# Load the ImageNet validation set (adjust root to your local copy)
imagenet_dataset = ImageNet(root='/datasets/ilsvrc/current', split='val', transform=model.preprocessor)
ground_truth_labels = [imagenet_dataset.targets[i] for i in range(len(imagenet_dataset))]
loader = torch.utils.data.DataLoader(imagenet_dataset, batch_size=100, num_workers=4, pin_memory=True, shuffle=False)
# Run zero-shot classification
with torch.no_grad():
    correct = [0, 0]  # [number correct, number seen]
    for i, (images, target) in enumerate(tqdm(loader)):
        images = images.to(device).bfloat16()
        # Keep labels as integers: bfloat16 cannot represent every class index exactly
        target = target.to(device)
        # Predict
        image_features = model.encode_image(images)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        logits = 100. * image_features @ classifier
        pred = logits.argmax(dim=-1)
        correct[0] += (pred == target).sum().item()
        correct[1] += target.size(0)
print(correct[0] / correct[1])
Advanced Usage
Custom Neuron Modifications
# Override the saved neuron configuration
custom_neuron_dict = {0: [10, 20, 30]} # Modify neurons 10,20,30 in layer 0
image_features = model.encode_image(
    image_tensor,
    num_register_tokens=4,
    neuron_dict=custom_neuron_dict
)
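Before passing a custom dictionary, it can help to sanity-check it. The sketch below assumes the format shown above (layer index mapped to a list of neuron indices) and assumes the indices refer to MLP hidden units of a ViT-L/14 vision tower with 24 layers and 4096 hidden units per MLP. The helper `validate_neuron_dict` is hypothetical and not part of the model's API.

```python
from typing import Dict, List, Optional


def validate_neuron_dict(neuron_dict: Optional[Dict[int, List[int]]],
                         num_layers: int = 24,
                         mlp_dim: int = 4096) -> Optional[Dict[int, List[int]]]:
    """Check layer and neuron indices, returning a de-duplicated, sorted copy.

    num_layers and mlp_dim are assumed ViT-L/14 values; adjust for other models.
    """
    if neuron_dict is None:
        return None  # None is passed through unchanged
    cleaned = {}
    for layer, neurons in neuron_dict.items():
        if not isinstance(layer, int) or not 0 <= layer < num_layers:
            raise ValueError(f"layer index out of range: {layer!r}")
        bad = [n for n in neurons if not isinstance(n, int) or not 0 <= n < mlp_dim]
        if bad:
            raise ValueError(f"neuron indices out of range in layer {layer}: {bad}")
        cleaned[layer] = sorted(set(neurons))
    return cleaned


print(validate_neuron_dict({0: [30, 10, 20, 10]}))  # {0: [10, 20, 30]}
```

The cleaned dictionary can then be passed as `neuron_dict` in the call above.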
Different Register Token Counts
# Use different number of register tokens
image_features = model.encode_image(
    image_tensor,
    num_register_tokens=8  # Override the default
)
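Changing the register count also changes the length of the token sequence the transformer processes. As a rough guide, the sketch below counts tokens under the assumptions that the input is 224x224, the patch size is 14, there is one class token, and each register is added as one extra token. The function `sequence_length` is a hypothetical helper, not part of the model's API.

```python
def sequence_length(num_register_tokens: int,
                    image_size: int = 224,
                    patch_size: int = 14) -> int:
    """Number of tokens: patches + 1 class token + register tokens."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if num_register_tokens < 0:
        raise ValueError("num_register_tokens must be non-negative")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1 + num_register_tokens


print(sequence_length(0))  # 257
print(sequence_length(4))  # 261
print(sequence_length(8))  # 265
```

The overhead of a few registers is small relative to the 256 patch tokens, so the extra compute should be modest.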
Model Details
- Base Architecture: ViT-L/14
- Training Data: LAION-2B subset
BibTeX entry and citation info
@misc{jiang2025visiontransformersdontneed,
  title={Vision Transformers Don't Need Trained Registers},
  author={Nick Jiang and Amil Dravid and Alexei Efros and Yossi Gandelsman},
  year={2025},
  eprint={2506.08010},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.08010},
}