# Model Card for RecNeXt-M5


## Model Details

## Model Usage

### Image Classification

```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('recnext_m5', pretrained=True, distillation=False)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
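
The same model can also be used as a feature backbone. The snippet below is a minimal sketch that assumes `recnext_m5` follows the standard timm conventions (`num_classes=0` for a pooled embedding, `forward_features` for the unpooled feature map).

```python
import timm
import torch

# A sketch assuming standard timm conventions for feature extraction.
backbone = timm.create_model('recnext_m5', pretrained=True, num_classes=0)  # drop the classifier head
backbone = backbone.eval()

data_config = timm.data.resolve_model_data_config(backbone)
transforms = timm.data.create_transform(**data_config, is_training=False)

x = transforms(img).unsqueeze(0)  # reuse the image loaded above
with torch.no_grad():
    embedding = backbone(x)                     # pooled embedding, shape (1, num_features)
    feature_map = backbone.forward_features(x)  # unpooled features before global pooling
```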

### Converting to Inference Mode

```python
# utils is provided in the RecNeXt code repository
import utils

# Convert training-time model to inference structure, fuse batchnorms
utils.replace_batchnorm(model)
```
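
As a quick sanity check, batch-norm fusion should leave the network's outputs essentially unchanged. The snippet below is a minimal sketch; the 224x224 input resolution and the tolerance are assumptions.

```python
import copy

import timm
import torch
import utils  # from the RecNeXt code repository

# Fuse a copy of the training-time model and compare outputs with the original.
train_model = timm.create_model('recnext_m5', pretrained=True, distillation=False).eval()
fused_model = copy.deepcopy(train_model)
utils.replace_batchnorm(fused_model)

x = torch.randn(1, 3, 224, 224)  # assumed default input resolution
with torch.no_grad():
    assert torch.allclose(train_model(x), fused_model(x), atol=1e-4)
```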

## Model Comparison

### Classification

We introduce two series of models: the A series uses linear attention and nearest-neighbor interpolation, while the M series employs convolution and bilinear interpolation for simplicity and broader hardware compatibility (e.g., to work around suboptimal nearest-neighbor interpolation support in some iOS versions).
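
Both series share the same timm interface. The snippet below is a minimal sketch of loading one model from each series; the A-series identifier `recnext_a5` is an assumption based on the naming pattern above, so check the model collection for the exact registered names.

```python
import timm

# A sketch only: 'recnext_a5' is an assumed identifier following the M-series
# naming pattern; verify the registered names before use.
m5 = timm.create_model('recnext_m5', pretrained=True)  # M series: convolution + bilinear interpolation
a5 = timm.create_model('recnext_a5', pretrained=True)  # A series: linear attention + nearest interpolation
```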

*dist*: trained with distillation; *base*: trained without distillation (all models are trained for 300 epochs).

| Model | Top-1 Accuracy (dist / base) | Params | GMACs | NPU Latency | CPU Latency | Throughput | Fused Weights | Training Logs |
|:-----:|:----------------------------:|:------:|:-----:|:-----------:|:-----------:|:----------:|:-------------:|:-------------:|
| M0 | 74.7 / 73.2 | 2.5M | 0.4 | 1.0ms | 189ms | 763 | dist / base | dist / base |
| M1 | 79.2 / 78.0 | 5.2M | 0.9 | 1.4ms | 361ms | 384 | dist / base | dist / base |
| M2 | 80.3 / 79.2 | 6.8M | 1.2 | 1.5ms | 431ms | 325 | dist / base | dist / base |
| M3 | 80.9 / 79.6 | 8.2M | 1.4 | 1.6ms | 482ms | 314 | dist / base | dist / base |
| M4 | 82.5 / 81.4 | 14.1M | 2.4 | 2.4ms | 843ms | 169 | dist / base | dist / base |
| M5 | 83.3 / 82.9 | 22.9M | 4.7 | 3.4ms | 1487ms | 104 | dist / base | dist / base |
| A0 | 75.0 / 73.6 | 2.8M | 0.4 | 1.4ms | 177ms | 4902 | dist / base | dist / base |
| A1 | 79.6 / 78.3 | 5.9M | 0.9 | 1.9ms | 334ms | 2746 | dist / base | dist / base |
| A2 | 80.8 / 79.6 | 7.9M | 1.2 | 2.2ms | 413ms | 2327 | dist / base | dist / base |
| A3 | 81.1 / 80.1 | 9.0M | 1.4 | 2.4ms | 447ms | 2206 | dist / base | dist / base |
| A4 | 82.5 / 81.6 | 15.8M | 2.4 | 3.6ms | 764ms | 1265 | dist / base | dist / base |
| A5 | 83.5 / 83.1 | 25.7M | 4.7 | 5.6ms | 1376ms | 721 | dist / base | dist / base |

### Comparison with LSNet

| Model | Top-1 Accuracy (dist / base) | Params | GMACs | NPU Latency | CPU Latency | Throughput | Fused Weights | Training Logs |
|:-----:|:----------------------------:|:------:|:-----:|:-----------:|:-----------:|:----------:|:-------------:|:-------------:|
| T | 76.6 / 75.1 | 12.1M | 0.3 | 1.8ms | 109ms | 14181 | dist / base | dist / base |
| S | 79.6 / 78.3 | 15.8M | 0.7 | 2.0ms | 188ms | 8234 | dist / base | dist / base |
| B | 81.4 / 80.3 | 19.3M | 1.1 | 2.5ms | 290ms | 4385 | dist / base | dist / base |

NPU latency is measured on an iPhone 13 with models compiled by Core ML Tools. CPU latency is measured on a quad-core ARM Cortex-A57 processor with models in ONNX format, and throughput is measured on an Nvidia RTX 3090 with the maximum power-of-two batch size that fits in memory.
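
For reference, the fused model can be exported to ONNX for CPU-style benchmarking. The snippet below is a minimal sketch; the opset version, input resolution, and tensor names are assumptions, not the authors' exact benchmark configuration.

```python
import torch

# Export the fused model from the section above to ONNX (sketch only).
dummy = torch.randn(1, 3, 224, 224)  # assumed default input resolution
torch.onnx.export(
    model.eval(), dummy, 'recnext_m5.onnx',
    input_names=['input'], output_names=['logits'],
    opset_version=17,  # assumed opset
)
```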

## Citation

```bibtex
@misc{zhao2024recnext,
      title={RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations},
      author={Mingshu Zhao and Yi Luo and Yong Ouyang},
      year={2024},
      eprint={2412.19628},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```