FaceGuard – ViT (20 CelebA IDs)

A Vision Transformer (ViT-Base) fine-tuned for identity classification on a 20-identity subset of the CelebA dataset.
This model predicts anonymized celeb_id integers (not celebrity names).
It powers the demo Space: https://huggingface.co/spaces/hudaakram/FaceGuard-demo


Model Details

Model Description

  • Architecture: google/vit-base-patch16-224 (ViT-Base, pretrained on ImageNet-21k and fine-tuned on ImageNet-1k)
  • Fine-tuned for: 20-class identity classification (CelebA celeb_ids)
  • Input: RGB image (face crop), resized and normalized to 224×224
  • Output: Probability distribution over 20 anonymized IDs
  • Parameters: ~86M
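
The head size and label mapping described above can be checked directly from the published config; a minimal sketch (it assumes the checkpoint exposes the standard num_labels and id2label fields, as the inference example further below relies on):

from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("hudaakram/FaceGuard-20ID-ViT")
print(model.config.num_labels)      # expected: 20
print(model.config.id2label)        # class index -> anonymized celeb_id
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")  # roughly 86M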

Sources

  • Demo Space: https://huggingface.co/spaces/hudaakram/FaceGuard-demo

Uses

Direct Use

  • Research demo for identity classification with anonymized CelebA IDs
  • Educational example of fine-tuning ViT for image classification

Downstream Use

  • As a starting point for transfer learning to other small identity classification tasks (see the sketch after this list)
  • As an educational reference for hackathons, workshops, or courses
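
A minimal transfer-learning sketch for that downstream use (the 5-class target is a hypothetical placeholder; ignore_mismatched_sizes re-initializes the classification head):

from transformers import ViTForImageClassification

# Hypothetical example: adapt the backbone to a new 5-identity task.
model = ViTForImageClassification.from_pretrained(
    "hudaakram/FaceGuard-20ID-ViT",
    num_labels=5,                    # placeholder label count for the new task
    ignore_mismatched_sizes=True,    # drop the 20-class head, re-initialize for 5 classes
)
# Fine-tune on the new dataset as usual (e.g. with transformers.Trainer).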

Out-of-Scope Use

  • ❌ Production face recognition / surveillance
  • ❌ Identifying real celebrity names (dataset only provides integer IDs)
  • ❌ Any high-stakes application involving privacy or personal data

Bias, Risks, and Limitations

  • Bias: CelebA contains celebrity faces, which are not representative of all demographics.
  • Limitations: Trained on only 20 identities (~600 images total) → limited generalization.
  • Privacy: CelebA IDs are anonymized integers, not real names. The model is not capable of returning actual identities.

Recommendation: Use strictly for research/educational purposes.


How to Get Started

Use the code below to get started with the model.

from transformers import ViTForImageClassification, AutoImageProcessor
from PIL import Image
import torch

model_id = "hudaakram/FaceGuard-20ID-ViT"
processor = AutoImageProcessor.from_pretrained(model_id)
model = ViTForImageClassification.from_pretrained(model_id)

img = Image.open("face.jpg").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]

id2label = {int(k): v for k, v in model.config.id2label.items()}
top5 = probs.topk(5)
for score, idx in zip(top5.values, top5.indices):
print(f"Label {idx.item()} (celeb_id {id2label[idx.item()]}): {score:.3f}")

Training Details

Training Data

  • Dataset: CelebA (top 20 identities by frequency)
  • Splits: Stratified 80% train / 10% validation / 10% test
  • Sizes: Train 501, Val 60, Test 77

Training Procedure

  • Seed: 42
  • Epochs: 4
  • Batch size: 16
  • Learning rate: 5e-5
  • Optimizer: AdamW
  • Weight decay: 0.01
  • Precision: FP16 on GPU (Colab)
  • Head resized: from 1000 classes → 20 classes
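
A sketch of a matching Trainer setup (not the exact training script; train_ds / eval_ds stand in for the preprocessed CelebA splits, and the metric function is omitted):

from transformers import ViTForImageClassification, TrainingArguments, Trainer

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=20,                   # head resized from 1000 ImageNet classes to 20 IDs
    ignore_mismatched_sizes=True,
)

args = TrainingArguments(
    output_dir="faceguard-vit",
    seed=42,
    num_train_epochs=4,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    weight_decay=0.01,               # AdamW is the default optimizer
    fp16=True,                       # mixed precision on the Colab T4
)

# train_ds / eval_ds: datasets of pixel_values + labels (assumed, not shown here).
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()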

Preprocessing

  • Images resized + center-cropped to 224×224
  • Normalized to ImageNet mean/std
  • Labels mapped from CelebA celeb_id → contiguous 0–19
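
A sketch of equivalent preprocessing with the published image processor (the celeb_id values below are hypothetical placeholders for the 20 selected identities):

from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("hudaakram/FaceGuard-20ID-ViT")

# Resize/crop to 224x224 and normalize with ImageNet mean/std (handled by the processor).
pixel_values = processor(
    images=Image.open("face.jpg").convert("RGB"), return_tensors="pt"
).pixel_values

# Map raw CelebA celeb_ids to contiguous labels 0-19 (IDs shown are made up).
selected_ids = [103, 2820, 3227]               # ... the 20 chosen celeb_ids
label2id = {cid: i for i, cid in enumerate(sorted(selected_ids))}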

Training Hyperparameters

  • Training regime: fp16 mixed precision on GPU
  • Total epochs: 4 (~3 minutes each on Colab T4)

Speeds, Sizes, Times

  • Checkpoint size: ~343 MB
  • Throughput: ~10 samples/sec (Colab T4)

Evaluation

  • Validation Accuracy: ~0.93
  • Test Accuracy: ~0.83
  • Macro AUC: (see ROC below)
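
Accuracy and macro AUC can be recomputed from the model's softmax outputs with scikit-learn (a sketch; y_true and probs are assumed to come from running the model over the test split):

from sklearn.metrics import accuracy_score, roc_auc_score

# y_true: integer labels (0-19) for the test split; probs: (N, 20) array of softmax scores.
accuracy = accuracy_score(y_true, probs.argmax(axis=-1))
macro_auc = roc_auc_score(y_true, probs, multi_class="ovr", average="macro")
print(f"accuracy={accuracy:.3f}  macro_auc={macro_auc:.3f}")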

Split Summary

Split   #Images   #Classes   Min/Class   Median/Class   Max/Class
Train   501       20         24          24             28
Val     60        20         3           3              3
Test    77        20         3           4              4

Results

Confusion Matrix (normalized): (figure)

ROC Curves (one-vs-rest): (figure)


Environmental Impact

  • Hardware: Google Colab T4 GPU
  • Training time: ~12 minutes total (4 epochs)
  • Carbon emissions: negligible (short fine-tuning run)

Technical Specifications

Model Architecture and Objective

  • Vision Transformer (ViT-Base, patch16, 224Γ—224)
  • Objective: Cross-entropy classification across 20 labels

Compute Infrastructure

  • Hardware: Google Colab T4 GPU
  • Framework: PyTorch + Hugging Face Transformers

Citation

CelebA Dataset:
Z. Liu, P. Luo, X. Wang, and X. Tang. Deep Learning Face Attributes in the Wild. ICCV 2015.

ViT:
A. Dosovitskiy et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021.


Model Card Authors

Hackathon submission by Huda Akram
