---
license: apache-2.0
base_model:
- bhadresh-savani/distilbert-base-uncased-emotion
---

# bhadresh-ft-enc

Fine-tuned version of [`bhadresh-savani/distilbert-base-uncased-emotion`](https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion) on a mix of clean and imperceptibly perturbed emotion classification data. This model is designed to improve robustness against character-level adversarial attacks while retaining high accuracy on clean text.

# Modified encoder architecture

Six new Transformer layers were added on top of the six-layer DistilBERT encoder, bringing the total to 12. The final layer of hidden token embeddings is re-aggregated into "word" embeddings using the sub-word groupings created by the tokenizer. The [CLS] embedding is also passed into the new Transformer block, and the final output [CLS] embedding is used for classification. During training, a contrastive loss (cosine similarity) between the final [CLS] embeddings of clean and perturbed inputs is added to the classification objective (see the sketch at the end of this card).

Compared to the original model, accuracy on [`vlwk/emotion-perturbed`](https://huggingface.co/datasets/vlwk/emotion-perturbed) improves by 2% to over 10% across perturbation budgets 1 to 5.

## Model Description

- **Base model**: `distilbert-base-uncased-emotion`
- **Fine-tuning data**: [`vlwk/emotion-perturbed`](https://huggingface.co/datasets/vlwk/emotion-perturbed): clean and perturbed emotion classification inputs (perturbation types: homoglyphs, deletions, reorderings, invisible characters), perturbation budgets 1 to 5.
- **Training epochs**: 3
- **Batch size**: 16
- **Learning rate**: 2e-5
- **Validation split**: 10%

## Intended Use

This model is intended for robust emotion classification under adversarial character-level noise. It is particularly useful for evaluating or defending against imperceptible text perturbations.

## Usage

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("vlwk/bhadresh-ft-enc")
model = DistilBertForSequenceClassification.from_pretrained("vlwk/bhadresh-ft-enc")

inputs = tokenizer("I'm feeling great today!", return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
```
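The predicted index can be mapped to an emotion label via the config shipped with the checkpoint. This assumes the fine-tuned head keeps the base model's six emotion labels:

```python
# Map the predicted class index to its label string. Assumes the checkpoint
# ships the base model's id2label mapping (sadness, joy, love, anger, fear,
# surprise).
print(model.config.id2label[predicted_class])
```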
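The hyperparameters listed under Model Description map onto a standard `transformers` fine-tuning setup. A minimal configuration sketch, assuming the dataset loads directly via `datasets.load_dataset` and that the 10% validation split is taken from the training data; the contrastive objective described above would additionally require a subclassed `Trainer` or a custom loop:

```python
from datasets import load_dataset
from transformers import TrainingArguments

# Load the clean + perturbed training data and hold out 10% for validation.
dataset = load_dataset("vlwk/emotion-perturbed")
split = dataset["train"].train_test_split(test_size=0.1)

# Hyperparameters as listed in the model description.
args = TrainingArguments(
    output_dir="bhadresh-ft-enc",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    evaluation_strategy="epoch",
)
```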
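To make the architecture description concrete, here is a minimal sketch of the modified encoder and training loss. It assumes mean pooling for the word re-aggregation, a stock `nn.TransformerEncoder` (no padding masks) for the six added layers, and an illustrative contrastive weight; the class and argument names (`WordReaggEncoder`, `contrastive_weight`) are hypothetical, not the actual training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import DistilBertModel


class WordReaggEncoder(nn.Module):
    def __init__(self, num_labels=6, hidden=768, extra_layers=6):
        super().__init__()
        # Six-layer DistilBERT backbone from the base emotion model.
        self.backbone = DistilBertModel.from_pretrained(
            "bhadresh-savani/distilbert-base-uncased-emotion"
        )
        # Six new Transformer layers, bringing the total to 12.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
        self.extra = nn.TransformerEncoder(layer, num_layers=extra_layers)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, word_ids):
        # Final-layer hidden token embeddings from the backbone.
        hidden = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                      # (B, T, H)
        cls = hidden[:, :1, :]                   # keep the [CLS] embedding

        # Re-aggregate sub-word tokens into "word" embeddings by mean-pooling
        # tokens the tokenizer assigned to the same word.
        batch_words = []
        for b in range(hidden.size(0)):
            words = []
            for w in sorted({i for i in word_ids[b] if i is not None}):
                idx = [t for t, i in enumerate(word_ids[b]) if i == w]
                words.append(hidden[b, idx].mean(dim=0))
            batch_words.append(torch.stack(words))
        words = nn.utils.rnn.pad_sequence(batch_words, batch_first=True)

        # [CLS] plus word embeddings pass through the new Transformer block;
        # the output [CLS] embedding feeds the classification head.
        out = self.extra(torch.cat([cls, words], dim=1))
        return self.classifier(out[:, 0]), out[:, 0]   # logits, final [CLS]


def training_loss(logits, labels, cls_clean, cls_pert, contrastive_weight=0.1):
    # Cross-entropy plus a contrastive term (1 - cosine similarity) pulling
    # the clean and perturbed final [CLS] embeddings together.
    ce = F.cross_entropy(logits, labels)
    contrastive = (1 - F.cosine_similarity(cls_clean, cls_pert, dim=-1)).mean()
    return ce + contrastive_weight * contrastive
```

Here `word_ids` is the per-example token-to-word mapping returned by the fast tokenizer, e.g. `[enc.word_ids(i) for i in range(len(texts))]` after `enc = tokenizer(texts, padding=True, return_tensors="pt")`.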