# whisper-large-v3-emotion-classifier-dusha
This model is a fine-tuned version of openai/whisper-large-v3-turbo for speech emotion recognition on the Dusha dataset. It achieves the following results on the evaluation set (metric computation is sketched after the list):
- Loss: 0.5094
- Accuracy: 0.8053
- Balanced Accuracy: 0.8335
- Precision: 0.8325
- Recall: 0.8335
- F1: 0.8323
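Balanced accuracy equals macro-averaged recall, and the two reported values match, which suggests the remaining metrics are macro-averaged as well. A minimal `compute_metrics` sketch under that assumption, using scikit-learn:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    precision_recall_fscore_support,
)

def compute_metrics(eval_pred):
    # The HF Trainer passes (logits, labels) for the whole evaluation set
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "balanced_accuracy": balanced_accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```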
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a matching `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
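The training code does not ship with the model, but the hyperparameters above map directly onto Hugging Face `TrainingArguments`. A minimal sketch, assuming the HF `Trainer` was used (the `output_dir` and dataset variables are hypothetical):

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="whisper-large-v3-emotion-classifier-dusha",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    optim="adamw_torch",          # AdamW with default betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,                    # native AMP mixed precision
    eval_strategy="epoch",
)

# trainer = Trainer(
#     model=model,                      # WhisperForEmotionClassification (see Usage)
#     args=training_args,
#     train_dataset=train_ds,           # hypothetical Dusha splits
#     eval_dataset=eval_ds,
#     compute_metrics=compute_metrics,  # see the metrics sketch above
# )
# trainer.train()
```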
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Balanced Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|
| 0.5125 | 1.0 | 3073 | 0.5822 | 0.7758 | 0.8099 | 0.8033 | 0.8099 | 0.8049 |
| 0.5102 | 2.0 | 6146 | 0.5777 | 0.7803 | 0.8099 | 0.8278 | 0.8099 | 0.8126 |
| 0.5219 | 3.0 | 9219 | 0.5094 | 0.8053 | 0.8335 | 0.8325 | 0.8335 | 0.8323 |
### Framework versions
- Transformers 4.52.4
- Pytorch 2.7.1+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
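To reproduce this environment, the pinned versions can be installed directly, e.g. `pip install transformers==4.52.4 datasets==3.6.0 tokenizers==0.21.1` together with a PyTorch 2.7.1 build for CUDA 12.6 (the exact PyTorch install command depends on your platform).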
## Usage
```python
import torch
import torch.nn as nn
import torchaudio
from transformers import AutoConfig, PreTrainedModel, WhisperModel, WhisperProcessor
from transformers.modeling_outputs import SequenceClassifierOutput


class WhisperClassifier(nn.Module):
    """MLP head that pools Whisper encoder states and maps them to emotion logits."""

    def __init__(self, hidden_size, num_labels=5, dropout=0.2):
        super().__init__()
        self.pool_norm = nn.LayerNorm(hidden_size)
        self.pre_dropout = nn.Dropout(dropout)
        mid1 = max(hidden_size // 2, num_labels * 4)
        mid2 = max(hidden_size // 4, num_labels * 2)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, mid1),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.LayerNorm(mid1),
            nn.Linear(mid1, mid2),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.LayerNorm(mid2),
            nn.Linear(mid2, num_labels),
        )

    def forward(self, hidden_states, attention_mask=None):
        if attention_mask is not None:
            # Masked mean pooling: average only over non-padded frames
            lengths = attention_mask.sum(dim=1, keepdim=True)
            masked = hidden_states * attention_mask.unsqueeze(-1)
            pooled = masked.sum(dim=1) / lengths
        else:
            pooled = hidden_states.mean(dim=1)
        x = self.pool_norm(pooled)
        x = self.pre_dropout(x)
        return self.classifier(x)


class WhisperForEmotionClassification(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, config, model_name, num_labels=5, dropout=0.2):
        super().__init__(config)
        # Only the Whisper encoder is needed for classification; the decoder is unused
        self.encoder = WhisperModel.from_pretrained(model_name).encoder
        hidden_size = config.hidden_size
        self.classifier = WhisperClassifier(
            hidden_size, num_labels=num_labels, dropout=dropout
        )
        self.post_init()

    def forward(self, input_features, attention_mask=None, labels=None):
        encoder_output = self.encoder(
            input_features=input_features,
            attention_mask=attention_mask,
            return_dict=True,
        )
        hidden_states = encoder_output.last_hidden_state
        logits = self.classifier(hidden_states, attention_mask=attention_mask)
        loss = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(
                logits.view(-1, logits.size(-1)), labels.view(-1)
            )
        return SequenceClassifierOutput(loss=loss, logits=logits)


EMOTION_LABELS = ["neutral", "angry", "positive", "sad", "other"]

model_name = "nixiieee/whisper-large-v3-emotion-classifier-dusha"
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3-turbo")
model = WhisperForEmotionClassification.from_pretrained(
    pretrained_model_name_or_path=model_name,
    model_name=model_name,
    num_labels=5,
    dropout=0.05,
)
model.eval()

# Load audio and resample to the 16 kHz rate Whisper expects, if necessary
wav, sr = torchaudio.load("audio.wav")
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)

input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# This is a classification model, so call it directly rather than via generate()
with torch.no_grad():
    outputs = model(**input_features)
pred = outputs.logits.argmax(dim=-1).item()
print("Predicted emotion:", EMOTION_LABELS[pred])
```
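To inspect the full distribution over classes rather than just the top prediction, the logits can be turned into probabilities with a softmax (a small sketch extending the snippet above):

```python
import torch.nn.functional as F

# Convert the logits from the snippet above into per-class probabilities
probs = F.softmax(outputs.logits, dim=-1).squeeze(0)
for label, p in zip(EMOTION_LABELS, probs.tolist()):
    print(f"{label}: {p:.3f}")
```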
## Model tree for nixiieee/whisper-large-v3-emotion-classifier-dusha

- Base model: openai/whisper-large-v3
- Fine-tuned from: openai/whisper-large-v3-turbo