# bert-large-japanese-v2-finetuned-wrime
This model is based on Tohoku University's BERT-large Japanese v2 and fine-tuned on the WRIME dataset for emotion intensity estimation over Plutchik's eight basic emotions: joy, sadness, anticipation, surprise, anger, fear, disgust, and trust.
It outputs one score per emotion, which can be read as a normalized probability distribution or as raw intensity scores depending on how you load and post-process the model. It is suitable for research on emotion analysis of Japanese SNS posts, conversation logs, and other short texts.
## Model Details
- Architecture: BERT-large Japanese v2 (Whole Word Masking, WordPiece tokenizer).
- Fine-tuning task: Regression of emotion intensities.
- Languages: Japanese.
- Base model license: Apache-2.0 (inherited from [tohoku-nlp/bert-large-japanese-v2](https://huggingface.co/tohoku-nlp/bert-large-japanese-v2)).
- Dataset: WRIME (avg_reader annotations).
## Usage
Pipeline example:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

repo = "MuneK/bert-large-japanese-v2-finetuned-wrime"
labels = ["joy", "sadness", "anticipation", "surprise", "anger", "fear", "disgust", "trust"]

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

# top_k=None returns scores for all eight labels
# (it replaces the deprecated return_all_scores=True).
clf = pipeline("text-classification", model=model, tokenizer=tok, top_k=None)

# "Today there were many happy reports at the outpatient clinic, and I relaxed a little."
text = "今日は外来で嬉しい報告が多くて、少し肩の力が抜けた。"
scores = clf(text)  # one {'label', 'score'} dict per emotion
print(scores)  # [{'label': 'joy', 'score': 0.42}, ...]
```
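The pipeline normalizes the model's raw outputs (softmax or sigmoid, depending on the model config). If you want the unnormalized regression outputs instead, you can call the model directly. A minimal sketch, reusing `tok`, `model`, `text`, and `labels` from above; reading one raw intensity per emotion in that label order is an assumption about the head, not a documented guarantee:

```python
import torch

# Tokenize and run a forward pass without the pipeline's post-processing.
inputs = tok(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    raw = model(**inputs).logits.squeeze(0)  # assumed shape: (8,)

# Assumption: one raw intensity score per emotion, in `labels` order.
for label, value in zip(labels, raw.tolist()):
    print(f"{label}: {value:.3f}")
```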
## Thresholds Optimized by ROC for Binary Classification

|  | joy | sadness | anticipation | surprise | anger | fear | disgust | trust |
|---|---|---|---|---|---|---|---|---|
| ROC threshold | 0.138 | 0.123 | 0.155 | 0.146 | 0.111 | 0.122 | 0.114 | 0.095 |
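As a sketch of how these thresholds might be applied, the snippet below binarizes the pipeline output from the Usage example. Comparing each score independently against its per-emotion threshold is an assumption about the intended use of the table:

```python
# Per-emotion ROC-optimized thresholds from the table above.
thresholds = {
    "joy": 0.138, "sadness": 0.123, "anticipation": 0.155, "surprise": 0.146,
    "anger": 0.111, "fear": 0.122, "disgust": 0.114, "trust": 0.095,
}

# `scores` is the list of {'label', 'score'} dicts from the Usage example.
binary = {s["label"]: int(s["score"] >= thresholds[s["label"]]) for s in scores}
print(binary)  # e.g. {'joy': 1, 'sadness': 0, ...}
```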
## Comparison: Individual vs. Vector-Based Evaluation
Individual Evaluation:
- Mean Binary Accuracy: 81.3%
- Mean Binary Precision: 57.6%
- Mean Binary Recall: 66.5%
- Mean Binary F1-score: 61.2%
Vector-Based Evaluation:
- Cosine Similarity: 0.922
- Vector Correlation: 0.696
- Direction Accuracy (>0.7): 96.8%
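The vector-based metrics treat the eight per-emotion scores for a text as a single vector. The exact aggregation behind the reported numbers is not specified here, so the NumPy sketch below is illustrative only: it computes cosine similarity and Pearson correlation for one hypothetical predicted/gold pair, and direction accuracy would then presumably be the fraction of samples whose cosine similarity exceeds 0.7.

```python
import numpy as np

def vector_metrics(pred: np.ndarray, gold: np.ndarray) -> dict:
    """Compare an 8-dim predicted emotion vector with its gold annotation."""
    cos = float(pred @ gold / (np.linalg.norm(pred) * np.linalg.norm(gold)))
    corr = float(np.corrcoef(pred, gold)[0, 1])  # Pearson correlation
    return {"cosine_similarity": cos, "vector_correlation": corr}

# Hypothetical vectors in the label order joy..trust (not real model output).
pred = np.array([0.42, 0.05, 0.20, 0.10, 0.02, 0.03, 0.03, 0.15])
gold = np.array([0.50, 0.00, 0.17, 0.17, 0.00, 0.00, 0.00, 0.16])
print(vector_metrics(pred, gold))
```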
## Intended Use and Limitations
Intended use:
- Academic research on emotion analysis.
- Exploratory analysis of Japanese SNS posts or conversation logs.
- Visualizing longitudinal changes in emotional expression.
Limitations:
- Not intended for clinical diagnosis or decision-making.
- May perform poorly on slang, sarcasm, dialects, or specialized jargon.
- Performance depends on WRIME’s label distribution; potential biases may exist.
## Ethical Considerations
- The model estimates likelihood of emotional expressions, not the true internal state of individuals.
- Predictions should always be reviewed by humans before use in sensitive contexts.
- Avoid use in high-stakes decision-making (e.g., medical diagnosis, crisis detection) without human oversight.
## License
This model is released under the Apache-2.0 license, consistent with the base model.
## References

- Kajiwara, T., Chu, C., Takemura, N., Nakashima, Y., and Nagahara, H. WRIME: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations. NAACL 2021.
- Tohoku NLP. bert-large-japanese-v2. Hugging Face model card. https://huggingface.co/tohoku-nlp/bert-large-japanese-v2
## Citation
If you use this model, please cite:
```bibtex
@software{MuneK_wrime_bert_large_japanese_v2,
  title   = {bert-large-japanese-v2-finetuned-wrime},
  author  = {Kanno, Muneaki},
  year    = {2023},
  url     = {https://huggingface.co/MuneK/bert-large-japanese-v2-finetuned-wrime},
  license = {Apache-2.0}
}
```