
bert-large-japanese-v2-finetuned-wrime

This model is based on Tohoku University’s BERT-large Japanese v2 and fine-tuned on the WRIME dataset for emotion intensity estimation using Plutchik’s eight basic emotions: joy, sadness, anticipation, surprise, anger, fear, disgust, and trust.
Depending on how it is called, the model returns either normalized scores over the eight emotions (via the text-classification pipeline) or raw intensity values (via a direct forward pass; see Usage). The model is suitable for research on emotion analysis of Japanese SNS posts, conversation logs, or other short texts.


Model Details

  • Architecture: BERT-large Japanese v2 (Whole Word Masking, WordPiece tokenizer).
  • Fine-tuning task: Regression of emotion intensities.
  • Language: Japanese.
  • Base model license: Apache-2.0 (inherits from tohoku-nlp/bert-large-japanese-v2).
  • Dataset used: WRIME (avg_reader annotations).

Usage

Pipeline example:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

repo = "MuneK/bert-large-japanese-v2-finetuned-wrime"
labels = ["joy", "sadness", "anticipation", "surprise", "anger", "fear", "disgust", "trust"]  # Plutchik's eight emotions (assumed output order)

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

# top_k=None returns scores for all eight labels (replaces the deprecated return_all_scores=True).
clf = pipeline("text-classification", model=model, tokenizer=tok, top_k=None)
text = "今日は外来で嬉しい報告が多くて、少し肩の力が抜けた。"  # "Lots of happy news at the outpatient clinic today; I relaxed a little."
scores = clf(text)[0]
print(scores)  # [{'label': 'joy', 'score': 0.42}, ...]
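
To get the raw regression outputs rather than pipeline-normalized scores, call the model directly. A minimal sketch; whether the raw values need a sigmoid or other rescaling to match WRIME's intensity scale is an assumption to verify against the training setup:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "MuneK/bert-large-japanese-v2-finetuned-wrime"
labels = ["joy", "sadness", "anticipation", "surprise", "anger", "fear", "disgust", "trust"]

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

text = "今日は外来で嬉しい報告が多くて、少し肩の力が抜けた。"
inputs = tok(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    raw = model(**inputs).logits.squeeze(0)  # one raw value per emotion

# Assumption: the output order matches the `labels` list above.
for name, value in zip(labels, raw.tolist()):
    print(f"{name}: {value:.3f}")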

ROC-Optimized Thresholds for Binary Classification

Emotion        joy    sadness  anticipation  surprise  anger  fear   disgust  trust
ROC threshold  0.138  0.123    0.155         0.146     0.111  0.122  0.114    0.095
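
To obtain binary emotion labels, compare each pipeline score against its ROC-optimized threshold. A minimal sketch, reusing the `scores` output from the Usage example and assuming the pipeline reports the emotion names shown above:

# ROC-optimized thresholds from the table above.
thresholds = {
    "joy": 0.138, "sadness": 0.123, "anticipation": 0.155, "surprise": 0.146,
    "anger": 0.111, "fear": 0.122, "disgust": 0.114, "trust": 0.095,
}

def binarize(scores):
    """Flag each emotion as present if its score reaches the threshold."""
    return {s["label"]: s["score"] >= thresholds[s["label"]] for s in scores}

print(binarize(scores))  # e.g. {'joy': True, 'sadness': False, ...}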

Comparison: Individual vs Vector-Based Evaluation

Individual evaluation (per-emotion binary metrics, averaged over the eight labels):

  • Mean Binary Accuracy: 81.3%
  • Mean Binary Precision: 57.6%
  • Mean Binary Recall: 66.5%
  • Mean Binary F1-score: 61.2%

Vector-based evaluation (the eight scores treated as a single emotion vector; a sketch of these metrics follows the list below):

  • Cosine Similarity: 0.922
  • Vector Correlation: 0.696
  • Direction Accuracy (>0.7): 96.8%
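
A minimal sketch of how such vector-based metrics can be computed, with direction accuracy interpreted here as the fraction of examples whose cosine similarity exceeds 0.7 (an assumption about the reported metric, not the exact evaluation script):

import numpy as np

def vector_metrics(pred, gold, direction_threshold=0.7):
    """Per-example cosine similarity and Pearson correlation between
    predicted and gold 8-dimensional emotion vectors, then averaged."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    cos = (pred * gold).sum(axis=1) / (
        np.linalg.norm(pred, axis=1) * np.linalg.norm(gold, axis=1)
    )
    # Note: corrcoef is undefined (NaN) for constant vectors.
    corr = np.array([np.corrcoef(p, g)[0, 1] for p, g in zip(pred, gold)])
    return {
        "cosine_similarity": cos.mean(),
        "vector_correlation": corr.mean(),
        "direction_accuracy": (cos > direction_threshold).mean(),
    }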

Intended Use and Limitations

Intended use:

  • Academic research on emotion analysis.
  • Exploratory analysis of Japanese SNS posts or conversation logs.
  • Visualizing longitudinal changes in emotional expression.

Limitations:

  • Not intended for clinical diagnosis or decision-making.
  • May perform poorly on slang, sarcasm, dialects, or specialized jargon.
  • Performance depends on WRIME’s label distribution; potential biases may exist.

Ethical Considerations

  • The model estimates the intensity of emotional expression in text, not the true internal state of individuals.
  • Predictions should always be reviewed by humans before use in sensitive contexts.
  • Avoid use in high-stakes decision-making (e.g., medical diagnosis, crisis detection) without human oversight.

License

This model is released under the Apache-2.0 license, consistent with the base model.


References

  • Kajiwara, T., et al. WRIME: A New Dataset for Emotional Intensity Estimation of Japanese SNS Posts. NAACL 2021.
  • Tohoku NLP. BERT large Japanese v2. Hugging Face model card.

Citation

If you use this model, please cite:

@software{MuneK_wrime_bert_large_japanese_v2,
  title   = {bert-large-japanese-v2-finetuned-wrime},
  author  = {Kanno, Muneaki},
  year    = {2023},
  url     = {https://huggingface.co/MuneK/bert-large-japanese-v2-finetuned-wrime},
  license = {Apache-2.0}
}