MYTHTRIAGE: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform

This repository contains one of eight lightweight models accompanying the EMNLP 2025 paper MYTHTRIAGE: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform. MythTriage is a scalable pipeline for detecting opioid use disorder (OUD) myths on YouTube, enabling large-scale analysis and informing moderation and health interventions.

Overview

MythTriage is designed to automatically evaluate and classify YouTube videos for opioid use disorder myths. The triage pipeline uses lightweight models (fine-tuned DeBERTa-v3-base, as provided in this repository) for routine cases and defers harder ones to a state-of-the-art but costlier large language model (GPT-4o), yielding robust, cost-efficient, and high-performing detection of opioid use disorder myths on YouTube. For more information, please read our paper.
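The deferral idea can be illustrated with a minimal sketch. This card does not state the rule MythTriage uses to decide which cases are "harder"; the version below assumes a simple confidence threshold on the lightweight model's softmax probability. The `triage` function, the `threshold` value of 0.9, and the route names are hypothetical and serve only as an illustration. Please refer to the paper for the actual procedure.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, threshold=0.9):
    """Keep the lightweight prediction if it is confident, else defer.

    ASSUMPTION: a max-probability threshold is used as the deferral rule.
    The threshold value is hypothetical, not taken from the paper.
    """
    probs = softmax(logits)
    pred_idx = max(range(len(probs)), key=probs.__getitem__)
    if probs[pred_idx] >= threshold:
        return ("lightweight", pred_idx)
    # Low confidence: hand the video to the costlier LLM instead.
    return ("defer_to_llm", None)


# A confident prediction stays with the lightweight model;
# an uncertain one is routed to the large language model.
print(triage([5.0, 0.0, 0.0]))
print(triage([0.2, 0.1, 0.0]))
```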

MythTriage detects and classifies 8 categories of prevalent opioid use disorder myths recognized by major health organizations and validated by clinical experts. This repository contains the fine-tuned DeBERTa-v3-base model trained to detect one of these eight categories in YouTube videos, namely: M8: Kratom is a non-addictive and safe alternative to opioids.

Model Description & Datasets

Given the YouTube video metadata (e.g., title, description, transcript, tags), the model will predict one of three numeric labels with respect to the myth (e.g., M8: Kratom is a non-addictive and safe alternative to opioids.): opposing the myth (0), neither (1), and supporting the myth (2).

The video dataset used to train and evaluate the model is available at the GitHub link here. As part of our distillation process, this model was trained on GPT-4o-generated synthetic labels for ~1.4K videos and then evaluated on ~300 gold-standard videos labeled by clinical experts. Additional details are provided in the paper.

How to Get Started with the Model

To get started, initialize the model with the AutoTokenizer and AutoModelForSequenceClassification classes. For AutoTokenizer, use the tokenizer from microsoft/deberta-v3-base and set the "use_fast" parameter to False, the maximum length (max_length) to 1024, padding to "max_length", and truncation to True. For AutoModelForSequenceClassification, set the model to this repository and the "num_labels" parameter to 3.
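A minimal sketch of this setup is shown below. It assumes the transformers library is installed (the slow DeBERTa-v3 tokenizer typically also needs sentencepiece) and that this repository's ID is SocialCompUW/youtube-opioid-myth-detect-M8. The constant and function names are illustrative, not part of any released code.

```python
# Tokenizer source named in the text above.
BASE_TOKENIZER = "microsoft/deberta-v3-base"
# ASSUMPTION: this is the ID of this repository on the model hub.
MODEL_REPO = "SocialCompUW/youtube-opioid-myth-detect-M8"

# Arguments to pass when the tokenizer is called on input text.
TOKENIZE_KWARGS = {
    "max_length": 1024,
    "padding": "max_length",
    "truncation": True,
    "return_tensors": "pt",
}


def load_tokenizer_and_model(model_repo=MODEL_REPO):
    """Load the slow DeBERTa-v3 tokenizer and the 3-label classifier."""
    # Imported lazily so the settings above can be inspected
    # without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_TOKENIZER, use_fast=False)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_repo, num_labels=3
    )
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_tokenizer_and_model()` downloads the weights on first use, so it needs network access.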

Next, given a YouTube video dataset with metadata, concatenate each video's title, description, transcript, and tags in the following manner:

input = 'VIDEO TITLE: ' + title + '\nVIDEO DESCRIPTION: ' + description + '\nVIDEO TRANSCRIPT: ' + transcript + '\nVIDEO TAGS: ' + tags
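The same concatenation can be wrapped in a small helper so that every video is formatted identically. This is a sketch: the function name `build_input` is hypothetical, and it assumes all four fields are already plain strings (for example, tags joined into one string beforehand).

```python
def build_input(title, description, transcript, tags):
    """Format one video's metadata in the structure shown above.

    ASSUMPTION: every argument is already a string; how missing fields
    or tag lists should be handled is not specified in this card.
    """
    return (
        "VIDEO TITLE: " + title
        + "\nVIDEO DESCRIPTION: " + description
        + "\nVIDEO TRANSCRIPT: " + transcript
        + "\nVIDEO TAGS: " + tags
    )


# Example with made-up metadata, for illustration only.
example = build_input(
    title="My experience with kratom",
    description="A personal story.",
    transcript="Today I want to talk about...",
    tags="kratom, opioids",
)
print(example)
```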

Thus, each video in your dataset should have its input metadata formatted in the structure above. Finally, pass each input through the tokenizer and feed the tokenized input to the model. Take the index of the largest output logit to obtain one of the three predicted labels:

_, pred_idx = outputs.logits.max(dim=1)
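Putting the inference step together, the sketch below tokenizes a batch of formatted inputs, runs the model, and maps each argmax index to a readable label. It assumes `tokenizer` and `model` were loaded as described earlier and that torch is installed. The names `ID2LABEL`, `decode_predictions`, and `predict`, as well as the short label strings, are illustrative choices.

```python
# Numeric labels as defined in this card; the short names are illustrative.
ID2LABEL = {0: "opposing", 1: "neither", 2: "supporting"}


def decode_predictions(logit_rows):
    """Map each row of three logits to (label_id, label_name) via argmax."""
    decoded = []
    for row in logit_rows:
        pred_idx = max(range(len(row)), key=lambda i: row[i])
        decoded.append((pred_idx, ID2LABEL[pred_idx]))
    return decoded


def predict(texts, tokenizer, model):
    """Run the classifier on a list of formatted input strings."""
    import torch  # imported lazily; only needed for real inference

    encoded = tokenizer(
        texts,
        max_length=1024,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():  # no gradients needed at inference time
        outputs = model(**encoded)
    return decode_predictions(outputs.logits.tolist())


# Decoding works on plain lists, so it can be checked without a model.
print(decode_predictions([[0.1, 2.0, -1.0], [3.0, 0.0, 0.0]]))
```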

Training Hyperparameters

During training, we conducted a grid search over learning rates (5e-6, 1e-5, 1e-6), weight decays (5e-4, 1e-4, 5e-5), and data balancing strategies (none, upsampling, class-weighted loss). Other hyperparameters include:

OPTIMIZER: Adam optimizer with cross-entropy loss function
BATCH_SIZE = 8
NUM_EPOCHS = 20
MIN_SAVE_EPOCH = 2
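For reference, the search space described above can be enumerated as follows. This sketch only lists the configurations; it does not reproduce the training loop, and the dictionary keys and strategy names are illustrative.

```python
from itertools import product

# Values taken from the grid described above.
LEARNING_RATES = [5e-6, 1e-5, 1e-6]
WEIGHT_DECAYS = [5e-4, 1e-4, 5e-5]
BALANCING_STRATEGIES = ["none", "upsampling", "class_weighted_loss"]

# Hyperparameters held fixed across all runs.
FIXED = {"batch_size": 8, "num_epochs": 20, "min_save_epoch": 2}


def build_grid():
    """Return one config dict per combination of the searched values."""
    return [
        {"learning_rate": lr, "weight_decay": wd, "balancing": strategy, **FIXED}
        for lr, wd, strategy in product(
            LEARNING_RATES, WEIGHT_DECAYS, BALANCING_STRATEGIES
        )
    ]


grid = build_grid()
print(len(grid))  # 3 x 3 x 3 combinations
print(grid[0])
```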

The synthetic dataset of ~1.4K videos was split 80:20 into training (N=1173) and validation (N=293) sets. The gold-standard dataset of 310 videos labeled by clinical experts served as the test set. The model was fine-tuned on a single NVIDIA A40 GPU.
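The reported split sizes are consistent with an 80:20 split of 1,466 videos (1173 + 293). A minimal sketch of such a split is shown below. It assumes a simple shuffled split with a fixed seed; the seed value and whether the original split was random or stratified are not stated in this card.

```python
import random


def split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle item indices and split them into train and validation parts.

    ASSUMPTION: a plain random split; the seed is arbitrary.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(1173 + 293)
print(len(train_idx), len(val_idx))
```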

Results

The model achieved a macro F1-score of 0.78 on the gold-standard test set annotated by clinical experts.
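Macro F1 is the unweighted mean of the per-class F1 scores, so each of the three labels counts equally regardless of how often it occurs. The sketch below computes it in plain Python on a toy example. It is not the evaluation code used for the reported score, and the toy labels are made up.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted average of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy example: per-class F1 is 1, 2/3, and 2/3, so the macro average is 7/9.
toy_true = [0, 1, 2, 2]
toy_pred = [0, 1, 2, 1]
print(round(macro_f1(toy_true, toy_pred), 4))
```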

Other Models for Opioid Use Disorder Myths Detection

As part of MythTriage, we fine-tuned eight lightweight models, each trained to detect a specific opioid use disorder myth. Below, we link the detection model corresponding to each myth:

M1: Agonist therapy or medication-assisted treatment (MAT) for OUD is merely replacing one drug with another (LINK)
M2: People with OUD are not suffering from a medical disease treatable with medication but from a self-imposed condition maintained through the lack of moral fiber (LINK)
M3: The ultimate goal of treatment for OUD is abstinence from any opioid use (e.g., Taking medication is not true recovery) (LINK)
M4: Only patients with certain characteristics are vulnerable to addiction (LINK)
M5: Physical dependence or tolerance is the same as addiction (LINK)
M6: Detoxification for OUD is effective (LINK)
M7: You should only take medication for a brief period of time (LINK)
M8: Kratom is a non-addictive and safe alternative to opioids (LINK)

Citation

If you use this model or the dataset from our GitHub repository in your research, please cite our work:

@misc{jung2025mythtriagescalabledetectionopioid,
      title={MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform}, 
      author={Hayoung Jung and Shravika Mittal and Ananya Aatreya and Navreet Kaur and Munmun De Choudhury and Tanushree Mitra},
      year={2025},
      eprint={2506.00308},
      archivePrefix={arXiv},
      primaryClass={cs.CY},
      url={https://arxiv.org/abs/2506.00308}, 
}
