Fine-tuning parameters
I read your technical report (https://drive.google.com/file/d/1QW9D6TN4KXX6poa6Q5L6FVgqaDQ4DxY9/view) – great work!
Could you share some details of your fine-tuning process:
- Which tweets exactly from the two datasets did you use? (Together they contain more than the 11707 samples you used.)
- How did you preprocess the tweets?
- What were your hyperparameters?
If you could share your Jupyter notebook, that would be awesome!
Thank you, Damian
Hi Damian,
Thanks so much for reaching out!
- For training the model, I used the labelled tweets from the ANTiVax dataset [1], located here: https://github.com/sakibsh/ANTiVax/tree/main/Labeled. Each tweet is labelled as 1 (is misinformation) or 0 (is not misinformation) in the VaxMisinfoData.csv file. The Twitter IDs of each tweet are listed in the ids.txt file. In order to access information about each tweet from its tweet ID, including the content of the tweet, I used the Hydrator app (https://github.com/DocNow/hydrator), which extracts the information about each of the tweet IDs automatically, and then saves it into a .csv file. For this model training, I don't believe that I used the CoAID dataset [2]; sorry for the confusion.
- For preprocessing the data, I followed most of the steps in the Fine-Tuning a Pretrained Model tutorial (https://huggingface.co/learn/nlp-course/chapter3/1?fw=pt) on Hugging Face. I also found the Text Classification tutorial to be very useful: https://huggingface.co/docs/transformers/tasks/sequence_classification.
- To my knowledge, I used the default hyperparameters, as I did not do any tuning. However, it would be beneficial to the model to try out different hyperparameters!
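In case it helps, here is a rough sketch of the data step from the first point above: joining the 0/1 labels with the tweet text that comes out of hydration. This is not the original notebook code. The column names (`id`, `is_misinfo`, `text`) and the helper `join_labels_with_text` are assumptions for illustration, so please check them against the actual headers in VaxMisinfoData.csv and in the Hydrator export. The tiny in-memory CSVs are made-up stand-ins for the real files.

```python
import csv
import io


def join_labels_with_text(labels_file, hydrated_file,
                          id_col="id", label_col="is_misinfo", text_col="text"):
    """Attach each hydrated tweet's text to its 0/1 misinformation label.

    The column names are assumptions; adjust them to the real CSV headers.
    """
    # Map tweet ID -> label. IDs are kept as strings so that long tweet IDs
    # are never rounded by a numeric conversion.
    labels = {row[id_col]: int(row[label_col])
              for row in csv.DictReader(labels_file)}

    examples = []
    for row in csv.DictReader(hydrated_file):
        tweet_id = row[id_col]
        # Tweets that could not be hydrated (for example, deleted ones) have
        # no row in the hydrated file, so only IDs present in both survive.
        if tweet_id in labels:
            examples.append({"id": tweet_id,
                             "text": row[text_col],
                             "label": labels[tweet_id]})
    return examples


# Made-up stand-ins for the label file and the hydrated export.
labels_csv = io.StringIO("id,is_misinfo\n101,1\n102,0\n103,1\n")
hydrated_csv = io.StringIO("id,text\n101,example tweet a\n102,example tweet b\n")

examples = join_labels_with_text(labels_csv, hydrated_csv)
print(len(examples), "labelled tweets with text")
```

With the real files, you would pass open file handles instead of the `io.StringIO` objects. The resulting `text`/`label` pairs are the shape that the Hugging Face tutorials linked above expect before tokenization.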
Please let me know if you have more questions - I would be happy to help!
Thanks,
Spencer
[1] K. Hayawi, S. Shahriar, M. Serhani, I. Taleb, and S. Mathew, “Anti-vax: a novel twitter dataset for covid-19 vaccine misinformation detection,” Public Health, vol. 203, pp. 23–30, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0033350621004534
[2] L. Cui and D. Lee, “Coaid: Covid-19 healthcare misinformation dataset,” 2020.
Hey Spencer, thank you so much! I will check it out and come back to you if needed! Thanks