Nick Doiron

monsoon-nlp

https://mapmeld.com/plant-based-llms/

AI & ML interests

biology and multilingual models

Recent Activity

upvoted a collection 24 days ago

reacted to meg's post with 👍 27 days ago

New work from my socially-minded colleagues at Hugging Face, creating some foundations for AI companionship behavior evaluation.
Evaluation Dataset: https://huggingface.co/datasets/AI-companionship/INTIMA
Paper: https://huggingface.co/datasets/AI-companionship/INTIMA/blob/main/Companionship_Benchmark.pdf
Work from @giadap, @frimelle, @yjernite.

liked a dataset 27 days ago

AI-companionship/INTIMA

upvoted a collection 24 days ago

Deep Ignorance

This collection contains the model and data artifacts from O'Brien et al. (2025). https://deepignorance.ai • 32 items • Updated 27 days ago • 6

upvoted a collection about 1 month ago

Dayhoff Atlas

The models and datasets that comprise the Dayhoff Atlas • 10 items • Updated Jul 28 • 8

upvoted a paper about 2 months ago

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Paper • 2506.16791 • Published Jun 20 • 3

upvoted an article 3 months ago

Accelerating AI for Drug Discovery: Ginkgo’s GDPx Functional Genomics and GDPa Antibody Developability Dataset Series

Article • 2 authors • Jun 24 • 16

upvoted 2 papers 3 months ago

Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA

Paper • 2505.21115 • Published May 27 • 139

Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14 • 123

upvoted a paper 5 months ago

Pretraining Language Models for Diachronic Linguistic Change Discovery

Paper • 2504.05523 • Published Apr 7 • 6

upvoted a collection 5 months ago

blt

4 items • Updated Apr 17 • 27

upvoted an article 5 months ago

Welcome Llama 4 Maverick & Scout on Hugging Face!

Article • 7 authors • Apr 5 • 146

upvoted a collection 5 months ago

Llama 4

Llama 4 release • 13 items • Updated Apr 29 • 617

upvoted a paper 5 months ago

Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages

Paper • 2503.20212 • Published Mar 26 • 7

upvoted 2 papers 6 months ago

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12 • 74

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10 • 100

upvoted a collection 6 months ago

BD3-LMs

https://m-arriola.com/bd3lms/ • 4 items • Updated 6 days ago • 25

upvoted 2 papers 6 months ago

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Paper • 2502.17424 • Published Feb 24 • 4

NeoBERT: A Next-Generation BERT

Paper • 2502.19587 • Published Feb 26 • 40

upvoted 3 papers 7 months ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 101

SEA-HELM: Southeast Asian Holistic Evaluation of Language Models

Paper • 2502.14301 • Published Feb 20 • 2

NoLiMa: Long-Context Evaluation Beyond Literal Matching

Paper • 2502.05167 • Published Feb 7 • 15

upvoted a paper 8 months ago

Titans: Learning to Memorize at Test Time

Paper • 2501.00663 • Published Dec 31, 2024 • 26