McGill NLP Group

university

https://mcgill-nlp.github.io/

McGill_NLP

McGill-NLP

AI & ML interests

computational linguistics, natural language processing

Recent Activity

arkilpatel submitted a paper about 4 hours ago

Forecasting Downstream Performance of LLMs With Proxy Metrics

xhluca updated a Space 5 days ago

McGill-NLP/agent-reward-bench-leaderboard

JessicaOjo updated a Space 11 days ago

McGill-NLP/AfroBench

Papers

Forecasting Downstream Performance of LLMs With Proxy Metrics

Structured Distillation of Web Agent Capabilities Enables Generalization

McGill-NLP's collections (19)

AfriqueLLM

Best open African LLM

AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages

Paper • 2601.06395 • Published Jan 10 • 3
McGill-NLP/AfriqueQwen-14B

Text Generation • 15B • Updated Apr 20 • 1.79k • 3
McGill-NLP/AfriqueQwen-8B

Text Generation • 8B • Updated Apr 20 • 1.17k • 2
McGill-NLP/AfriqueQwen3.5-4B-ExtendedCM

Text Generation • 5B • Updated Apr 20 • 58

LatentLens Contextual Embeddings

Pre-computed contextual text embeddings for interpreting LLM/VLM hidden states. Use with: pip install latentlens

McGill-NLP/contextual_embeddings-llama3.1-8b

Updated Feb 19
McGill-NLP/contextual_embeddings-gemma2-9b

Updated Feb 19
McGill-NLP/contextual_embeddings-qwen2.5-7b

Updated Feb 19
McGill-NLP/latentlens-qwen2vl-embeddings

Updated Feb 7

The Markovian Thinker

Reformulating the RL of reasoning LLMs through the Markovian Thinking paradigm.

McGill-NLP/delethink-24k-1.5b

2B • Updated Oct 9, 2025 • 11 • 5
McGill-NLP/longcot-24k-1.5b

2B • Updated Oct 9, 2025 • 4 • 2
McGill-NLP/longcot-8k-1.5b

2B • Updated Oct 9, 2025 • 5 • 1
McGill-NLP/delethink-96k-base-1.5b

2B • Updated Oct 3, 2025 • 1

SSA-COMET

McGill-NLP/ssa-comet-qe

Translation • Updated May 23, 2025 • 1
McGill-NLP/ssa-comet-mtl

Translation • Updated Mar 11 • 3
McGill-NLP/ssa-comet-stl

Translation • Updated May 23, 2025 • 1
McGill-NLP/ssa-comet-qe-final

Translation • Updated 26 days ago

AgentRewardBench

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published Apr 11, 2025 • 28
McGill-NLP/agent-reward-bench

Viewer • Updated Apr 21, 2025 • 1.41k • 3.14k • 4
Agent Reward Bench Demo

Space • Running • Agents • 5 • Explore agent trajectories and judgments in web benchmarks
Agent Reward Bench Leaderboard

Space • Sleeping • Agents • 3 • Leaderboard for AgentRewardBench

SafeArena

McGill-NLP/safearena

Updated Apr 23, 2025 • 296 • 4
SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6, 2025 • 21
Safearena Leaderboard

Space • Running • Agents • 3 • SafeArena Leaderboard

LLM2Vec

McGill-NLP/LLM2Vec-Meta-Llama-32-3B-Instruct-mntp-supervised

Updated Nov 15, 2025
McGill-NLP/LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-supervised

Sentence Similarity • Updated Oct 8, 2024 • 680 • 5
McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised

Sentence Similarity • Updated Apr 30, 2024 • 184k • 51
McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised

Sentence Similarity • Updated Apr 11, 2024 • 2.61k • 13

AURORA

Repository: https://github.com/McGill-NLP/AURORA

McGill-NLP/AURORA

Viewer • Updated Jul 25, 2024 • 169k • 161 • 7
McGill-NLP/AURORA

Image-to-Image • Updated Dec 21, 2024 • 7 • 4
McGill-NLP/aurora-bench

Viewer • Updated Jul 9, 2024 • 400 • 24 • 2
AURORA

Space • Runtime error • Agents • 5

Statcan Dialogue Dataset & Models

mcgill-nlp.github.io/statcan-dialogue-dataset

The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

Paper • 2304.01412 • Published Apr 3, 2023 • 2
McGill-NLP/statcan-dialogue-dataset

Preview • Updated May 24, 2024 • 19 • 7
McGill-NLP/dpr-statcan-conversation_encoder-title

Feature Extraction • 0.1B • Updated Jul 17, 2023 • 7
McGill-NLP/tapas-statcan-large-conversation_encoder-cell_tokens

Feature Extraction • Updated Apr 5, 2023 • 12

MLQuestions

Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval

Paper • 2104.08801 • Published Apr 18, 2021 • 1
McGill-NLP/mlquestions

Updated Nov 11, 2021 • 284 • 3
McGill-NLP/bart-qg-mlquestions-backtraining

Updated Apr 8, 2022 • 9
McGill-NLP/bart-qg-mlquestions-selftraining

Updated Apr 12, 2022 • 5

A3: Agent-as-Annotators

Models and data from "Structured Distillation of Web Agent Capabilities Enables Generalization" (arXiv:2604.07776)

Structured Distillation of Web Agent Capabilities Enables Generalization

Paper • 2604.07776 • Published Apr 9 • 22
McGill-NLP/A3-Qwen3.5-9B

Image-Text-to-Text • 9B • Updated Apr 16 • 143 • 5
McGill-NLP/A3-Qwen3.5-4B

Image-Text-to-Text • 5B • Updated Apr 16 • 8 • 1
McGill-NLP/A3-Qwen3.5-2B

Image-Text-to-Text • 3B • Updated Apr 16 • 13 • 2

CRAG-MM-Diagnostics

McGill-NLP/crag-mm-diagnostics

Viewer • Updated Feb 5 • 1.15k • 109

INJONGO

INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages

McGill-NLP/AfroXLMR-large-76L-Injongo-intent

Text Classification • 0.6B • Updated May 25, 2025 • 1
McGill-NLP/AfroXLMR-large-76L-Injongo-slot

Token Classification • 0.6B • Updated May 25, 2025 • 2
McGill-NLP/gemma-2-9b-it-Injongo-intent

Text Generation • 9B • Updated May 26, 2025 • 2
McGill-NLP/gemma-2-9b-it-Injongo-slot

Text Generation • 9B • Updated May 26, 2025 • 2

Unequal unlearning

Datasets used for the OLMo experiments in the "Not All Data are Unlearned Equally" paper https://arxiv.org/abs/2504.05058

McGill-NLP/country_capital_qa

Viewer • Updated Apr 16, 2025 • 1.39k • 63
McGill-NLP/book_author_qa

Viewer • Updated Apr 16, 2025 • 1.48k • 43
McGill-NLP/zsre_qa

Viewer • Updated Apr 16, 2025 • 1.52k • 70

Malicious-IR

McGill-NLP/AdvBench-IR

Viewer • Updated Mar 12, 2025 • 520 • 27 • 4
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Paper • 2503.08644 • Published Mar 11, 2025 • 16
McGill-NLP/AdvBench-IR-Small-Wiki-100

Viewer • Updated May 29, 2025 • 50.9k • 19

CHASE

Generate challenging synthetic data to evaluate LLMs

McGill-NLP/CHASE-QA

Viewer • Updated Feb 21, 2025 • 671 • 52
McGill-NLP/CHASE-Code

Viewer • Updated Feb 21, 2025 • 500 • 66
McGill-NLP/CHASE-Math

Viewer • Updated Feb 21, 2025 • 500 • 47
How to Get Your LLM to Generate Challenging Problems for Evaluation

Paper • 2502.14678 • Published Feb 20, 2025 • 18

WebLINX

https://mcgill-nlp.github.io/weblinx

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Paper • 2402.05930 • Published Feb 8, 2024 • 39
McGill-NLP/WebLINX

Viewer • Updated Dec 7, 2024 • 79.8k • 1.28k • 65
McGill-NLP/WebLINX-full

Updated Sep 21, 2025 • 27.9k • 7
McGill-NLP/weblinx-browsergym

Updated Dec 7, 2024 • 11.9k • 4

WebLINX Models

https://mcgill-nlp.github.io/weblinx

McGill-NLP/Llama-3-8B-Web

Text Generation • 8B • Updated Apr 26, 2024 • 1.89k • 215
McGill-NLP/MiniLM-L6-dmr

Sentence Similarity • Updated Feb 9, 2024 • 24 • 5
McGill-NLP/bge-small-dmr

Sentence Similarity • Updated Feb 9, 2024 • 9 • 1
McGill-NLP/gte-base-dmr

Sentence Similarity • Updated Feb 9, 2024 • 52 • 2

FaithDial

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Paper • 2204.10757 • Published Apr 22, 2022 • 2
McGill-NLP/FaithDial

Viewer • Updated Feb 5, 2023 • 32.3k • 707 • 18
McGill-NLP/roberta-large-faithcritic

Text Classification • Updated Jul 31, 2022 • 55 • 1

Team members 48
