The hype is real: a mysterious gpt2-chatbot model has appeared on the LLM Arena Leaderboard. It seems to be at least on par with the top-performing models (closed and open).
To try it out: go to https://chat.lmsys.org/, then click on the Direct Chat tab and select gpt2-chatbot.
How Robust Is Your Model in Complex Code Generation Tasks?
We've launched the PECC benchmark to challenge chat models in code generation, drawing from Advent of Code for programming tasks and Project Euler for math-heavy challenges. This new task presents problems in both detailed prose and concise "leetcode" styles, evaluating models' ability to understand and solve complex coding and math problems in chat-based interactions.
It seems that the Claude 3 models outperform ChatGPT:
Model / Avg. (pass@3)
Claude 3 Haiku / 27.67
GPT-3.5-Turbo / 23.75
Mixtral-8x22B-Instruct-v0.1 / 8.35
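For context on the metric: pass@k is commonly estimated with the unbiased estimator from the Codex paper (Chen et al., 2021): generate n samples per problem, count the c correct ones, and compute the probability that at least one of k drawn samples is correct. A minimal Python sketch is below; whether PECC uses exactly this estimator is an assumption on my part.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 3 generations per problem, 1 of them correct -> pass@3 == 1.0
print(pass_at_k(n=3, c=1, k=3))
```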
Some tokens are more relevant than others, and some are mostly noise (just look up the history of SolidGoldMagikarp).
So this paper introduces Selective Language Modeling, which is actually really simple: a specific metric measures the relevance of each token, and during training only the top k% of tokens by this relevance metric count in the loss calculation.
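To make the idea concrete, here is a minimal PyTorch-style sketch of a selective language-modeling loss, assuming tokens are scored by their excess loss against a reference model; the function name, the keep_ratio value, and the scoring details are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, labels, keep_ratio=0.6):
    """Sketch of a selective LM loss: score each token, then average the
    cross-entropy over only the top `keep_ratio` fraction of tokens.

    logits / ref_logits: (batch, seq, vocab), already shifted to align with labels.
    labels: (batch, seq) token ids.
    """
    vocab = logits.size(-1)
    # Per-token cross-entropy of the model being trained.
    token_loss = F.cross_entropy(
        logits.reshape(-1, vocab), labels.reshape(-1), reduction="none"
    )
    with torch.no_grad():
        # Per-token loss of a reference model trained on clean data.
        ref_loss = F.cross_entropy(
            ref_logits.reshape(-1, vocab), labels.reshape(-1), reduction="none"
        )
        # "Excess loss" as the relevance score: tokens the current model still
        # gets wrong but the reference model finds predictable are worth learning.
        scores = token_loss.detach() - ref_loss
        k = max(1, int(keep_ratio * scores.numel()))
        threshold = scores.topk(k).values.min()
        mask = (scores >= threshold).float()
    # Only the selected tokens contribute to the training loss.
    return (token_loss * mask).sum() / mask.sum()
```

Selecting a fixed fraction of tokens keeps the forward pass unchanged; the gain comes from not spending gradient signal on noisy tokens.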
The authors test this method by training models on the difficult MATH dataset (competition mathematics problems only).
Their technique seems like a new must-do in LLM training: training is much faster and reaches impressive performance!
Results:
- Training is 5x to 10x faster to reach equivalent performance compared to standard language modeling.
- Their 1B model achieves performance close to GPT-4 Chain-of-Thought on MATH!
- Their 7B model matches the performance of the state-of-the-art DeepSeek model of the same size, while being trained on only 3% of the tokens.
Additional insights:
- Datasets used for pre-training, even after pre-filtering, still contain a large proportion of noisy tokens.
- The authors show that when you reduce the loss on noisy tokens, you actually reduce accuracy (Figure 7). So Selective Language Modeling seems fundamental!