Mac Szankin

macsz

AI & ML interests

LLMs, Autonomous spaceships and GenAI at SiMa.ai. After-hours Projects: Processing Thermal Imagery.

Organizations

Intel, Intel Labs, SiMa.ai

macsz's activity

replied to fdaudens's post 7 months ago
reacted to morgan's post with 👍 7 months ago

Llama 3.1 405B Instruct beats GPT-4o on MixEval-Hard

Just ran MixEval for 405B, Sonnet-3.5 and 4o, with 405B landing right between the other two at 66.19

The GPT-4o result of 64.7 replicated locally, but Sonnet-3.5 actually scored 70.25/69.45 in my replications 🤔 Sonnet-3.5 is still well ahead of the other 2 though.

Sample of one of the eval calls here: https://wandb.ai/morgan/MixEval/weave/calls/07b05ae2-2ef5-4525-98a6-c59963b76fe1

Quick auto-logging tracing for openai-compatible clients and many more here: https://wandb.github.io/weave/quickstart/

replied to fdaudens's post 7 months ago

Have you done any comparison with the same data on previous LLaMA 3 70B?