A leaderboard sounds awesome, especially for comparing "usability" (from an alignment perspective).
There is one leaderboard on this topic: https://huggingface.co/spaces/AI-Secure/llm-trustworthy-leaderboard
But I think the points you raise are interesting in their own right.
Also, subjective work could be standardized to some extent :)
How did you obtain those scores?
Also, what do the values mean?
Maybe I'm missing something; if so, please let me know, I would love to understand! :)
Otherwise, I can't work out what "health -3" means and how it compares to "health +15".
(I really don't want to be rude, so sorry if it sounds that way! :) )