Kenneth Hamilton's picture

Kenneth Hamilton PRO

ZennyKenny

AI & ML interests

Building and enablement @ montebello.ai Certified vibe coder

Recent Activity

liked a model about 16 hours ago
mlabonne/Qwen3-14B-abliterated
new activity about 19 hours ago
mlabonne/Qwen3-14B-abliterated:interesting
liked a Space about 20 hours ago
onekq-ai/WebApp1K-models-leaderboard
View all activity

Organizations

scikit-learn's profile picture TorchGeo's profile picture Kornia AI's profile picture Blog-explorers's profile picture OpenLLM France's profile picture Team Tonic's profile picture ZeroGPU Explorers's profile picture Data is Better Together - Russian Language Team's profile picture The Nevsky Collective's profile picture Plan Communications's profile picture MLX Community's profile picture Social Post Explorers's profile picture Hugging Face Discord Community's profile picture Data Is Better Together Contributor's profile picture Reasoning datasets competition 's profile picture

Posts 22

view post
Post
1672
When I heard the Reasoning Dataset Competition deadline was extended to 9 May, I knew I had time to get in one more entry. 🔥🔥🔥

With the rise of Vibe Coding, and the potential risks that are introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod.

In response to that, I'm happy to present my final submission to the Reasoning Dataset Competition and attempt to start benchmarking the ability of LLMs to identify unsafe and / or exploitable code by way of the CoSa (Code Safety) benchmark: ZennyKenny/cosa-benchmark-dataset

Currently a curated set of 200 examples, calibrated on OpenAI's standard issue models (GPT-4.1, o4 mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a ❤️ if you think it could be useful or hit the Community section with suggestions / critiques.

Articles 2

Article
1

Open-sourcing the Plain English to SQL Pipeline