Sarthak Malhotra

zarmalhotra

AI & ML interests

None yet

Recent Activity

liked a dataset about 21 hours ago
rekrek/reasoning-engaging-story
liked a dataset 2 days ago
codelion/math500-cot-experiment
liked a dataset 2 days ago
DataTonic/dark_thoughts_case_study_reason

Organizations

Bespoke Labs, Reasoning datasets competition

zarmalhotra's activity

reacted to ZennyKenny's post with 🔥 6 days ago
When I heard the Reasoning Dataset Competition deadline was extended to 9 May, I knew I had time to get in one more entry. 🔥🔥🔥

With the rise of Vibe Coding, and the potential risks that are introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod.

In response, I'm happy to present my final submission to the Reasoning Dataset Competition, an attempt to start benchmarking the ability of LLMs to identify unsafe and/or exploitable code by way of the CoSa (Code Safety) benchmark: ZennyKenny/cosa-benchmark-dataset

It is currently a curated set of 200 examples, calibrated on OpenAI's standard-issue models (GPT-4.1, o4-mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a ❤️ if you think it could be useful, or hit the Community section with suggestions / critiques.
New activity in reasoning-datasets-competition/README 10 days ago

Competition Lobby

#1 opened about 1 month ago by ZennyKenny
updated a Space 10 days ago