Bhadresh Savani

bhadresh-savani

AI & ML interests

NLP, Deep Learning, ML

Recent Activity

updated a Space about 10 hours ago
bhadresh-savani/viz-agent
published a Space about 10 hours ago
bhadresh-savani/viz-agent
liked a model 25 days ago
ByteDance/InfiniteYou

Organizations

Flax Community, ONNXConfig for all, HugGAN Community, Keras Dreambooth Event, Lambda Go Labs

bhadresh-savani's activity

upvoted 2 articles about 1 month ago
Hugging Face and JFrog partner to make AI Security more transparent

21
Trace & Evaluate your Agent with Arize Phoenix

37
upvoted an article 2 months ago
How to deploy and fine-tune DeepSeek models on AWS

52
reacted to lin-tan's post with 🔥 5 months ago
Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, whereas SWE-Bench tasks resolve pull requests from GitHub issues.
- Come with 2.6X as many tests per task (313.5 compared to SWE-Bench’s 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
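The post above reports results as pass@1 but does not say how the metric is computed. As a rough sketch, the snippet below assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), averaged over tasks; whether RepoCod uses exactly this procedure is an assumption. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical, and the sample counts are made up for illustration, not RepoCod results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n sampled solutions, c of which pass all tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k samples contains a passing one.
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average the per-task estimate over (n, c) pairs, one pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: four tasks, one sample each, only the first task passes.
toy_results = [(1, 1), (1, 0), (1, 0), (1, 0)]
score = benchmark_pass_at_k(toy_results, k=1)
```

With a single sample per task and k=1, the estimate reduces to the fraction of tasks whose generated function passes every test, which is how a figure such as "<30% pass@1" is usually read.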