MATTHEW EYESAN

Mackin7

AI & ML interests

Vision models, language models, ML algorithms and solutions, security and data frameworks

Recent Activity

Organizations

None yet

Mackin7's activity

New activity in Mackin7/my-distiset-5d7059ee 2 months ago
New activity in Mackin7/my-distiset-55a6b53b 4 months ago
reacted to fdaudens's post with 🔥 7 months ago
reacted to clem's post with 🔥 8 months ago
Just crossed 200,000 free public AI datasets shared by the community on Hugging Face! Text, image, video, audio, time-series & many more... Thanks everyone!

http://hf.co/datasets
reacted to codelion's post with ❤️ 8 months ago
We recently worked with OpenAI to fine-tune gpt-4o and built the SOTA model for the patched-codes/static-analysis-eval benchmark. All the code and the data (patched-codes/synth-vuln-fixes) showing how we did it are available on their GitHub - https://github.com/openai/build-hours/tree/main/5-4o_fine_tuning.

Here are some tips based on our experience:

→ Establish baseline with "conditioning" / prompting

→ Task-specific datasets are ideal for PEFT; hard to beat gpt-4o on "broad" tasks

→ Add your best system prompt to each example

→ Ensure training data distribution is similar to inference data

→ Shorten instructions with concise prompts; may require more examples

→ Define clear evaluation metrics (seriously, please eval!)
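As a minimal sketch of the "add your best system prompt to each example" tip, the snippet below builds a training file in OpenAI's chat fine-tuning JSONL format, repeating one system prompt in every record. The system prompt and the example pair are illustrative placeholders, not taken from the patched-codes/synth-vuln-fixes dataset.

```python
import json

# Hypothetical system prompt -- in practice, reuse the best prompt
# found while establishing the prompting baseline.
SYSTEM_PROMPT = "You are a security engineer. Fix the vulnerability in the given code."

# Illustrative (prompt, completion) pair; a real dataset would hold many.
examples = [
    {
        "prompt": "cur.execute('SELECT * FROM t WHERE u=' + user)",
        "completion": "cur.execute('SELECT * FROM t WHERE u=%s', (user,))",
    },
]

def to_chat_record(example):
    """Wrap one pair in the chat fine-tuning format, with the same
    system prompt prepended to every training example."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["prompt"]},
            {"role": "assistant", "content": example["completion"]},
        ]
    }

# One JSON object per line, as the fine-tuning API expects.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(to_chat_record(ex)) + "\n")
```

Keeping the inference-time system prompt in the training records also helps with the "training distribution should match inference distribution" tip above.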

You can see more details on the benchmark and process here - https://www.patched.codes/blog/the-static-analysis-evaluation-benchmark-measuring-llm-performance-in-fixing-software-vulnerabilities