AI & ML interests

Privacy and artificial intelligence. NER. Token Classification.

Recent Activity

MikeDoes 
posted an update 3 days ago
view post
Post
1904
🛡️ At Ai4Privacy, our goal is to empower researchers to build a safer AI ecosystem. Today, we're highlighting crucial research that does just that by exposing a new vulnerability.

The paper "Forget to Flourish" details a new model poisoning technique. It's a reminder that as we fine-tune LLMs, our anonymization and privacy strategies must evolve to counter increasingly sophisticated threats.

We're proud that the Ai4Privacy dataset was instrumental in this study. It served two key purposes:

Provided a Realistic Testbed: It gave the researchers access to a diverse set of synthetic and realistic PII samples in a safe, controlled environment.

Enabled Impactful Benchmarking: It allowed them to measure the actual effectiveness of their data extraction attack, proving it could compromise specific, high-value information.

This work reinforces our belief that progress in AI security is a community effort. By providing robust tools for benchmarking, we can collectively identify weaknesses and build stronger, more resilient systems. A huge congratulations to the authors on this important contribution.

🔗 Read the full paper: https://arxiv.org/html/2408.17354v1

#OpenSource #DataPrivacy #LLM #Anonymization #AIsecurity #HuggingFace #Ai4Privacy #World's largest open privacy masking dataset
MikeDoes 
posted an update 13 days ago
view post
Post
1124
In data privacy, 92% accuracy is not an A-grade. Privacy AI needs to be better.

That's the stark takeaway from a recent benchmark by Diego Mouriño

(Making Science), who put today's top PII detection methods to the test on call center transcripts using the Ai4Privacy dataset.

They pitted cutting-edge LLMs (like GPT-4 & Gemini) against traditional systems (like Cloud DLPs). The results show that our trust in these tools might be misplaced.



📊 The Hard Numbers:



Even top-tier LLMs peaked at a reported 92% accuracy, leaving a potential dangerous 8% gap where your customer's data can leak. They particularly struggled with basics like 'last names' and 'street addresses'.



The old guard? Traditional rule-based systems reportedly achieved a shocking 50% accuracy. A coin toss with your customers' privacy.


This tells us that for privacy tasks, off-the-shelf accuracy is a vanity metric. The real metric is the cost of a single failure—one leaked name, one exposed address.



While no tool is perfect, some are better than others. Diego’s full analysis breaks down which models offer the best cost-to-accuracy balance in this flawed landscape. It's a must-read for anyone serious about building trustworthy AI.

#DataPrivacy #AI #LLM #RiskManagement #MetricsThatMatter #InfoSec

Find the full post here:
https://www.makingscience.com/blog/protecting-customer-privacy-how-to-remove-pii-from-call-center-transcripts/

Dataset:
ai4privacy/pii-masking-400k
MikeDoes 
posted an update about 2 months ago
view post
Post
2707
Started aistatuscodes as a new project to create codes to understand AI performance better.

Going to be posting daily here and on instagram until we get to 100m downloads :)
https://www.instagram.com/MikeDoesDo/

Follow along the journey!
MikeDoes 
posted an update 3 months ago
view post
Post
1545
PII-Masking-1M Final Day (7/7)! 🚀 Today, we unveil 5 NEW Enterprise PII (E-PII) Dataset PREVIEWS!

Standard PII tools often miss sensitive *business* data. That's why we built E-PII previews for the data that powers your operations and compliance needs.

Get a first look (representing 100,000 samples each!) into datasets designed for real-world enterprise security across these categories:

🏥 **PHI Preview**: For Healthcare Data
💳 **PFI Preview:** For Financial Data
🏢 **PWI Preview:** For Workplace Data
💻 **PDI Preview:** For Digital Activity Data
📍 **PLI Preview:** For Location Data


That wraps up our #PIIMasking1M 7 days announcement! HUGE thanks for following along and for your engagement.
Explore ALL our releases, including these E-PII previews, in the Ai4Privacy Hugging Face Collection & show some love ❤️ if you find them useful!
🔗 Visit the Collection:https://huggingface.co/ai4privacy

Let's keep building safer AI, together!