Ai4Privacy

company

https://ai4privacy.com/

ai4Privacy


AI & ML interests

Privacy and artificial intelligence. NER. Token Classification.

Recent Activity

DeepMount00 new activity about 2 months ago

ai4privacy/llama-ai4privacy-multilingual-categorical-anonymiser-openpii:Cat

MikeDoes new activity about 2 months ago

ai4privacy/llama-ai4privacy-english-anonymiser-openpii:Base model

MikeDoes updated a model about 2 months ago

ai4privacy/llama-ai4privacy-english-anonymiser-openpii


MikeDoes

posted an update 3 days ago


🛡️ At Ai4Privacy, our goal is to empower researchers to build a safer AI ecosystem. Today, we're highlighting crucial research that does just that by exposing a new vulnerability.

The paper "Forget to Flourish" details a new model poisoning technique. It's a reminder that as we fine-tune LLMs, our anonymization and privacy strategies must evolve to counter increasingly sophisticated threats.

We're proud that the Ai4Privacy dataset was instrumental in this study. It served two key purposes:

Provided a Realistic Testbed: It gave the researchers access to a diverse set of synthetic and realistic PII samples in a safe, controlled environment.

Enabled Impactful Benchmarking: It allowed them to measure the actual effectiveness of their data extraction attack, proving it could compromise specific, high-value information.

This work reinforces our belief that progress in AI security is a community effort. By providing robust tools for benchmarking, we can collectively identify weaknesses and build stronger, more resilient systems. A huge congratulations to the authors on this important contribution.

🔗 Read the full paper: https://arxiv.org/html/2408.17354v1

#OpenSource #DataPrivacy #LLM #Anonymization #AIsecurity #HuggingFace #Ai4Privacy

World's largest open privacy masking dataset

MikeDoes

posted an update 13 days ago


In data privacy, 92% accuracy is not an A-grade. Privacy AI needs to be better.

That's the stark takeaway from a recent benchmark by Diego Mouriño (Making Science), who put today's top PII detection methods to the test on call center transcripts using the Ai4Privacy dataset.

They pitted cutting-edge LLMs (like GPT-4 & Gemini) against traditional systems (like Cloud DLPs). The results show that our trust in these tools might be misplaced.

📊 The Hard Numbers:

Even top-tier LLMs peaked at a reported 92% accuracy, leaving a potentially dangerous 8% gap where your customers' data can leak. They particularly struggled with basics like 'last names' and 'street addresses'.

The old guard? Traditional rule-based systems reportedly achieved a shocking 50% accuracy. A coin toss with your customers' privacy.

This tells us that for privacy tasks, off-the-shelf accuracy is a vanity metric. The real metric is the cost of a single failure—one leaked name, one exposed address.
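
To make the "cost of a single failure" point concrete, here is a minimal sketch of how a per-entity miss rate compounds over a whole transcript. It rests on two assumptions that are not from the benchmark: the reported 92% is treated as a per-entity detection rate, and each entity is detected independently. The function name `leak_probability` is hypothetical.

```python
def leak_probability(detection_rate: float, n_entities: int) -> float:
    """Chance that at least one of n_entities PII items goes undetected.

    Assumption: every entity is detected independently with the same
    probability `detection_rate` (a simplification, not a benchmark result).
    """
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    if n_entities < 0:
        raise ValueError("n_entities must be non-negative")
    # P(all entities caught) = rate ** n, so P(at least one miss) is the complement.
    return 1.0 - detection_rate ** n_entities


if __name__ == "__main__":
    # Compare a single entity with longer transcripts at the reported 92%.
    for n in (1, 10, 50):
        print(f"{n:>3} entities -> {leak_probability(0.92, n):.1%} chance of a leak")
```

Under these assumptions, a transcript with ten PII entities is more likely than not to leak at least one of them, which is why a headline accuracy figure says little about per-document risk.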

While no tool is perfect, some are better than others. Diego’s full analysis breaks down which models offer the best cost-to-accuracy balance in this flawed landscape. It's a must-read for anyone serious about building trustworthy AI.

#DataPrivacy #AI #LLM #RiskManagement #MetricsThatMatter #InfoSec

Find the full post here:
https://www.makingscience.com/blog/protecting-customer-privacy-how-to-remove-pii-from-call-center-transcripts/

Dataset:
ai4privacy/pii-masking-400k
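
The per-category weaknesses mentioned in the post (for example last names and street addresses) can be measured with a per-label recall over character spans. The sketch below is illustrative only: it assumes gold annotations and detector output are lists of dictionaries with `label`, `start`, and `end` keys, and the label names and the helper `per_label_recall` are hypothetical rather than taken from the dataset or the benchmark.

```python
from collections import defaultdict


def per_label_recall(gold, predicted):
    """Return {label: recall} for gold PII spans.

    A gold span counts as found when some predicted span fully covers its
    characters. Predicted labels are ignored, because for masking purposes
    only the covered characters matter.
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for g in gold:
        total[g["label"]] += 1
        covered = any(
            p["start"] <= g["start"] and g["end"] <= p["end"] for p in predicted
        )
        if covered:
            found[g["label"]] += 1
    return {label: found[label] / total[label] for label in total}


# Hypothetical example: the detector finds the first name and the street,
# but misses the last name.
text = "Call Ana Ruiz at 12 Elm Street."
gold = [
    {"label": "GIVENNAME", "start": 5, "end": 8},
    {"label": "SURNAME", "start": 9, "end": 13},
    {"label": "STREET", "start": 17, "end": 30},
]
predicted = [
    {"label": "NAME", "start": 5, "end": 8},
    {"label": "ADDRESS", "start": 17, "end": 30},
]
recall = per_label_recall(gold, predicted)
```

Reporting recall per label, rather than one aggregate score, is what surfaces a category that is systematically missed.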

MikeDoes

posted an update about 2 months ago


Started aistatuscodes as a new project to create codes to understand AI performance better.

Going to be posting daily here and on Instagram until we get to 100m downloads :)
https://www.instagram.com/MikeDoesDo/

Follow along on the journey!

DeepMount00

in ai4privacy/llama-ai4privacy-multilingual-categorical-anonymiser-openpii about 2 months ago

Cat

#1 opened about 2 months ago by

Qadari

MikeDoes

in ai4privacy/llama-ai4privacy-english-anonymiser-openpii about 2 months ago

Base model

#3 opened about 2 months ago by

IICurious

MikeDoes

updated a model about 2 months ago

ai4privacy/llama-ai4privacy-english-anonymiser-openpii

Token Classification • 0.1B • Updated Jun 5 • 264 • 15

MikeDoes

updated 2 collections 3 months ago

Entreprise PII Masking

Collection

5 items • Updated May 15 • 2

PII-Masking-400k and below

Collection

6 items • Updated May 15 • 2

MikeDoes

in ai4privacy/llama-ai4privacy-english-anonymiser-openpii 3 months ago

model does not return detailed categories

#2 opened 3 months ago by

AymanChtiar

MikeDoes

posted an update 3 months ago


PII-Masking-1M Final Day (7/7)! 🚀 Today, we unveil 5 NEW Enterprise PII (E-PII) Dataset PREVIEWS!

Standard PII tools often miss sensitive *business* data. That's why we built E-PII previews for the data that powers your operations and compliance needs.

Get a first look (representing 100,000 samples each!) into datasets designed for real-world enterprise security across these categories:

🏥 **PHI Preview:** For Healthcare Data
💳 **PFI Preview:** For Financial Data
🏢 **PWI Preview:** For Workplace Data
💻 **PDI Preview:** For Digital Activity Data
📍 **PLI Preview:** For Location Data

That wraps up our 7-day #PIIMasking1M announcement series! HUGE thanks for following along and for your engagement.
Explore ALL our releases, including these E-PII previews, in the Ai4Privacy Hugging Face Collection & show some love ❤️ if you find them useful!
🔗 Visit the Collection: https://huggingface.co/ai4privacy

Let's keep building safer AI, together!

MikeDoes

updated a dataset 3 months ago

ai4privacy/pli-masking-100k

Viewer • Updated Apr 28 • 400 • 54 • 1


Team members 4
