ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Abstract
A benchmark for Chinese harmful content detection is introduced, along with a knowledge-augmented baseline that improves efficiency and accuracy by combining human-annotated knowledge rules with implicit knowledge from LLMs.
Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese harmful content detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.
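To illustrate the idea behind the knowledge-augmented baseline, the sketch below shows how explicit, human-annotated knowledge rules might be injected into a classification prompt for an LLM. It is a minimal, hypothetical example: the category names, rules, example text, default model name, and the use of the `openai` client are illustrative assumptions, not the benchmark's actual taxonomy or the released implementation (see the repository linked above for the real code and data).

```python
# Minimal sketch of knowledge-rule-augmented prompting for harmful content
# detection. All rules, categories, and the example text are made-up
# placeholders for illustration only.
from openai import OpenAI

# Hypothetical rule base: category -> human-annotated knowledge rule.
KNOWLEDGE_RULES = {
    "gambling": "Mentions of betting odds, payout ratios, or invitations to wager money.",
    "fraud": "Fake prize notifications, requests for bank details, or impersonation of officials.",
    "non_violation": "Content that matches none of the harmful categories.",
}


def build_prompt(text: str) -> str:
    """Combine the explicit knowledge rules with the text to be moderated."""
    rules = "\n".join(f"- {cat}: {rule}" for cat, rule in KNOWLEDGE_RULES.items())
    return (
        "You are a Chinese content moderation assistant.\n"
        "Classify the text into exactly one category, using these rules:\n"
        f"{rules}\n\n"
        f"Text: {text}\n"
        "Answer with the category name only."
    )


def classify(text: str, model: str = "gpt-4o-mini") -> str:
    """Send the knowledge-augmented prompt to a chat model and return its label."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(text)}],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    # Illustrative, invented input text (gambling-style solicitation).
    print(classify("加微信领取百倍赔率内部赛事资料"))
```

The same prompt-construction step could also be used to distill rule-grounded labels from a large teacher model into a smaller student model, which is the spirit of the knowledge-augmented baseline described in the abstract.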
Community
We present ChineseHarm-Bench, a comprehensive, professionally annotated benchmark for Chinese harmful content detection, covering six representative categories and constructed entirely from real-world data.
Librarian Bot: the following similar papers were recommended by the Semantic Scholar API:
- CVC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models (2025)
- MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine (2025)
- Mixture of Small and Large Models for Chinese Spelling Check (2025)
- SafeTuneBed: A Toolkit for Benchmarking LLM Safety Alignment in Fine-Tuning (2025)
- From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring (2025)
- Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs (2025)
- EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications (2025)
Models citing this paper 3
Datasets citing this paper 1
Spaces citing this paper 0
Collections including this paper 0