arxiv:2506.05388

taz2024full: Analysing German Newspapers for Gender Bias and Discrimination across Decades

Published on Jun 3

Abstract

The taz2024full corpus provides a large dataset of German newspaper articles for analyzing gender representation and other linguistic trends, enabling various research applications in NLP and CSS.

AI-generated summary

Open-access corpora are essential for advancing natural language processing (NLP) and computational social science (CSS). However, large-scale resources for German remain limited, restricting research on linguistic trends and societal issues such as gender bias. We present taz2024full, the largest publicly available corpus of German newspaper articles to date, comprising over 1.8 million texts from taz, spanning 1980 to 2024. As a demonstration of the corpus's utility for bias and discrimination research, we analyse gender representation across four decades of reporting. We find a consistent overrepresentation of men, but also a gradual shift toward more balanced coverage in recent years. Using a scalable, structured analysis pipeline, we provide a foundation for studying actor mentions, sentiment, and linguistic framing in German journalistic texts. The corpus supports a wide range of applications, from diachronic language analysis to critical media studies, and is freely available to foster inclusive and reproducible research in German-language NLP.
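To illustrate the kind of per-decade gender-representation count the abstract describes, here is a minimal sketch. It is not the authors' pipeline: the word lists, the `(year, text)` input format, and the function name `gender_share_by_decade` are all assumptions made for illustration, and a real analysis would need far richer lexicons or proper actor and entity resolution.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny lexicons (assumption, not from the paper).
MALE_TERMS = {"mann", "männer", "herr", "vater", "sohn"}
FEMALE_TERMS = {"frau", "frauen", "mutter", "tochter"}


def gender_share_by_decade(articles):
    """Count gendered terms per decade.

    `articles` is an iterable of (year, text) pairs; this input format is
    an assumption, not the corpus's actual schema.
    """
    counts = defaultdict(lambda: {"male": 0, "female": 0})
    for year, text in articles:
        decade = year // 10 * 10  # e.g. 1985 -> 1980
        # Lowercase and split into word tokens (\w matches umlauts in Python 3).
        for token in re.findall(r"\w+", text.lower()):
            if token in MALE_TERMS:
                counts[decade]["male"] += 1
            elif token in FEMALE_TERMS:
                counts[decade]["female"] += 1

    result = {}
    for decade, c in sorted(counts.items()):
        total = c["male"] + c["female"]
        result[decade] = {
            "male": c["male"],
            "female": c["female"],
            # Share of male terms among all gendered terms; None if no hits.
            "male_share": c["male"] / total if total else None,
        }
    return result
```

Passing a few `(year, text)` pairs to `gender_share_by_decade` yields one entry per decade, so a drift in `male_share` over time would be the simplest signal of the shift toward more balanced coverage that the abstract reports.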

Community

The paper title is highly misleading: it says "German Newspapers", but the paper analyses only a single newspaper, and one with a strong far-left political bias.

Additionally, news articles from Tagesschau are freely available (see https://github.com/bjoernpl/tagesschau), and the crawler script works.

Still, the focus on discrimination is interesting, because taz is strongly in favour of doxing people; see this article in German.
