arxiv:2510.18019

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Published on Oct 20
· Submitted by Martin Gubri on Oct 22

Abstract

STEAM, a back-translation-based detection method, restores multilingual watermarking robustness, including in medium- and low-resource languages, by compensating for semantic clustering failures.

AI-generated summary

Multilingual watermarking aims to make large language model (LLM) outputs traceable across languages, yet current methods still fall short. Despite claims of cross-lingual robustness, they are evaluated only on high-resource languages. We show that existing multilingual watermarking methods are not truly multilingual: they fail to remain robust under translation attacks in medium- and low-resource languages. We trace this failure to semantic clustering, which fails when the tokenizer vocabulary contains too few full-word tokens for a given language. To address this, we introduce STEAM, a back-translation-based detection method that restores watermark strength lost through translation. STEAM is compatible with any watermarking method, robust across different tokenizers and languages, non-invasive, and easily extendable to new languages. With average gains of +0.19 AUC and +40%p TPR@1% on 17 languages, STEAM provides a simple and robust path toward fairer watermarking across diverse languages.

Community

Paper author · Paper submitter · edited 3 days ago

Some watermarking methods for large language models (LLMs) claim to be multilingual, yet they are almost always tested on high-resource languages like English, French, and German. This paper reveals that such claims do not hold up under scrutiny: multilingual watermarks collapse under translation attacks in medium- and low-resource languages.

[Figures: translation attack teaser and detection performance plot]

This paper traces the issue to semantic clustering, the main technique behind multilingual watermarking, which groups semantically similar tokens across languages. When tokenizers have few full-word tokens, as is common in less-resourced languages, the clustering fails, weakening watermark detection.
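
To make this failure mode concrete, here is a small illustration (a sketch assuming the Hugging Face `transformers` tokenizer API; the multilingual checkpoint and the example words are illustrative choices, not the paper's exact setup):

```python
# Sketch: why semantic clustering degrades for less-resourced languages.
# Assumes the Hugging Face `transformers` library is installed; the checkpoint
# and example words are illustrative, not the paper's exact setup.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

for word in ["water", "Wasser", "amanzi"]:  # English, German, Zulu
    print(word, "->", tok.tokenize(word))

# A high-resource word usually maps to one full-word token that a semantic
# cluster can contain; a low-resource word splits into generic subword pieces,
# so cluster-based token partitions carry little signal for that language.
```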

To fix this, the paper introduces STEAM (Simple Translation-Enhanced Approach for Multilingual watermarking), a lightweight, detection-time method that restores watermark signals lost during translation. STEAM uses back-translation, translating a suspect text back into multiple supported languages, and then identifies the strongest watermark signal across these variants. Crucially, STEAM is model-agnostic, tokenizer-independent, and works with any existing watermarking method without modifying model outputs.

[Figure: STEAM method overview]
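
The detection-time logic is easy to sketch. Below is a minimal illustration, with the caveat that `score_fn` and `translate_fn` are hypothetical placeholders for an existing watermark detector and an off-the-shelf translator; this is not the paper's actual API.

```python
# Minimal sketch of STEAM-style detection. `score_fn` and `translate_fn` are
# hypothetical placeholders, not the paper's actual API.
from typing import Callable, Iterable

def steam_detect(
    text: str,
    score_fn: Callable[[str], float],         # any existing watermark detector
    translate_fn: Callable[[str, str], str],  # (text, target_lang) -> translation
    languages: Iterable[str] = ("en", "fr", "de"),
) -> float:
    """Back-translate a suspect text into each supported language and
    return the strongest watermark score across all variants."""
    candidates = [text] + [translate_fn(text, lang) for lang in languages]
    return max(score_fn(c) for c in candidates)
```

Because this only maximizes an existing detector's score over extra text variants, nothing about generation or the watermarking scheme changes, which is what makes the approach non-invasive and easy to extend: supporting a new language amounts to adding it to the back-translation list.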

📈 Results

  • Evaluated on 17 languages spanning high-, medium-, and low-resource groups.
  • Achieves +0.19 AUC and +40%p TPR@1%FPR average improvement over prior multilingual methods.
  • Outperforms semantic clustering (X-SIR, X-KGW) by up to +0.33 AUC and +64.5%p TPR@1%.
  • Remains robust under translator mismatches and adaptive multi-step translation attacks.

💡 Key Insight:
Watermark robustness depends on how well the tokenizer covers a language with full-word tokens. By using back-translation to recover watermark signals lost in translation, STEAM delivers fairer, more reliable detection across diverse languages.

In short, STEAM makes multilingual watermarking genuinely multilingual, offering a simple yet effective step toward equitable content provenance for all languages.

