arxiv:2510.18019

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Published on Oct 20
· Submitted by Martin Gubri on Oct 22

Abstract

STEAM, a back-translation-based detection method, restores multilingual watermarking robustness, including in medium- and low-resource languages, by compensating for semantic clustering failures.

AI-generated summary

Multilingual watermarking aims to make large language model (LLM) outputs traceable across languages, yet current methods still fall short. Despite claims of cross-lingual robustness, they are evaluated only on high-resource languages. We show that existing multilingual watermarking methods are not truly multilingual: they fail to remain robust under translation attacks in medium- and low-resource languages. We trace this failure to semantic clustering, which fails when the tokenizer vocabulary contains too few full-word tokens for a given language. To address this, we introduce STEAM, a back-translation-based detection method that restores watermark strength lost through translation. STEAM is compatible with any watermarking method, robust across different tokenizers and languages, non-invasive, and easily extendable to new languages. With average gains of +0.19 AUC and +40%p TPR@1% on 17 languages, STEAM provides a simple and robust path toward fairer watermarking across diverse languages.

Community

Paper author · Paper submitter · edited 3 days ago

Some watermarking methods for large language models (LLMs) claim to be multilingual, yet they are almost always tested on high-resource languages like English, French, and German. This paper reveals that such claims do not hold up under scrutiny: multilingual watermarks collapse under translation attacks in medium- and low-resource languages.

[Figures: translation attack teaser and detection performance plot]

This paper traces the issue to semantic clustering, the main technique behind multilingual watermarking, which groups semantically similar tokens across languages. When tokenizers have few full-word tokens, as is common in less-resourced languages, the clustering fails, weakening watermark detection.
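
To make this failure mode concrete, here is a small illustration (a sketch assuming the Hugging Face `transformers` tokenizer API; the multilingual checkpoint and the example words are illustrative choices, not the paper's exact setup):

```python
# Sketch: why semantic clustering degrades for less-resourced languages.
# Assumes the Hugging Face `transformers` library is installed; the checkpoint
# and example words are illustrative, not the paper's exact setup.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

for word in ["water", "Wasser", "amanzi"]:  # English, German, Zulu
    print(word, "->", tok.tokenize(word))

# A high-resource word usually maps to one full-word token that a semantic
# cluster can contain; a low-resource word splits into generic subword pieces,
# so cluster-based token partitions carry little signal for that language.
```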

To fix this, the paper introduces STEAM (Simple Translation-Enhanced Approach for Multilingual watermarking), a lightweight, detection-time method that restores watermark signals lost during translation. STEAM uses back-translation, translating a suspect text back into multiple supported languages, and then identifies the strongest watermark signal across these variants. Crucially, STEAM is model-agnostic, tokenizer-independent, and works with any existing watermarking method without modifying model outputs.

[Figure: STEAM method overview]
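
The detection-time logic is easy to sketch. Below is a minimal illustration, with the caveat that `score_fn` and `translate_fn` are hypothetical placeholders for an existing watermark detector and an off-the-shelf translator; this is not the paper's actual API.

```python
# Minimal sketch of STEAM-style detection. `score_fn` and `translate_fn` are
# hypothetical placeholders, not the paper's actual API.
from typing import Callable, Iterable

def steam_detect(
    text: str,
    score_fn: Callable[[str], float],         # any existing watermark detector
    translate_fn: Callable[[str, str], str],  # (text, target_lang) -> translation
    languages: Iterable[str] = ("en", "fr", "de"),
) -> float:
    """Back-translate a suspect text into each supported language and
    return the strongest watermark score across all variants."""
    candidates = [text] + [translate_fn(text, lang) for lang in languages]
    return max(score_fn(c) for c in candidates)
```

Because this only maximizes an existing detector's score over extra text variants, nothing about generation or the watermarking scheme changes, which is what makes the approach non-invasive and easy to extend: supporting a new language amounts to adding it to the back-translation list.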

📈 Results

  • Evaluated on 17 languages spanning high-, medium-, and low-resource groups.
  • Achieves +0.19 AUC and +40%p TPR@1%FPR average improvement over prior multilingual methods.
  • Outperforms semantic clustering (X-SIR, X-KGW) by up to +0.33 AUC and +64.5%p TPR@1%.
  • Remains robust under translator mismatches and adaptive multi-step translation attacks.

💡 Key Insight:
Watermark robustness depends on how well the tokenizer covers a language with full-word tokens. By using back-translation to recover watermark signals lost in translation, STEAM delivers fairer, more reliable detection across diverse languages.

In short, STEAM makes multilingual watermarking genuinely multilingual, offering a simple yet effective step toward equitable content provenance for all languages.

