DetectAnyLLM

Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models

VCIP, CS, Nankai University

*Corresponding Author      Project Lead

DetectAnyLLM - Teaser

Left: Our DetectAnyLLM achieves high efficiency, strong robustness, and impressive generalization through a three-step process: sampling perturbation, calculating discrepancy, and reference clustering. Right: Our MIRAGE benchmark emphasizes diversity across domains, tasks, evaluation scenarios, and source LLMs, enabling comprehensive and robust evaluation.

Abstract

The rapid advancement of large language models (LLMs) has blurred the boundary between human-written and machine-generated text, drawing urgent attention to the task of machine-generated text detection (MGTD). However, existing approaches struggle in complex real-world scenarios: zero-shot detectors rely heavily on the scoring model's output distribution, while training-based detectors are often constrained by overfitting to the training data, which limits generalization.

We find that the performance bottleneck of training-based detectors stems from a misalignment between the training objective and the needs of the task, i.e., optimizing for token-level distributions rather than for the MGTD task itself. To address this issue, we propose Direct Discrepancy Learning (DDL), a novel optimization strategy that directly optimizes the scoring model with task-oriented knowledge. DDL enables the scoring model to better capture the core semantics of the detection task, thereby enhancing both robustness and generalization. Built on this approach, we introduce DetectAnyLLM, a unified detection framework that achieves state-of-the-art MGTD performance across diverse LLMs.

To ensure a robust and reliable evaluation, we construct MIRAGE, the most diverse multi-task MGTD benchmark to date. MIRAGE samples human-written texts from 10 corpora across 5 domains, which are then regenerated or revised by 17 cutting-edge LLMs, covering a wide spectrum of commercial models and textual styles. Extensive experiments on MIRAGE reveal the limitations of existing methods in complex environments. In contrast, DetectAnyLLM consistently outperforms them, achieving over a 70% performance improvement with the same training data and base scoring model, underscoring the effectiveness of DDL.

Method

Direct Discrepancy Learning

Prior training-based methods adopt preference-optimization techniques such as Direct Preference Optimization (DPO), whose objective includes a KL-regularization term that forces the scoring model to retain its original language-modeling abilities. We argue this is counterproductive, as it shifts the training objective away from learning to be an effective detector.

To address this, we propose Direct Discrepancy Learning (DDL). By removing the redundant KL-regularization, we allow the scoring model to focus solely on learning task-oriented knowledge for detection. Our optimization goal is to directly maximize the discrepancy for machine-generated text (MGT) and minimize it for human-written text (HWT).

The DDL optimization objective is formulated as:
\[\min_\theta \mathbb{E}_{x_m, x_h \sim \mathcal{D}}(\Vert d_c(x_h, f_\theta, f_\theta)\Vert_1 + \Vert \gamma - d_c(x_m, f_\theta, f_\theta)\Vert_1).\]
This equation trains the model \(f_\theta\) to produce a low discrepancy score \(d_c\) (near 0) for HWT (\(x_h\)) and a high score (near \(\gamma\)) for MGT (\(x_m\)). DDL enables the scoring model to directly learn the difference between MGT and HWT, significantly boosting robustness and generalization. Finally, a Reference Clustering step is used to classify texts based on their scores.
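
Below is a minimal PyTorch sketch of the DDL loss, assuming a Fast-DetectGPT-style analytic sampling discrepancy computed from the scoring model's own token distribution and a Hugging Face causal LM as the scoring model \(f_\theta\); the function names, the default margin, and the omission of padding masks are illustrative choices, not taken from the released implementation.

import torch
import torch.nn.functional as F

def sampling_discrepancy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Analytic sampling discrepancy d_c of each sequence under the scoring model.

    logits: (B, T, V) next-token logits from f_theta; labels: (B, T) observed token ids.
    """
    lprobs = F.log_softmax(logits, dim=-1)
    probs = lprobs.exp()
    # Log-likelihood of the observed tokens.
    ll = lprobs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    # Mean and variance of the token log-probability under the model's own distribution.
    mean_ref = (probs * lprobs).sum(-1)
    var_ref = (probs * lprobs.square()).sum(-1) - mean_ref.square()
    # Normalized discrepancy per sequence.
    return (ll.sum(-1) - mean_ref.sum(-1)) / var_ref.sum(-1).clamp_min(1e-6).sqrt()

def ddl_loss(score_model, hwt_batch, mgt_batch, gamma: float = 1.0) -> torch.Tensor:
    """DDL objective: push d_c toward 0 for HWT and toward gamma for MGT."""
    d_h = sampling_discrepancy(score_model(**hwt_batch).logits[:, :-1],
                               hwt_batch["input_ids"][:, 1:])
    d_m = sampling_discrepancy(score_model(**mgt_batch).logits[:, :-1],
                               mgt_batch["input_ids"][:, 1:])
    # L1 terms of the objective, averaged over the batch as the empirical expectation.
    return d_h.abs().mean() + (gamma - d_m).abs().mean()

At inference time, the same discrepancy is computed for an input text, and the final decision is made by the Reference Clustering step rather than by a fixed threshold.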

Benchmark

MIRAGE Benchmark Workflow

To address the limitations of existing benchmarks, which often lack diversity in domains, source LLMs, and tasks, we introduce MIRAGE (Multi-domain Inclusive Realistic Assessment for machine Generated text dEtection). MIRAGE is the most comprehensive multi-task MGTD evaluation framework to date, incorporating texts from diverse domains that are generated or revised by 17 state-of-the-art LLMs (13 proprietary, 4 open-source). Its construction pipeline, combining multi-domain sampling, inclusive tasks, realistic scenarios, and style diversification, makes MIRAGE a robust and realistic benchmark for evaluating MGT detectors and for advancing more generalizable and practical solutions.

Comparison

Benchmark Comparison

Benchmark | Size | Domain Coverage | # Corpora | # Commercial LLMs | Generate | Polish | Rewrite | Aug. | SIG | DIG
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
TuringBench | 40K | News | 3 | | | | | | |
HC3 | 85K | QA / Comment / Academic | 5 | 1 | | | | | |
M4 | 24.5K | QA / Comment / Academic / News | 11 | 2 | | | | | |
MAGE | 29K | QA / Comment / News / Academic / Story | 10 | 3 | | | | | |
RAID | 628.7K | News / Academic / Comment / Literature | 11 | 3 | | | | | |
DetectRL | 134.4K | Academic / Comment | 4 | 2 | | | | | |
HART | 16K | News / Literature / Academic | 4 | 4 | | | | | |
MIRAGE (ours) | 93.8K | Academic / Comment / Email / News / Website | 10 | 13 | | | | | |

MIRAGE offers the broadest domain and task coverage and leverages commercial LLMs, making it a stronger, more realistic MGTD benchmark.

Performance on MIRAGE

MIRAGE-DIG

Methods | Generate (AUROC / Acc. / MCC / TPR@5%) | Polish (AUROC / Acc. / MCC / TPR@5%) | Rewrite (AUROC / Acc. / MCC / TPR@5%)
--- | --- | --- | ---
Fast-DetectGPT | 0.7768 / 0.7234 / 0.4628 / 0.4310 | 0.5720 / 0.5570 / 0.1293 / 0.1189 | 0.5455 / 0.5432 / 0.1015 / 0.1025
ImBD | 0.8597 / 0.7738 / 0.5497 / 0.4065 | 0.7888 / 0.7148 / 0.4300 / 0.2730 | 0.7825 / 0.7068 / 0.4139 / 0.2933
DetectAnyLLM (ours) | 0.9525 / 0.8988 / 0.7975 / 0.7770 | 0.9297 / 0.8732 / 0.7487 / 0.7756 | 0.9234 / 0.8705 / 0.7447 / 0.7778

MIRAGE-SIG

Methods | Generate (AUROC / Acc. / MCC / TPR@5%) | Polish (AUROC / Acc. / MCC / TPR@5%) | Rewrite (AUROC / Acc. / MCC / TPR@5%)
--- | --- | --- | ---
Fast-DetectGPT | 0.7706 / 0.7193 / 0.2078 / 0.4200 | 0.5727 / 0.5619 / 0.0607 / 0.1238 | 0.5480 / 0.5495 / 0.0525 / 0.1097
ImBD | 0.8612 / 0.7791 / 0.5599 / 0.4183 | 0.7951 / 0.7199 / 0.4451 / 0.3036 | 0.7694 / 0.6920 / 0.3936 / 0.2868
DetectAnyLLM (ours) | 0.9526 / 0.9059 / 0.8119 / 0.7722 | 0.9316 / 0.8740 / 0.7483 / 0.7779 | 0.9158 / 0.8643 / 0.7320 / 0.7574

DetectAnyLLM vastly outperforms prior methods on MIRAGE under both DIG and SIG settings across all tasks.
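
For reference, the four metrics above can be computed from raw detector scores and binary labels (1 = machine-generated) with scikit-learn, as in the sketch below; the fixed decision threshold used here for Acc. and MCC is a placeholder assumption, since DetectAnyLLM obtains its decision boundary via Reference Clustering instead.

import numpy as np
from sklearn.metrics import matthews_corrcoef, roc_auc_score, roc_curve

def evaluate(scores, labels, threshold=0.0, target_fpr=0.05):
    """Compute AUROC, Acc., MCC, and TPR@5% from detector scores."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    auroc = roc_auc_score(labels, scores)
    preds = (scores > threshold).astype(int)  # hard decisions for Acc. and MCC
    acc = float((preds == labels).mean())
    mcc = matthews_corrcoef(labels, preds)
    fpr, tpr, _ = roc_curve(labels, scores)
    tpr_at_5 = float(np.interp(target_fpr, fpr, tpr))  # TPR at a 5% false-positive rate
    return {"AUROC": auroc, "Acc.": acc, "MCC": mcc, "TPR@5%": tpr_at_5}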

Open Source Resources

GitHub Repository

Complete implementation code, pre-trained models and usage examples

Datasets

Download links for training and evaluation datasets

Pre-trained Models

DetectAnyLLM model checkpoint and backbone


Contact

Feel free to contact us at:

guochunle@nankai.edu.cn

fjc@mail.nankai.edu.cn

For commercial licensing, please contact:

lichongyi@nankai.edu.cn

License

Pi-Lab License 1.0

Copyright 2025 Pi-Lab

Citation

BibTeX
@inproceedings{fu2025detectanyllm,
    title        = {DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models},
    author       = {Fu, Jiachen and Guo, Chun-Le and Li, Chongyi},
    year         = 2025,
    booktitle    = {Proceedings of the 33rd ACM International Conference on Multimedia},
    address      = {Dublin, Ireland},
    organization = {ACM}
}