DetectAnyLLM

Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models

VCIP, CS, Nankai University

*Corresponding Author      Project Lead

DetectAnyLLM - Teaser

Left: Our DetectAnyLLM achieves high efficiency, strong robustness, and impressive generalization through a three-step process: sampling perturbation, calculating discrepancy, and reference clustering. Right: Our MIRAGE benchmark emphasizes diversity across domains, tasks, evaluation scenarios, and source LLMs, enabling comprehensive and robust evaluation.

Abstract

The rapid advancement of large language models (LLMs) has blurred the boundary between human-written and machine-generated text, drawing urgent attention to the task of machine-generated text detection (MGTD). However, existing approaches struggle in complex real-world scenarios: zero-shot detectors rely heavily on the scoring model's output distribution, while training-based detectors are often constrained by overfitting to the training data, which limits generalization.

We find that the performance bottleneck of training-based detectors stems from a misalignment between the training objective and the needs of the task, i.e., optimizing for token-level distributions rather than for the MGTD task itself. To address this issue, we propose Direct Discrepancy Learning (DDL), a novel optimization strategy that directly optimizes the scoring model with task-oriented knowledge. DDL enables the scoring model to better capture the core semantics of the detection task, thereby enhancing both robustness and generalization. Built on this approach, we introduce DetectAnyLLM, a unified detection framework that achieves state-of-the-art MGTD performance across diverse LLMs.

To ensure a robust and reliable evaluation, we construct MIRAGE, the most diverse multi-task MGTD benchmark to date. MIRAGE samples human-written texts from 10 corpora across 5 domains, which are then regenerated or revised by 17 cutting-edge LLMs, covering a wide spectrum of commercial models and textual styles. Extensive experiments on MIRAGE reveal the limitations of existing methods in complex environments. In contrast, DetectAnyLLM consistently outperforms them, achieving over a 70% performance improvement with the same training data and base scoring model, underscoring the effectiveness of DDL.

Method

Direct Discrepancy Learning

Prior training-based methods adopt preference-optimization techniques such as Direct Preference Optimization (DPO), whose objective includes a KL-regularization term that forces the scoring model to retain its original language-modeling abilities. We argue this is counterproductive, as it shifts the training objective away from learning to be an effective detector.

To address this, we propose Direct Discrepancy Learning (DDL). By removing the redundant KL-regularization, we allow the scoring model to focus solely on learning task-oriented knowledge for detection. Our optimization goal is to directly maximize the discrepancy for machine-generated text (MGT) and minimize it for human-written text (HWT).

The DDL optimization objective is formulated as:
\[\min_\theta \mathbb{E}_{x_m, x_h \sim \mathcal{D}}(\Vert d_c(x_h, f_\theta, f_\theta)\Vert_1 + \Vert \gamma - d_c(x_m, f_\theta, f_\theta)\Vert_1).\]
This equation trains the model \(f_\theta\) to produce a low discrepancy score \(d_c\) (near 0) for HWT (\(x_h\)) and a high score (near \(\gamma\)) for MGT (\(x_m\)). DDL enables the scoring model to directly learn the difference between MGT and HWT, significantly boosting robustness and generalization. Finally, a Reference Clustering step is used to classify texts based on their scores.
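
Below is a minimal PyTorch sketch of the DDL loss, assuming a Fast-DetectGPT-style analytic sampling discrepancy computed from the scoring model's own token distribution and a Hugging Face causal LM as the scoring model \(f_\theta\); the function names, the default margin, and the omission of padding masks are illustrative choices, not taken from the released implementation.

import torch
import torch.nn.functional as F

def sampling_discrepancy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Analytic sampling discrepancy d_c of each sequence under the scoring model.

    logits: (B, T, V) next-token logits from f_theta; labels: (B, T) observed token ids.
    """
    lprobs = F.log_softmax(logits, dim=-1)
    probs = lprobs.exp()
    # Log-likelihood of the observed tokens.
    ll = lprobs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    # Mean and variance of the token log-probability under the model's own distribution.
    mean_ref = (probs * lprobs).sum(-1)
    var_ref = (probs * lprobs.square()).sum(-1) - mean_ref.square()
    # Normalized discrepancy per sequence.
    return (ll.sum(-1) - mean_ref.sum(-1)) / var_ref.sum(-1).clamp_min(1e-6).sqrt()

def ddl_loss(score_model, hwt_batch, mgt_batch, gamma: float = 1.0) -> torch.Tensor:
    """DDL objective: push d_c toward 0 for HWT and toward gamma for MGT."""
    d_h = sampling_discrepancy(score_model(**hwt_batch).logits[:, :-1],
                               hwt_batch["input_ids"][:, 1:])
    d_m = sampling_discrepancy(score_model(**mgt_batch).logits[:, :-1],
                               mgt_batch["input_ids"][:, 1:])
    # L1 terms of the objective, averaged over the batch as the empirical expectation.
    return d_h.abs().mean() + (gamma - d_m).abs().mean()

At inference time, the same discrepancy is computed for an input text, and the final decision is made by the Reference Clustering step rather than by a fixed threshold.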

Benchmark

MIRAGE Benchmark Workflow

To address the limitations of existing benchmarks, which often lack diversity in domains, source LLMs, and tasks, we introduce MIRAGE (Multi-domain Inclusive Realistic Assessment for machine Generated text dEtection). MIRAGE is the most comprehensive multi-task MGTD evaluation framework to date, incorporating texts from diverse domains that are generated or revised by 17 state-of-the-art LLMs (13 proprietary, 4 open-source). Its construction pipeline, combining multi-domain sampling, inclusive tasks, realistic scenarios, and style diversification, makes MIRAGE a robust and realistic benchmark for evaluating MGT detectors and for advancing more generalizable and practical solutions.

Comparison

Benchmark Comparison

Benchmark | Size | Domain Coverage | # Corpora | # Commercial LLMs | Generate | Polish | Rewrite | Aug. | SIG | DIG
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
TuringBench | 40K | News | 3 | | | | | | |
HC3 | 85K | QA / Comment / Academic | 5 | 1 | | | | | |
M4 | 24.5K | QA / Comment / Academic / News | 11 | 2 | | | | | |
MAGE | 29K | QA / Comment / News / Academic / Story | 10 | 3 | | | | | |
RAID | 628.7K | News / Academic / Comment / Literature | 11 | 3 | | | | | |
DetectRL | 134.4K | Academic / Comment | 4 | 2 | | | | | |
HART | 16K | News / Literature / Academic | 4 | 4 | | | | | |
MIRAGE (ours) | 93.8K | Academic / Comment / Email / News / Website | 10 | 13 | | | | | |

MIRAGE offers the broadest domain and task coverage and leverages commercial LLMs, making it a stronger, more realistic MGTD benchmark.

Performance on MIRAGE

MIRAGE-DIG

Methods | Generate (AUROC / Acc. / MCC / TPR@5%) | Polish (AUROC / Acc. / MCC / TPR@5%) | Rewrite (AUROC / Acc. / MCC / TPR@5%)
--- | --- | --- | ---
Fast-DetectGPT | 0.7768 / 0.7234 / 0.4628 / 0.4310 | 0.5720 / 0.5570 / 0.1293 / 0.1189 | 0.5455 / 0.5432 / 0.1015 / 0.1025
ImBD | 0.8597 / 0.7738 / 0.5497 / 0.4065 | 0.7888 / 0.7148 / 0.4300 / 0.2730 | 0.7825 / 0.7068 / 0.4139 / 0.2933
DetectAnyLLM (ours) | 0.9525 / 0.8988 / 0.7975 / 0.7770 | 0.9297 / 0.8732 / 0.7487 / 0.7756 | 0.9234 / 0.8705 / 0.7447 / 0.7778

MIRAGE-SIG

Methods | Generate (AUROC / Acc. / MCC / TPR@5%) | Polish (AUROC / Acc. / MCC / TPR@5%) | Rewrite (AUROC / Acc. / MCC / TPR@5%)
--- | --- | --- | ---
Fast-DetectGPT | 0.7706 / 0.7193 / 0.2078 / 0.4200 | 0.5727 / 0.5619 / 0.0607 / 0.1238 | 0.5480 / 0.5495 / 0.0525 / 0.1097
ImBD | 0.8612 / 0.7791 / 0.5599 / 0.4183 | 0.7951 / 0.7199 / 0.4451 / 0.3036 | 0.7694 / 0.6920 / 0.3936 / 0.2868
DetectAnyLLM (ours) | 0.9526 / 0.9059 / 0.8119 / 0.7722 | 0.9316 / 0.8740 / 0.7483 / 0.7779 | 0.9158 / 0.8643 / 0.7320 / 0.7574

DetectAnyLLM vastly outperforms prior methods on MIRAGE under both DIG and SIG settings across all tasks.
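
For reference, the four metrics above can be computed from raw detector scores and binary labels (1 = machine-generated) with scikit-learn, as in the sketch below; the fixed decision threshold used here for Acc. and MCC is a placeholder assumption, since DetectAnyLLM obtains its decision boundary via Reference Clustering instead.

import numpy as np
from sklearn.metrics import matthews_corrcoef, roc_auc_score, roc_curve

def evaluate(scores, labels, threshold=0.0, target_fpr=0.05):
    """Compute AUROC, Acc., MCC, and TPR@5% from detector scores."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    auroc = roc_auc_score(labels, scores)
    preds = (scores > threshold).astype(int)  # hard decisions for Acc. and MCC
    acc = float((preds == labels).mean())
    mcc = matthews_corrcoef(labels, preds)
    fpr, tpr, _ = roc_curve(labels, scores)
    tpr_at_5 = float(np.interp(target_fpr, fpr, tpr))  # TPR at a 5% false-positive rate
    return {"AUROC": auroc, "Acc.": acc, "MCC": mcc, "TPR@5%": tpr_at_5}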

Open Source Resources

GitHub Repository

Complete implementation code, pre-trained models and usage examples

Datasets

Download links for training and evaluation datasets

Pre-trained Models

DetectAnyLLM model checkpoint and backbone


Contact

Feel free to contact us at:

guochunle@nankai.edu.cn

fjc@mail.nankai.edu.cn

For commercial licensing, please contact:

lichongyi@nankai.edu.cn

License

Pi-Lab License 1.0

Copyright 2025 Pi-Lab

Citation

BibTeX
@inproceedings{fu2025detectanyllm,
    title        = {DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models},
    author       = {Fu, Jiachen and Guo, Chun-Le and Li, Chongyi},
    year         = 2025,
    booktitle    = {Proceedings of the 33rd ACM International Conference on Multimedia},
    address      = {Dublin, Ireland},
    organization = {ACM}
}