EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation
Published in AAAI 2026 (CCF-A), 2024
EvalMuse-40K is a large-scale, fine-grained benchmark with comprehensive human annotations designed to evaluate text-to-image generation models.
Key Contributions:
- A 40K-scale dataset with fine-grained human annotations covering multiple quality dimensions.
- A fine-tuned BLIP model for fine-grained scoring aligned with human judgments.
- A fair ranking of more than 20 state-of-the-art text-to-image (T2I) models.
My Contribution: I was responsible for the structural image quality evaluation task and contributed to the design of the BLIP fine-tuning strategy.
Venue: AAAI 2026 (CCF-A)
Recommended citation: Han, S., Fan, H., Fu, J., Li, L., Li, T., et al. (2024). EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation. AAAI 2026.
