This repository contains a curated list of research papers and resources focusing on image captioning evaluation.
❗ Latest update: 18 March 2025. ❗ This repo is a work in progress, and new updates are coming soon. Stay tuned! 🚧
We leverage publicly available code and have designed a unified framework that enables all metrics to be reproduced within a single repository.
🔜 Coming soon! 🔜
Authors: Sara Sarto, Marcella Cornia, Rita Cucchiara
Please cite with the following BibTeX:
@inproceedings{sarto2025image,
title={{Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives}},
author={Sarto, Sara and Cornia, Marcella and Cucchiara, Rita},
booktitle={arXiv},
year={2025}
}
The Evolution of Captioning Metrics
Rule-based Metrics
| Year | Conference / Journal | Title | Authors | Links |
|------|----------------------|-------|---------|-------|
| 2002 | ACL | BLEU: A Method for Automatic Evaluation of Machine Translation | Kishore Papineni et al. | 📜 Paper |
| 2004 | ACLW | ROUGE: A Package for Automatic Evaluation of Summaries | Chin-Yew Lin | 📜 Paper |
| 2005 | ACLW | METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments | Satanjeev Banerjee et al. | 📜 Paper |
| 2015 | CVPR | CIDEr: Consensus-based Image Description Evaluation | Ramakrishna Vedantam et al. | 📜 Paper |
| 2016 | ECCV | SPICE: Semantic Propositional Image Caption Evaluation | Peter Anderson et al. | 📜 Paper |
Learnable Metrics
Unsupervised Metrics
| Year | Conference / Journal | Title | Authors | Links |
|------|----------------------|-------|---------|-------|
| 2019 | EMNLP | TIGEr: Text-to-Image Grounding for Image Caption Evaluation | Ming Jiang et al. | 📜 Paper |
| 2020 | ICLR | BERTScore: Evaluating Text Generation with BERT | Tianyi Zhang et al. | 📜 Paper |
| 2020 | ACL | Improving Image Captioning Evaluation by Considering Inter References Variance | Yanzhi Yi et al. | 📜 Paper |
| 2020 | EMNLP | ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT | Hwanhee Lee et al. | 📜 Paper |
| 2021 | EMNLP | CLIPScore: A Reference-free Evaluation Metric for Image Captioning | Jack Hessel et al. | 📜 Paper |
| 2021 | CVPR | FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation | Sijin Wang et al. | 📜 Paper |
| 2021 | ACL | UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning | Hwanhee Lee et al. | 📜 Paper |
| 2022 | NeurIPS | Mutual Information Divergence: A Unified Metric for Multimodal Generative Models | Jin-Hwa Kim et al. | 📜 Paper |
| 2023 | CVPR | PAC-S: Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation | Sara Sarto et al. | 📜 Paper |
Fine-grained Oriented Metrics
| Year | Conference / Journal | Title | Authors | Links |
|------|----------------------|-------|---------|-------|
| 2023 | ACL | InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | Anwen Hu et al. | 📜 Paper |
| 2024 | ACM MM | HICEScore: A Hierarchical Metric for Image Captioning Evaluation | Zequn Zeng et al. | 📜 Paper |
| 2024 | ECCV | BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues | Sara Sarto et al. | 📜 Paper |
| 2024 | ECCV | HiFi-Score: Fine-Grained Image Description Evaluation with Hierarchical Parsing Graphs | Ziwei Yao et al. | 📜 Paper |
LLM-based Metrics
| Year | Conference / Journal | Title | Authors | Links |
|------|----------------------|-------|---------|-------|
| 2023 | EMNLP | CLAIR: Evaluating Image Captions with Large Language Models | David Chan et al. | 📜 Paper |
| 2024 | ACL | FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model | Yebin Lee et al. | 📜 Paper |
Datasets & Benchmarks 📂📎
Correlation with Human Judgment
Pairwise Ranking
Sensitivity to Object Hallucinations
How to Contribute
- Fork this repository and clone it locally.
- Create a new branch for your changes:
  git checkout -b feature-name
- Make your changes, then stage and commit them:
  git add .
  git commit -m 'Description of the changes'
- Push to your fork:
  git push origin feature-name
- Open a pull request on the original repository with a description of your changes.
This project is under constant development, and we welcome contributions that add the latest research papers in the field, as well as reports of any issues 💥.