Image Captioning Evaluation

This repository contains a curated list of research papers and resources focusing on image captioning evaluation.

❗ Latest Update: 18 March 2025. ❗ This repo is a work in progress. New updates are coming soon, stay tuned! 🚧

👩‍💻 🔜 Code for Reproducing Metric Scores

We build on publicly available code and have designed a unified framework that reproduces all metrics within a single repository.

🔜 Coming soon! 🔜

🔥🔥 Our Survey

Image Captioning Evaluation in the Age of Multimodal LLMs:
Challenges and Future Perspectives

Authors: Sara Sarto, Marcella Cornia, Rita Cucchiara

Please cite with the following BibTeX:

@inproceedings{sarto2025image,
  title={{Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives}},
  author={Sarto, Sara and Cornia, Marcella and Cucchiara, Rita},
  booktitle={arxiv},
  year={2025}
}

📚 Table of Contents

The Evolution of Captioning Metrics

Rule-based Metrics

Year	Conference / Journal	Title	Authors	Links
2002	ACL	BLEU: A method for automatic evaluation of machine translation	Kishore Papineni et al.	📜 Paper
2004	ACLW	ROUGE: A package for automatic evaluation of summaries	Chin-Yew Lin	📜 Paper
2005	ACLW	METEOR: An automatic metric for MT evaluation with improved correlation with human judgments	Satanjeev Banerjee et al.	📜 Paper
2015	CVPR	CIDEr: Consensus-based Image Description Evaluation	Ramakrishna Vedantam et al.	📜 Paper
2016	ECCV	SPICE: Semantic Propositional Image Caption Evaluation	Peter Anderson et al.	📜 Paper
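
As a rough illustration of what the rule-based family measures, here is a minimal sketch of the clipped (modified) n-gram precision at the core of BLEU. It is not the reference implementation: whitespace tokenization, the function names, and the example captions are assumptions, and the brevity penalty and the geometric mean over n-gram orders are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate caption against its references."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    cand_counts = ngrams(candidate.lower().split(), n)
    if not cand_counts:
        return 0.0
    # For each n-gram, keep the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.lower().split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip candidate counts so repeating a word cannot inflate the score.
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical captions, for illustration only.
refs = ["a dog runs on the grass", "a brown dog is running outside"]
print(modified_precision("a dog runs on grass", refs, 1))  # 1.0
print(modified_precision("a dog runs on grass", refs, 2))  # 0.75
```

The clipping step is what separates this from plain precision: a caption that repeats "the" seven times is only credited for as many occurrences as the most generous reference contains.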

Learnable Metrics

Unsupervised Metrics

Year	Conference / Journal	Title	Authors	Links
2019	EMNLP	TIGEr: Text-to-Image Grounding for Image Caption Evaluation	Ming Jiang et al.	📜 Paper
2020	ICLR	BERTScore: Evaluating Text Generation with BERT	Tianyi Zhang et al.	📜 Paper
2020	ACL	Improving Image Captioning Evaluation by Considering Inter References Variance	Yanzhi Yi et al.	📜 Paper
2020	EMNLP	ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT	Hwanhee Lee et al.	📜 Paper
2021	EMNLP	CLIPScore: A Reference-free Evaluation Metric for Image Captioning	Jack Hessel et al.	📜 Paper
2021	CVPR	FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation	Sijin Wang et al.	📜 Paper
2021	ACL	UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning	Hwanhee Lee et al.	📜 Paper
2022	NeurIPS	Mutual Information Divergence: A Unified Metric for Multimodal Generative Models	Jin-Hwa Kim et al.	📜 Paper
2023	CVPR	PAC-S: Improving CLIP for Image Caption Evaluation via Positive Augmentations	Sara Sarto et al.	📜 Paper
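
For readers new to embedding-based metrics, the sketch below shows the scoring rule described in the CLIPScore paper: a rescaled, clipped cosine similarity between an image embedding and a caption embedding, with weight w = 2.5. The embeddings are assumed to be precomputed by CLIP's image and text encoders; the toy vectors and function names are placeholders, not part of any official implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_emb, caption_emb, w=2.5):
    """CLIP-S = w * max(cos(image, caption), 0), with w = 2.5 as in the paper."""
    return w * max(cosine(image_emb, caption_emb), 0.0)


# Toy 2-D vectors standing in for real CLIP embeddings (assumption).
image_emb = [1.0, 0.0]
good_caption_emb = [1.0, 1.0]
bad_caption_emb = [-1.0, 0.5]

print(clip_score(image_emb, good_caption_emb))
print(clip_score(image_emb, bad_caption_emb))  # negative similarity is clipped to zero
```

Because no reference captions are involved, the score depends entirely on the quality of the underlying image and text encoders, which is the aspect that later metrics in this table try to improve.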

Supervised Metrics

Year	Conference / Journal	Title	Authors	Links
2024	CVPR	Polos: Multimodal Metric Learning from Human Feedback for Image Captioning	Yuiga Wada et al.	📜 Paper
2024	ACCV	DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning	Kazuki Matsuda et al.	📜 Paper

Fine-grained Oriented Metrics

Year	Conference / Journal	Title	Authors	Links
2023	ACL	InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation	Anwen Hu et al.	📜 Paper
2024	ACM MM	HICEScore: A Hierarchical Metric for Image Captioning Evaluation	Zequn Zeng et al.	📜 Paper
2024	ECCV	BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues	Sara Sarto et al.	📜 Paper
2024	ECCV	HiFi-Score: Fine-Grained Image Description Evaluation with Hierarchical Parsing Graphs	Ziwei Yao et al.	📜 Paper

LLM-based Metrics

Year	Conference / Journal	Title	Authors	Links
2023	EMNLP	CLAIR: Evaluating Image Captions with Large Language Models	David Chan et al.	📜 Paper
2024	ACL	FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model	Yebin Lee et al.	📜 Paper

Hallucination-based Metrics

Year	Conference / Journal	Title	Authors	Links
2018	EMNLP	Object Hallucination in Image Captioning	Anna Rohrbach et al.	📜 Paper
2024	NAACL	ALOHa: A New Measure for Hallucination in Captioning Models	Suzanne Petryk et al.	📜 Paper
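
To make the idea concrete, here is a simplified sketch of the two CHAIR scores introduced in "Object Hallucination in Image Captioning": CHAIR_i is the fraction of mentioned objects that are not in the image, and CHAIR_s is the fraction of captions containing at least one such object. The sketch assumes the objects have already been extracted and mapped to a shared vocabulary, and it counts each object once per caption; the original metric relies on MSCOCO annotations and a synonym list, which are not reproduced here.

```python
def chair(caption_objects, image_objects):
    """Return (CHAIR_i, CHAIR_s) for parallel lists of object sets.

    caption_objects[k]: objects mentioned in the k-th generated caption.
    image_objects[k]:   ground-truth objects present in the k-th image.
    """
    mentioned = 0
    hallucinated = 0
    captions_with_hallucination = 0
    for said, present in zip(caption_objects, image_objects):
        wrong = said - present  # mentioned but absent from the image
        mentioned += len(said)
        hallucinated += len(wrong)
        captions_with_hallucination += bool(wrong)
    chair_i = hallucinated / mentioned if mentioned else 0.0
    chair_s = captions_with_hallucination / len(caption_objects) if caption_objects else 0.0
    return chair_i, chair_s


# Hypothetical example: the first caption hallucinates a frisbee.
captions = [{"dog", "frisbee"}, {"cat"}]
images = [{"dog"}, {"cat", "sofa"}]
print(chair(captions, images))
```

Lower is better for both scores. Objects present in the image but missing from the caption are not penalized, so CHAIR measures hallucination only, not coverage.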

Datasets & Benchmarks 📂📎
- Correlation with Human Judgment
- Pairwise Ranking
  - PASCAL-50S
- Sensitivity to Object Hallucinations
  - FOIL dataset
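
The evaluation protocols listed above can be sketched in a few lines. The first function computes a rank correlation between metric scores and human ratings; the second computes how often a metric prefers the caption it should (the human-preferred caption in pairwise ranking, or the correct caption over its altered counterpart in hallucination tests). This is a minimal sketch with made-up numbers: it implements the tie-unaware Kendall τ-a, whereas published results usually use tie-aware variants such as τ-b or τ-c, and the function names are assumptions.

```python
from itertools import combinations


def kendall_tau_a(metric_scores, human_scores):
    """Kendall tau-a: (concordant - discordant) / number of pairs. Ties count as neither."""
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two scored captions")
    concordant = discordant = 0
    for (m1, h1), (m2, h2) in combinations(zip(metric_scores, human_scores), 2):
        sign = (m1 - m2) * (h1 - h2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def pairwise_accuracy(preferred_scores, other_scores):
    """Fraction of pairs where the metric scores the preferred caption strictly higher."""
    correct = sum(p > o for p, o in zip(preferred_scores, other_scores))
    return correct / len(preferred_scores)


# Made-up scores, for illustration only.
metric = [0.81, 0.40, 0.65, 0.20]
human = [4, 2, 3, 1]
print(kendall_tau_a(metric, human))  # 1.0: the metric ranks captions exactly as humans do

print(pairwise_accuracy([0.9, 0.4, 0.7], [0.5, 0.6, 0.1]))
```

For real experiments a tested implementation such as `scipy.stats.kendalltau` is the safer choice; the version here only shows what the reported numbers mean.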

How to Contribute 🚀

1. Fork this repository and clone it locally.
2. Create a new branch for your changes: git checkout -b feature-name
3. Make your changes and commit them: git commit -m 'Description of the changes'
4. Push to your fork: git push origin feature-name
5. Open a pull request on the original repository by providing a description of your changes.

This project is under active development. We welcome contributions that add recent research papers in the field or report issues 💥.
