Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

aimagelab/awesome-captioning-evaluation

Image Captioning Evaluation Awesome

This repository contains a curated list of research papers and resources focusing on image captioning evaluation.

❗ Latest Update: 18 March 2025. ❗ This repository is a work in progress. New updates are coming soon, so stay tuned! 🚧

👩‍💻 🔜 Code for Reproducing Metric Scores

We build on publicly available code and have designed a unified framework that reproduces all metrics within a single repository.

🔜 Coming soon! 🔜
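Until the code is released, the idea of a single entry point for many metrics can be illustrated with a minimal sketch. Everything below is hypothetical and is not the repository's API: the `METRICS` registry, the `register` decorator, the `evaluate` function, and the toy `unigram_precision` metric are names invented here purely for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (candidate caption, reference captions) -> score.
MetricFn = Callable[[str, List[str]], float]

# Hypothetical registry mapping a metric name to its scoring function.
# This is an illustrative assumption, not the repository's actual design.
METRICS: Dict[str, MetricFn] = {}


def register(name: str) -> Callable[[MetricFn], MetricFn]:
    """Decorator that stores a metric function in the registry under `name`."""
    def wrap(fn: MetricFn) -> MetricFn:
        METRICS[name] = fn
        return fn
    return wrap


@register("unigram_precision")
def unigram_precision(candidate: str, references: List[str]) -> float:
    """Toy metric: fraction of candidate tokens that occur in any reference."""
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    ref_vocab = {tok for ref in references for tok in ref.lower().split()}
    return sum(tok in ref_vocab for tok in cand) / len(cand)


def evaluate(candidate: str, references: List[str]) -> Dict[str, float]:
    """Run every registered metric on one candidate caption."""
    return {name: fn(candidate, references) for name, fn in METRICS.items()}


if __name__ == "__main__":
    print(evaluate("a dog runs", ["a dog is running"]))
```

With a registry like this, adding a metric would only require decorating one more function, which is one plausible way to keep many metrics behind a single call.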

🔥🔥 Our Survey

Image Captioning Evaluation in the Age of Multimodal LLMs:
Challenges and Future Perspectives

Authors: Sara Sarto, Marcella Cornia, Rita Cucchiara

Please cite with the following BibTeX:

@inproceedings{sarto2025image,
  title={{Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives}},
  author={Sarto, Sara and Cornia, Marcella and Cucchiara, Rita},
  booktitle={arxiv},
  year={2025}
}

📚 Table of Contents

  • The Evolution of Captioning Metrics

    • Rule-based Metrics
      | Year | Conference / Journal | Title | Authors | Links |
      |------|----------------------|-------|---------|-------|
      | 2002 | ACL | BLEU: A method for automatic evaluation of machine translation | Kishore Papineni et al. | 📜 Paper |
      | 2004 | ACLW | ROUGE: A package for automatic evaluation of summaries | Chin-Yew Lin | 📜 Paper |
      | 2005 | ACLW | METEOR: An automatic metric for MT evaluation with improved correlation with human judgments | Satanjeev Banerjee et al. | 📜 Paper |
      | 2015 | CVPR | CIDEr: Consensus-based Image Description Evaluation | Ramakrishna Vedantam et al. | 📜 Paper |
      | 2016 | ECCV | SPICE: Semantic Propositional Image Caption Evaluation | Peter Anderson et al. | 📜 Paper |
    • Learnable Metrics

      • Unsupervised Metrics
        | Year | Conference / Journal | Title | Authors | Links |
        |------|----------------------|-------|---------|-------|
        | 2019 | EMNLP | TIGEr: Text-to-Image Grounding for Image Caption Evaluation | Ming Jiang et al. | 📜 Paper |
        | 2020 | ICLR | BERTScore: Evaluating Text Generation with BERT | Tianyi Zhang et al. | 📜 Paper |
        | 2020 | ACL | Improving Image Captioning Evaluation by Considering Inter References Variance | Yanzhi Yi et al. | 📜 Paper |
        | 2020 | EMNLP | ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT | Hwanhee Lee et al. | 📜 Paper |
        | 2021 | EMNLP | CLIPScore: A Reference-free Evaluation Metric for Image Captioning | Jack Hessel et al. | 📜 Paper |
        | 2021 | CVPR | FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation | Sijin Wang et al. | 📜 Paper |
        | 2021 | ACL | UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning | Hwanhee Lee et al. | 📜 Paper |
        | 2022 | NeurIPS | Mutual Information Divergence: A Unified Metric for Multimodal Generative Models | Jin-Hwa Kim et al. | 📜 Paper |
        | 2023 | CVPR | PAC-S: Improving CLIP for Image Caption Evaluation via Positive Augmentations | Sara Sarto et al. | 📜 Paper |
      • Supervised Metrics
        | Year | Conference / Journal | Title | Authors | Links |
        |------|----------------------|-------|---------|-------|
        | 2024 | CVPR | Polos: Multimodal Metric Learning from Human Feedback for Image Captioning | Yuiga Wada et al. | 📜 Paper |
        | 2024 | ACCV | DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning | Kazuki Matsuda et al. | 📜 Paper |
      • Fine-grained Oriented Metrics
        | Year | Conference / Journal | Title | Authors | Links |
        |------|----------------------|-------|---------|-------|
        | 2023 | ACL | InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | Anwen Hu et al. | 📜 Paper |
        | 2024 | ACM MM | HICEScore: A Hierarchical Metric for Image Captioning Evaluation | Zequn Zeng et al. | 📜 Paper |
        | 2024 | ECCV | BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues | Sara Sarto et al. | 📜 Paper |
        | 2024 | ECCV | HiFi-Score: Fine-Grained Image Description Evaluation with Hierarchical Parsing Graphs | Ziwei Yao et al. | 📜 Paper |
    • LLMs-based Metrics
      | Year | Conference / Journal | Title | Authors | Links |
      |------|----------------------|-------|---------|-------|
      | 2023 | EMNLP | CLAIR: Evaluating Image Captions with Large Language Models | David Chan et al. | 📜 Paper |
      | 2024 | ACL | FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model | Yebin Lee et al. | 📜 Paper |
    • Hallucinations-based Metrics
      | Year | Conference / Journal | Title | Authors | Links |
      |------|----------------------|-------|---------|-------|
      | 2018 | EMNLP | Object Hallucination in Image Captioning | Anna Rohrbach et al. | 📜 Paper |
      | 2024 | NAACL | ALOHa: A New Measure for Hallucination in Captioning Models | Suzanne Petryk et al. | 📜 Paper |
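
To make two of the listed metric families more concrete, here is a minimal sketch of their core computations. It is not taken from any of the papers' official code. `clipped_precision` illustrates the clipped (modified) n-gram precision at the heart of BLEU, without the brevity penalty or the geometric mean over n-gram orders. `clip_score` illustrates the CLIPScore rescaling, w * max(cos, 0) with w = 2.5, and assumes the image and caption embeddings have already been computed by a CLIP-style encoder. Whitespace tokenization and both function names are simplifying assumptions made here.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: str, references: List[str], n: int = 1) -> float:
    """BLEU-style modified n-gram precision for one candidate caption.

    Each candidate n-gram count is clipped to the maximum number of times
    that n-gram appears in any single reference.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    if not cand_counts:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref.lower().split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def clip_score(image_emb: Sequence[float], caption_emb: Sequence[float],
               w: float = 2.5) -> float:
    """CLIPScore-style rescaled cosine similarity: w * max(cos, 0).

    The embeddings are assumed to be precomputed; no model is loaded here.
    """
    dot = sum(a * b for a, b in zip(image_emb, caption_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * \
        math.sqrt(sum(b * b for b in caption_emb))
    if norm == 0:
        return 0.0
    return w * max(dot / norm, 0.0)


if __name__ == "__main__":
    refs = ["the cat is on the mat", "there is a cat on the mat"]
    print(clipped_precision("the the the the the the the", refs, n=1))
    print(clip_score([0.6, 0.8], [0.8, 0.6]))
```

The degenerate candidate made only of "the" shows why clipping matters: "the" appears at most twice in any single reference, so only two of the seven candidate tokens are credited and the precision is 2/7 rather than 1.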

How to Contribute 🚀

  1. Fork this repository and clone it locally.
  2. Create a new branch for your changes: git checkout -b feature-name.
  3. Make your changes and commit them: git commit -m 'Description of the changes'.
  4. Push to your fork: git push origin feature-name.
  5. Open a pull request against the original repository and describe your changes.

This project is under active development. We welcome contributions that add recent research papers in the field or report issues 💥.
