# audio-visual-learning

Here are 40 public repositories matching this topic...

We introduce temporal working memory (TWM), a plug-and-play module that aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs) and can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models show significant performance improvements across QA, captioning, and retrieval tasks.

  • Updated Jan 26, 2025
  • Python
