# audio-visual-learning

Here are 40 public repositories matching this topic...

ali-vilab / dreamtalk

Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"

video-generation face-animation audio-visual-learning talking-head

Updated Jan 15, 2024
Python

tanshuai0219 / EDTalk

[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation

video-generation face-animation audio-visual-learning talking-head talking-face-generation

Updated Dec 31, 2024
Python

OpenNLPLab / AVSBench

[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)

audio-visual-learning segmentation-benchmark audio-visual-segmentation multi-modal-segmentation

Updated Nov 18, 2024
Python

xid32 / NAACL_2025_TWM

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.

question-answering video-captioning working-memory audio-visual-learning video-text-retrieval multimodal-large-language-models multimodal-foundation-model

Updated Jan 26, 2025
Python

YapengTian / AVE-ECCV18

Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018

eccv-2018 audio-visual-learning audio-visual-events ave-dataset

Updated Apr 3, 2021
Python

alvinliu0 / HA2G

[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"

co-speech-gesture audio-visual-learning cvpr2022

Updated Mar 16, 2023
Python

rhgao / co-separation

Co-Separating Sounds of Visual Objects (ICCV 2019)

sound-separation cross-modality audio-visual-learning

Updated Jul 25, 2023
Python

yanbeic / CCL

[CVPR 2021] PyTorch implementation of the paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"

pytorch video-recognition distillation audio-visual-learning contrastive-learning cvpr2021 compositional-contrastive-learning audio-teacher-models multi-modal-distillation

Updated Jul 7, 2021
Python

ttgeng233 / UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

multi-modal-learning audio-visual-learning audio-visual-events

Updated Feb 12, 2024
Python

roger-tseng / av-superb

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

representation-learning audio-visual-learning

Updated Apr 17, 2024
Python

jasongief / PSP_CVPR_2021

[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line

audio-visual-learning audio-visual-events

Updated Jul 5, 2022
Python

praveena2j / JointCrossAttentional-AV-Fusion

ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

emotion affective-computing emotion-recognition attention-model multimodal-learning audio-visual-learning

Updated Jan 15, 2024
Python

praveena2j / Joint-Cross-Attention-for-Audio-Visual-Fusion

IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"

attention affective-computing emotion-recognition attention-model multimodal-learning audio-visual-learning

Updated Nov 29, 2024
Python

stoneMo / AVGN

Official implementation for AVGN

weakly-supervised-learning audio-visual-correspondence audio-visual-learning visual-sound-localization

Updated Mar 24, 2023
Python

MengyuanChen21 / CVPR2023-CMPAE

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

video-understanding audio-visual audio-visual-learning cvpr2023 audio-visual-video-parsing

Updated Jun 17, 2023
Python

kyuyeonpooh / objects-that-sound

An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).

deep-learning audioset sound-localization eccv2018 cross-modal-retrieval audio-visual-learning

Updated Jan 29, 2024
Python

stoneMo / EZ-VSL

Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)

self-supervised-learning audio-visual-correspondence audio-visual-learning visual-sound-localization

Updated Oct 2, 2022
Python

stoneMo / DeepAVFusion

Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".

attention-mechanism multimodal-learning self-supervised-learning sound-source-localization transformer-architecture audio-visual-correspondence audio-visual-learning masked-autoencoder sound-source-separation masked-image-modeling

Updated Aug 2, 2024
Python

jasongief / CPSP

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

audio-visual-learning audio-visual-events audio-visual-parsing

Updated Mar 6, 2023
Python

praveena2j / Cross-Attentional-AV-Fusion

FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition

emotion affective-computing emotion-recognition attention-model multimodal-learning audio-visual-learning

Updated Nov 29, 2024
Python
