The Hidden Audio Bias Inside Audio-Visual Speech Recognition

Shapley analysis reveals why AVSR models keep trusting corrupted audio, exposing a hidden bias in multimodal speech recognition.