Generate, Then Retrieve: Addressing Missing Modalities in Multimodal Learning via Generative AI and MoE

Published: 07 Mar 2025, Last Modified: 25 Mar 2025 · GenAI4Health Oral · CC BY 4.0
Keywords: Multimodal AI for Healthcare, Mixture-of-Experts
Abstract: In multimodal machine learning, effectively addressing the missing modality scenario is crucial for improving performance in downstream tasks, particularly in medical contexts where data may be incomplete. Although some attempts have been made to effectively retrieve embeddings for missing modalities, two main bottlenecks remain: (1) the consideration of both intra- and inter-modal context, and (2) the cost of embedding selection, where embeddings often lack modality-specific knowledge. In response, we propose MoE-Retriever, a novel framework inspired by the design principles of Sparse Mixture of Experts (SMoE). First, MoE-Retriever defines a supporting group for intra-modal inputs, i.e., samples that commonly lack the target modality. This group is formed by selecting samples with complementary modality combinations for the target modality. It is then integrated with inter-modal inputs, i.e., inputs from different modalities of a sample, thereby establishing both intra- and inter-modal contexts. These inputs are processed by Multi-Head Attention, generating context-aware embeddings that serve as inputs to the SMoE Router, which automatically selects the most relevant experts, i.e., the embedding candidates to be retrieved. Comprehensive experiments on both medical and general multimodal datasets demonstrate the robustness and generalizability of MoE-Retriever, marking a significant step forward in embedding retrieval methods for incomplete multimodal data.
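The abstract describes a pipeline of attention-based context fusion followed by sparse expert routing. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of that idea under assumed names and dimensions (`MoERetrieverSketch`, `dim`, `num_candidates`, `top_k` are all hypothetical): inter-modal embeddings and a supporting group are fused by multi-head attention, and a top-k router selects learnable embedding candidates that stand in for the missing modality.

```python
# Minimal sketch (assumptions throughout, not the paper's code): fuse intra-modal
# context (a "supporting group" of samples missing the same modality) with
# inter-modal context (the sample's observed modalities) via multi-head attention,
# then route to the top-k embedding candidates ("experts") with a sparse gate.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoERetrieverSketch(nn.Module):
    def __init__(self, dim: int = 64, num_candidates: int = 8, num_heads: int = 4, top_k: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable embedding candidates playing the role of experts.
        self.candidates = nn.Parameter(torch.randn(num_candidates, dim))
        self.router = nn.Linear(dim, num_candidates)  # SMoE-style gating layer
        self.top_k = top_k

    def forward(self, inter_modal: torch.Tensor, supporting_group: torch.Tensor) -> torch.Tensor:
        # inter_modal:      (B, M, dim) embeddings of the sample's observed modalities
        # supporting_group: (B, S, dim) embeddings from samples lacking the same modality
        context = torch.cat([inter_modal, supporting_group], dim=1)   # (B, M+S, dim)
        fused, _ = self.attn(context, context, context)               # context-aware embeddings
        query = fused.mean(dim=1)                                     # (B, dim) pooled context

        logits = self.router(query)                                   # (B, num_candidates)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)                        # sparse gate over top-k

        selected = self.candidates[topk_idx]                          # (B, top_k, dim)
        # Weighted combination of the retrieved candidates stands in for the missing modality.
        return (weights.unsqueeze(-1) * selected).sum(dim=1)          # (B, dim)


if __name__ == "__main__":
    model = MoERetrieverSketch()
    inter = torch.randn(2, 3, 64)       # 3 observed modalities per sample
    support = torch.randn(2, 5, 64)     # 5 supporting-group samples
    print(model(inter, support).shape)  # torch.Size([2, 64])
```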
Submission Number: 44