MIVPG and Instance Correlation: Enhanced Multi-Instance Learning

Table of Links

Abstract and 1 Introduction

Related Work
  2.1. Multimodal Learning
  2.2. Multiple Instance Learning

Methodology
  3.1. Preliminaries and Notations
  3.2. Relations between Attention-based VPG and MIL
  3.3. MIVPG for Multiple Visual Inputs
  3.4. Unveiling Instance Correlation in MIVPG for Enhanced Multi-instance Scenarios

Experiments and 4.1. General Setup
  4.2. Scenario 1: Samples with Single Image
  4.3. Scenario 2: Samples with Multiple Images, with Each Image as a General Embedding
  4.4. Scenario 3: Samples with Multiple Images, with Each Image Having Multiple Patches to be Considered and 4.5. Case Study

Conclusion and References

Supplementary Material
  A. Detailed Architecture of QFormer
  B. Proof of Proposition
  C. More Experiments

3.4. Unveiling Instance Correlation in MIVPG for Enhanced Multi-instance Scenarios

Subsequently, the aggregated low-rank matrix can be reintegrated with the original embeddings, as shown in Equation 9. This low-rank projection effectively reduces the time complexity to O(MM′).

Proposition 2. MIVPG, when equipped with the CSA (Correlated Self-Attention) module, continues to fulfill the essential properties of MIL.

We prove Proposition 2 in Supplementary B.

In summary, as depicted in Figure 2a, we establish that QFormer falls under the MIL category and is a specialized instance of our proposed MIVPG. The latter extends to visual inputs with multiple dimensions, accounting for instance correlation.

:::info
Authors:

(1) Wenliang Zhong, The University of Texas at Arlington (wxz9204@mavs.uta.edu);

(2) Wenyi Wu, Amazon (wenyiwu@amazon.com);

(3) Qi Li, Amazon (qlimz@amazon.com);

(4) Rob Barton, Amazon (rab@amazon.com);

(5) Boxin Du, Amazon (boxin@amazon.com);

(6) Shioulin Sam, Amazon (shioulin@amazon.com);

(7) Karim Bouyarmane, Amazon (bouykari@amazon.com);

(8) Ismail Tutar, Amazon (ismailt@amazon.com);

(9) Junzhou Huang, The University of Texas at Arlington (jzhuang@uta.edu).

:::

:::info
This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
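
As an illustration of the O(MM′) claim in Section 3.4, the following is a minimal sketch of one generic way to make self-attention over M instances low-rank: summarize the instances into M′ landmark vectors, then let every instance attend only to those summaries and add the result back to the original embeddings. It is not the CSA module itself; the equations that define CSA are not reproduced in this section. The mean-pooled landmarks, the two-step attention, the residual addition, and all function names (`attend`, `mean_pool_landmarks`, `low_rank_self_attention`) are assumptions made for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Scaled dot-product attention where the values are the keys.

    Returns the attended outputs and the number of query-key scores computed.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return outputs, len(queries) * len(keys)


def mean_pool_landmarks(x, m_prime):
    """Assumed landmark choice: average contiguous groups of instances."""
    m, d = len(x), len(x[0])
    landmarks = []
    for g in range(m_prime):
        group = x[g * m // m_prime:(g + 1) * m // m_prime]
        landmarks.append([sum(v[j] for v in group) / len(group) for j in range(d)])
    return landmarks


def low_rank_self_attention(x, m_prime):
    """Hypothetical low-rank self-attention over M instances with M' landmarks."""
    if not 1 <= m_prime <= len(x):
        raise ValueError("m_prime must be between 1 and the number of instances")
    landmarks = mean_pool_landmarks(x, m_prime)
    # Step 1: M' landmarks aggregate information from all M instances (M' * M scores).
    summary, n_first = attend(landmarks, x)
    # Step 2: each instance reads from the M' summaries (M * M' scores).
    update, n_second = attend(x, summary)
    # Reintegrate the low-rank update with the original embeddings (residual sum).
    out = [[a + b for a, b in zip(row, upd)] for row, upd in zip(x, update)]
    return out, n_first + n_second


# Toy bag: M = 8 instances with 2-dimensional embeddings, M' = 2 landmarks.
bag = [[float(i), float(i % 3)] for i in range(8)]
enhanced, n_scores = low_rank_self_attention(bag, m_prime=2)
full_scores = len(bag) ** 2  # pairwise self-attention would score every pair
```

In this toy run the sketch computes 2·M·M′ = 32 attention scores, whereas full pairwise self-attention over the same bag would compute M² = 64. The gap widens as M grows while M′ stays fixed, which is the practical appeal of a low-rank projection when a sample contains many images or patches.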