pm+mo
training multimodal systems for classification with multiple objectives
Paper: https://ceur-ws.org/Vol-2611/paper5.pdf
Citation: see end of page.
Our paper proposes a framework for multimodal classification trained with an additional objective based on variational inference.
Machine learning systems trained with supervised learning rely on the availability of large-scale labelled data. In multimodal learning, this requirement extends to having aligned inputs from every modality for each entity. Where data are limited, variance is high and neural networks fit sampling noise, with a subsequent impact on generalisation.
We propose a novel system for multimodal classification that uses a second objective, trained with variational inference, to learn representations. We go on to demonstrate that a combination of ELBO and KL scaling with L2 regularisation offsets overfitting. On the MM-IMDb benchmark, our best variant outperforms the baseline GMU model (Arevalo et al., 2017) and versions of the same architecture that exclude the variational inference objective.
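To make the two-objective setup concrete, below is a minimal training-step sketch, not the authors' code: a supervised classification loss is combined with a variational (ELBO-style) term whose KL component is scaled, and L2 regularisation is applied via the optimiser's weight decay. The model API and all variable names are illustrative assumptions.

import torch
import torch.nn.functional as F

def training_step(model, batch, optimizer, kl_scale=0.1):
    """One update combining classification and variational objectives (sketch)."""
    text, image, labels = batch

    # Assumed model API: returns class logits plus the variational head's
    # reconstruction, its target, and Gaussian posterior parameters.
    logits, recon, target, mu, logvar = model(text, image)

    # Objective 1: supervised multi-label classification (as on MM-IMDb).
    cls_loss = F.binary_cross_entropy_with_logits(logits, labels)

    # Objective 2: negative ELBO = reconstruction error + scaled KL term.
    recon_loss = F.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    elbo_loss = recon_loss + kl_scale * kl

    loss = cls_loss + elbo_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# L2 regularisation can be applied through weight decay on the optimiser, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)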
Further results and analysis are available in the paper. Please cite our work if you find it useful for your own research:
Jason Armitage, Shramana Thakur, Rishi Tripathi, Jens Lehmann, and Maria Maleshkova. 2020. Training Multimodal Systems for Classification with Multiple Objectives. In Proceedings of the 1st International Workshop on Cross-lingual Event-centric Open Analytics co-located with the 17th Extended Semantic Web Conference (ESWC 2020).
BibTeX:
@inproceedings{armitage2020training,
  title={Training Multimodal Systems for Classification with Multiple Objectives},
  author={Armitage, Jason and Thakur, Shramana and Tripathi, Rishi and Lehmann, Jens and Maleshkova, Maria},
  booktitle={Proceedings of the 1st International Workshop on Cross-lingual Event-centric Open Analytics co-located with the 17th Extended Semantic Web Conference (ESWC 2020)},
  year={2020}
}