Cross-modal subspace learning with scheduled adaptive margin constraints

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)
27 Downloads (Pure)

Abstract

Cross-modal embeddings between textual and visual modalities aim to organise multimodal instances by their semantic correlations. State-of-the-art approaches use maximum-margin methods, based on the hinge loss, that enforce a constant margin m to separate projections of multimodal instances from different categories. In this paper, we propose a novel scheduled adaptive maximum-margin (SAM) formulation that infers triplet-specific constraints during training, thereby organising instances by adaptively enforcing inter-category and inter-modality correlations. This is supported by a smoothly activated scheduled adaptive margin function, which replaces the static margin with an adaptively inferred one that reflects triplet-specific semantic correlations, while accounting for the incremental learning behaviour of neural networks to enforce category cluster formation. Experiments on widely used datasets show that our model improves upon state-of-the-art approaches, achieving a relative improvement of up to approximately 12.5% over the second-best method, thus confirming the effectiveness of our scheduled adaptive margin formulation.
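As a rough illustration of the idea in the abstract, the sketch below contrasts a hinge-based triplet loss with a constant margin against one whose margin is blended smoothly towards a triplet-specific value as training progresses. The abstract does not give the actual SAM equations, so everything here is an assumption made for illustration: the sigmoid schedule, its `midpoint` and `steepness` parameters, the linear blend in `blended_margin`, and the `adaptive_margin` input (a stand-in for whatever triplet-specific quantity the method infers).

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def schedule(epoch, midpoint=10.0, steepness=0.5):
    """Hypothetical smooth activation in (0, 1): near 0 early in training,
    near 1 late. The sigmoid shape and its parameters are assumptions."""
    return 1.0 / (1.0 + math.exp(-steepness * (epoch - midpoint)))


def blended_margin(epoch, static_margin, adaptive_margin):
    """Assumed linear blend: starts at the static margin and moves towards
    the triplet-specific adaptive margin as the schedule activates."""
    s = schedule(epoch)
    return (1.0 - s) * static_margin + s * adaptive_margin


def triplet_hinge_loss(anchor, positive, negative, margin):
    """Standard hinge-based triplet loss: the negative must be farther from
    the anchor than the positive by at least `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, margin + gap)


if __name__ == "__main__":
    anchor, positive, negative = [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]
    for epoch in (0, 10, 20):
        # adaptive_margin=0.6 is an arbitrary illustrative value.
        m = blended_margin(epoch, static_margin=0.2, adaptive_margin=0.6)
        print(epoch, m, triplet_hinge_loss(anchor, positive, negative, m))
```

In this sketch the positive and negative are equidistant from the anchor, so the loss equals the margin itself, which makes the effect of the schedule on the constraint easy to see.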
Original language: English
Title of host publication: MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia
Place of Publication: New York, NY, USA
Publisher: ACM - Association for Computing Machinery
Pages: 75-83
ISBN (Print): 978-1-4503-6889-6
DOIs
Publication status: Published - 15 Oct 2019
Event: MM '19: The 27th ACM International Conference on Multimedia - Nice, France
Duration: 21 Oct 2019 → 25 Oct 2019

Conference

Conference: MM '19: The 27th ACM International Conference on Multimedia
Country/Territory: France
City: Nice
Period: 21/10/19 → 25/10/19
