Diachronic cross-modal embeddings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)
52 Downloads (Pure)

Abstract

Understanding the semantic shifts of multimodal information is only possible with models that capture cross-modal interactions over time. Under this paradigm, a new embedding is needed that structures visual-textual interactions according to the temporal dimension, thus preserving the data's original temporal organisation. This paper introduces a novel diachronic cross-modal embedding (DCM), where cross-modal correlations are represented in embedding space throughout the temporal dimension, preserving semantic similarity at each instant t. To achieve this, we trained a neural cross-modal architecture under a novel ranking loss strategy that, for each multimodal instance, enforces neighbour instances' temporal alignment through subspace structuring constraints based on a temporal alignment window. Experimental results show that our DCM embedding successfully organises instances over time. Quantitative experiments confirm that DCM is able to preserve semantic cross-modal correlations at each instant t while also providing better alignment capabilities. Qualitative experiments unveil new ways to browse multimodal content and hint that multimodal understanding tasks can benefit from this new embedding.
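
The abstract does not give the loss formula, so the following is only an illustrative sketch of how a ranking loss could use a temporal alignment window. It is not the authors' formulation. The function name `temporal_ranking_loss`, the parameters `window` and `margin`, and the rule that instances further apart in time than the window act as negatives are all assumptions made for this example.

```python
import numpy as np


def temporal_ranking_loss(img, txt, t, window=1.0, margin=0.2):
    """Hinge-style ranking loss over paired image/text embeddings.

    Hypothetical sketch, not the loss from the paper.
    img, txt: (n, d) arrays of L2-normalised embeddings; row i is one pair.
    t:        (n,) array of timestamps, one per pair.
    """
    sim = img @ txt.T                    # (n, n) image-to-text similarities
    pos = np.diag(sim)[:, None]          # similarity of each true pair
    # Assumption: only instances outside the temporal window are negatives,
    # so temporal neighbours are never pushed apart by this term.
    outside = np.abs(t[:, None] - t[None, :]) > window
    hinge = np.maximum(0.0, margin + sim - pos)
    # Average the hinge penalty over the selected negative pairs.
    return float((hinge * outside).sum() / max(outside.sum(), 1))
```

With a wide window every instance counts as a temporal neighbour and the term vanishes; narrowing the window makes more temporally distant instances compete with the true pair.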
Original language: English
Title of host publication: MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia
Place of Publication: New York, NY, USA
Publisher: ACM - Association for Computing Machinery
Pages: 2061-2069
ISBN (Print): 978-1-4503-6889-6
DOIs
Publication status: Published - 15 Oct 2019
Event: MM '19: The 27th ACM International Conference on Multimedia - Nice, France
Duration: 21 Oct 2019 → 25 Oct 2019

Conference

Conference: MM '19: The 27th ACM International Conference on Multimedia
Country/Territory: France
City: Nice
Period: 21/10/19 → 25/10/19
