Temporal cross-media retrieval with soft-smoothing

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Citations (Scopus)


Multimedia information has strong temporal correlations that shape the way modalities co-occur over time. In this paper we study the dynamic nature of multimedia and social-media information, where the temporal dimension emerges as a strong source of evidence for learning the temporal correlations across visual and textual modalities. So far, cross-media retrieval models have explored the correlations between different modalities (e.g. text and image) to learn a common subspace in which semantically similar instances lie in the same neighbourhood. Building on such knowledge, we propose a novel temporal cross-media neural architecture that departs from standard cross-media methods by explicitly accounting for the temporal dimension through temporal subspace learning. The model is softly constrained with temporal and inter-modality constraints that guide the new subspace learning task by favouring temporal correlations between semantically similar and temporally close instances. Experiments on three distinct datasets show that accounting for time is important for cross-media retrieval. In particular, the proposed method outperforms a set of baselines on the task of temporal cross-media retrieval, demonstrating its effectiveness for temporal subspace learning.
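The abstract does not give the training objective, so the following is only a minimal sketch of what a softly constrained temporal subspace loss could look like, assuming two common choices that are not confirmed by the paper: a bidirectional hinge ranking loss as the inter-modality constraint, and a Gaussian-kernel temporal smoothing penalty that pulls together embeddings sharing a label and lying close in time. All function names and the parameters `sigma`, `lam` and `margin` are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors in the shared (common) subspace."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def temporal_weight(t_i: float, t_j: float, sigma: float) -> float:
    """Hypothetical Gaussian kernel on the time gap: 1.0 for simultaneous items,
    close to 0.0 for items far apart in time."""
    return math.exp(-((t_i - t_j) ** 2) / (2.0 * sigma ** 2))


def inter_modality_loss(img: List[Vector], txt: List[Vector], margin: float = 0.2) -> float:
    """Assumed bidirectional hinge ranking loss: the matched image/text pair
    (same index) should beat every mismatched pair by at least `margin`."""
    n = len(img)
    loss = 0.0
    for i in range(n):
        pos = cosine(img[i], txt[i])
        for j in range(n):
            if j == i:
                continue
            loss += max(0.0, margin - pos + cosine(img[i], txt[j]))  # image query
            loss += max(0.0, margin - pos + cosine(img[j], txt[i]))  # text query
    return loss / n


def temporal_smoothing_loss(emb: List[Vector], labels: list, times: List[float],
                            sigma: float) -> float:
    """Assumed soft temporal constraint: pairs with the same label are pulled
    together, weighted by temporal proximity; other pairs are ignored."""
    total, norm = 0.0, 0.0
    for i in range(len(emb)):
        for j in range(i + 1, len(emb)):
            if labels[i] != labels[j]:
                continue
            w = temporal_weight(times[i], times[j], sigma)
            total += w * (1.0 - cosine(emb[i], emb[j]))
            norm += w
    return total / norm if norm else 0.0


def total_loss(img: List[Vector], txt: List[Vector], labels: list, times: List[float],
               sigma: float = 1.0, lam: float = 0.5, margin: float = 0.2) -> float:
    """Hypothetical combined objective: ranking term plus a lambda-weighted
    temporal term. An image and its paired text share one label and timestamp."""
    emb = list(img) + list(txt)
    smooth = temporal_smoothing_loss(emb, list(labels) * 2, list(times) * 2, sigma)
    return inter_modality_loss(img, txt, margin) + lam * smooth


# Usage: well-aligned embeddings give a much lower loss than swapped texts.
img = [[1.0, 0.0], [0.0, 1.0]]
good_txt = [[0.9, 0.1], [0.1, 0.9]]
bad_txt = [[0.1, 0.9], [0.9, 0.1]]
labels, times = ["a", "b"], [0.0, 5.0]
print(total_loss(img, good_txt, labels, times), total_loss(img, bad_txt, labels, times))
```

Treating `lam` as a tunable weight is what makes the temporal constraint "soft" in this sketch: it biases the subspace towards temporally coherent neighbourhoods without forcing temporally close items to coincide.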

Original language: English
Title of host publication: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Number of pages: 9
ISBN (Electronic): 9781450356657
Publication status: Published - 15 Oct 2018
Event: 26th ACM Multimedia conference, MM 2018 - Seoul, Korea, Republic of
Duration: 22 Oct 2018 to 26 Oct 2018


Conference: 26th ACM Multimedia conference, MM 2018
Country/Territory: Korea, Republic of


  • Cross-media
  • Multimedia retrieval
  • Temporal cross-media
  • Temporal smoothing

