Searching images in a web archive

André Mourão, Daniel Gomes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This article presents the research and development of a large-scale image search system, applied to launch a worldwide innovative service that enables searching billions of historical images archived from the web since the 1990s. Contributions of this work were applied to enhance the Arquivo.pt web archive with an image-search service where users submit text queries, through a web user interface or an API, and immediately receive a list of historical web-archived images. However, supporting image search over web archives raised new challenges. The volume of data to be processed was large and heterogeneous, totaling over 530 TB of historical web data published since the early days of the web. The main contributions of this work are a toolkit of algorithms that extracts textual metadata to describe web-archived images, a system architecture and workflow to index large amounts of web-archived images considering their specific temporal features, and a ranking algorithm to order image-search results by relevance. This research was applied to launch an enhanced image-search service that has been publicly available since March 2021. All the developed software is fully available as free open-source software.
Original language: English
Title of host publication: 2023 IEEE 10th International Conference on Data Science and Advanced Analytics, DSAA 2023 - Proceedings
Editors: Yannis Manolopoulos, Zhi-Hua Zhou
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 10
ISBN (Electronic): 979-8-3503-4503-2
ISBN (Print): 979-8-3503-4504-9
Publication status: Published - 2023
Event: 10th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2023 - Thessaloniki, Greece
Duration: 9 Oct 2023 to 12 Oct 2023

Conference

Conference: 10th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2023
Country/Territory: Greece
City: Thessaloniki
Period: 9/10/23 to 12/10/23

Keywords

  • Image search
  • Web archive
  • Web archive information retrieval
