Multi-modal video retrieval

Rossetto, Luca. Multi-modal video retrieval. 2018, Doctoral Thesis, University of Basel, Faculty of Science.

Full text: PDF (116 MB), available under License CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives).

Official URL: http://edoc.unibas.ch/diss/DissB_12898

Abstract

All multimedia content, but especially video, has grown in both volume and importance in recent years. For this growing amount of video material to be useful, it must be possible to find the parts of it that are relevant in a given situation. The field of video retrieval addresses this challenge by offering means to retrieve, from a larger pool, those video sequences which are similar to a query. Such retrieval processes commonly rely on textual annotations, which often have to be added to a video manually in order to make it retrievable. In contrast, content-based video retrieval operates not on such external metadata but on the content of the video itself.
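To make the contrast with annotation-based retrieval concrete, here is a minimal sketch of query-by-example content-based retrieval. It assumes each video segment has already been reduced to a numeric feature vector and ranks segments by Euclidean distance to the query vector. The segment ids, the toy colour features and the function names are hypothetical illustrations, not taken from the thesis or from any particular system.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (segment id, distance) pairs closest to the query vector."""
    scored = [(seg_id, euclidean(query, vec)) for seg_id, vec in index.items()]
    scored.sort(key=lambda item: item[1])  # smaller distance = more similar
    return scored[:k]


# Hypothetical index: segment id -> feature vector (here a toy mean RGB colour).
index = {
    "video1_shot1": [0.9, 0.1, 0.1],
    "video1_shot2": [0.1, 0.8, 0.2],
    "video2_shot1": [0.8, 0.2, 0.1],
}

# Query by example: "find segments that look mostly red".
for seg_id, dist in knn_query([1.0, 0.0, 0.0], index, k=2):
    print(seg_id, round(dist, 3))
```

A linear scan like this is only practical for small collections. Real engines typically rely on index structures or approximate nearest-neighbour search, but the ranking principle is the same.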
The aim of this thesis is to make several contributions to the field of content-based video retrieval. It begins with an analysis of one of the largest and most diverse contemporary sources of video material, web video, as it is found in the wild. The analysis outlines several properties of such video material, obtained from two large online video platforms, and compares them with the properties of several video collections which are commonly used in research. The results of this comparison led to the creation of a new research video dataset, which is scheduled to be used in multiple large video retrieval evaluation campaigns.
Next, the notion of similarity, especially in the visual domain, is explored as it is perceived by humans. A human-labelled ground-truth dataset of pairwise image similarity is obtained through an online platform that used both crowdsourcing and gamification strategies for input acquisition. This dataset serves as the basis for a number of experiments which explore the relationship between the many ways of computing the distance between two features describing visual content and the visual similarity perceived by humans. The insights gained from these experiments may help in deciding which distances to use when implementing content-based retrieval systems.
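One way to frame such an experiment is sketched below. Several common distance functions are computed over pairs of feature vectors, and each is compared to human similarity judgements using Spearman rank correlation. The toy pairs and scores are invented for illustration, and neither the choice of distances nor the choice of Spearman's rho should be read as the thesis's exact protocol. Because a distance grows as similarity falls, a distance that agrees well with human perception yields a correlation close to -1.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def l1(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a: Vector, b: Vector) -> float:
    """Chebyshev (L-infinity) distance: the largest per-dimension difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            result[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy ground truth: (feature of image A, feature of image B, human similarity in [0, 1]).
pairs = [
    ([1.0, 0.0], [0.9, 0.1], 0.9),
    ([1.0, 0.0], [0.5, 0.5], 0.5),
    ([1.0, 0.0], [0.0, 1.0], 0.1),
    ([0.2, 0.8], [0.3, 0.7], 0.8),
]
human = [sim for _, _, sim in pairs]

distances: Dict[str, Callable[[Vector, Vector], float]] = {
    "L1": l1, "L2": l2, "Chebyshev": chebyshev, "cosine": cosine_distance,
}
for name, dist in distances.items():
    d = [dist(a, b) for a, b, _ in pairs]
    # A distance that agrees with human judgement gives a correlation near -1.
    print(name, round(spearman(d, human), 3))
```

Rank correlation is used here because only the ordering of results matters to a retrieval system, not the absolute distance values. A real study would use far more pairs and many more distance functions.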
Finally, a content-based video retrieval engine is implemented which supports multiple modalities for query expression. This engine, named Cineast, forms a vital component of the content-based retrieval stack vitrivr, which has been made publicly available as open-source software. Cineast, and by extension vitrivr, was later extended to concurrently support multiple media types besides video, such as images, audio and three-dimensional models. This makes vitrivr a full-fledged content-based multimedia retrieval stack.
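A retrieval engine that accepts several query modalities has to merge the result lists produced by the individual features into one ranking. The sketch below shows one simple option, weighted-average late fusion. It is a hypothetical illustration and not Cineast's actual API or fusion strategy. The feature names, segment ids and scores are invented, and per-feature scores are assumed to be similarities already normalised to [0, 1].

```python
from typing import Dict, List, Tuple


def fuse_scores(
    per_feature: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    k: int,
) -> List[Tuple[str, float]]:
    """Weighted-average late fusion of per-feature similarity scores in [0, 1].

    A segment that a feature did not return contributes 0 for that feature.
    """
    total_weight = sum(weights[feature] for feature in per_feature)
    combined: Dict[str, float] = {}
    for feature, results in per_feature.items():
        w = weights[feature]
        for segment, score in results.items():
            combined[segment] = combined.get(segment, 0.0) + w * score
    ranked = sorted(
        ((segment, total / total_weight) for segment, total in combined.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical results from two query modalities for the same query.
results = {
    "sketch_colour": {"seg_a": 0.9, "seg_b": 0.4},
    "ocr_text": {"seg_b": 1.0, "seg_c": 0.7},
}
print(fuse_scores(results, {"sketch_colour": 1.0, "ocr_text": 1.0}, k=3))
```

Treating a missing score as 0 rewards segments that several modalities agree on. Here `seg_b` ranks first even though neither feature alone scores it highest in both lists. Other choices, such as taking the maximum score, would favour strong single-modality matches instead.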

Advisors:	Schuldt, Heiko and Schöffmann, Klaus
Faculties and Departments:	05 Faculty of Science > Departement Mathematik und Informatik > Informatik > Databases and Information Systems (Schuldt)
UniBasel Contributors:	Rossetto, Luca and Schuldt, Heiko
Item Type:	Thesis
Thesis Subtype:	Doctoral Thesis
Thesis no:	12898
Thesis status:	Complete
Number of Pages:	1 online resource (xviii, 216 pages)
Language:	English
Identification Number:	doi: 10.5451/unibas-006859522; urn: urn:nbn:ch:bel-bau-diss128988
edoc DOI:	10.5451/unibas-006859522
Last Modified:	11 Feb 2019 12:36
Deposited On:	28 Dec 2018 10:44