Gesture Similarity Learning and Retrieval in Large-Scale Real-world Video Collections

Parian-Scherb, Mahnaz. Gesture Similarity Learning and Retrieval in Large-Scale Real-world Video Collections. 2021, Doctoral Thesis, University of Basel, Faculty of Science.


Official URL: https://edoc.unibas.ch/84855/


Analyzing and understanding gestures plays a key role in our comprehension of communication. Investigating the co-occurrence of gestures and speech is currently a labor-intensive task in linguistics. Although advances in natural language processing have led to various contributions in this field, computer vision tools and methods are not prominently used to aid researchers in analyzing hand and body gestures.
In this thesis, we present several contributions tailored to tackle the challenges of real-world gesture retrieval, which is an under-explored field in computer vision. The methods aim to systematically answer the questions of 'when' a gesture was performed and 'who' performed it in a video. Along the way, we develop different components to address various challenges in these videos, such as the presence of multiple persons in the scene, heavily occluded hand gestures, and abrupt gesture cuts due to changes of camera angle.
In contrast to the majority of existing methods developed for gesture recognition, our proposed methods do not rely on the depth modality or sensor signals, which are available in some datasets to aid the identification of gestures. Our vision-based methods build upon best practices in learning representations of complicated actions using Deep Neural Networks. We have conducted a comprehensive analysis to choose the architectures and configurations used to extract discriminative spatio-temporal features. These features enable the retrieval pipeline to find 'similar' hand gestures. We have additionally explored the notion of similarity in the context of hand gestures through field studies and experiments.
Finally, we conduct exhaustive experiments on different benchmarks and, to the best of the author's knowledge, run the largest gesture retrieval evaluations on real-world news footage, using the Newscape dataset, a collection of more than 400 000 videos with numerous scenes that are challenging for a retrieval method. The results, as assessed by experts from the linguistics domain, suggest a high potential for our proposed method in interdisciplinary research and studies.
Advisors: Schuldt, Heiko and Dupont, Stéphane
Committee Members: Roth, Volker and Turner, Mark
Faculties and Departments: 05 Faculty of Science > Departement Mathematik und Informatik > Informatik > Databases and Information Systems (Schuldt)
UniBasel Contributors: Schuldt, Heiko and Roth, Volker
Item Type: Thesis
Thesis Subtype: Doctoral Thesis
Thesis no: 14410
Thesis status: Complete
Number of Pages: 195
Identification Number:
  • urn: urn:nbn:ch:bel-bau-diss144102
edoc DOI:
Last Modified: 29 Oct 2021 04:30
Deposited On: 28 Oct 2021 09:57