Web observations: analysing Web data through automated data extraction

Gröflin, Alexander Olivier. Web observations: analysing Web data through automated data extraction. 2018, Doctoral Thesis, University of Basel, Faculty of Science.

Available under License CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives).


Official URL: http://edoc.unibas.ch/diss/DissB_13527

In this thesis, a generic architecture for Web observations is introduced. Beginning with fundamental data aspects and the technologies used to build Web observations, the requirements and architectural designs are outlined. Because Web observations are basic tools for collecting information from any Web resource, legal perspectives are discussed to provide an understanding of recent regulations, e.g. the General Data Protection Regulation (GDPR). The general idea of Web observatories, their concepts, and experiments are presented in order to identify the best solution for collecting data from any kind of Web resource and, based thereon, visualising it. With the help of several Web observation scenarios, data sets were collected, analysed, and eventually published in a machine-readable or visual form for users to interpret. The main research goal was to create a Web observation based on an architecture that is able to collect information from any given Web resource, in order to make sense of a broad range of as yet untapped information sources. To find this generally applicable architectural structure, several research projects with different designs were conducted. Eventually, the container-based building block architecture emerged from these initial designs as the most flexible architectural structure. Based on these considerations and architectural designs, a flexible and easily adaptable architecture was created that is able to collect data from all kinds of Web resources. Thanks to such broad Web data collections, users can gain a more comprehensive understanding of, and insight into, real-life problems and the efficiency and profitability of services, as well as valuable information on the changes to a Web resource.
Advisors: Burkhart, Helmar and Bacon, Liz
Faculties and Departments: 05 Faculty of Science > Department of Mathematics and Computer Science > Former Units of Mathematics & Computer Science > High Performance and Web Computing (Burkhart)
UniBasel Contributors: Burkhart, Helmar
Item Type: Thesis
Thesis Subtype: Doctoral Thesis
Thesis no: 13527
Thesis status: Complete
Number of Pages: 1 online resource (xii, 200 pages)
Last Modified: 25 Feb 2020 05:30
Deposited On: 24 Feb 2020 15:18