
PyBDA: a command line tool for automated analysis of big biological data sets

Dirmeier, Simon and Emmenlauer, Mario and Dehio, Christoph and Beerenwinkel, Niko. (2019) PyBDA: a command line tool for automated analysis of big biological data sets. BMC Bioinformatics, 20. p. 564.

PDF - Published Version
Available under License CC BY (Attribution).

1178Kb

Official URL: https://edoc.unibas.ch/74629/


Abstract

Analyzing large and high-dimensional biological data sets poses significant computational difficulties for bioinformaticians due to a lack of accessible tools that scale to hundreds of millions of data points. We developed a novel machine learning command line tool called PyBDA for automated, distributed analysis of big biological data sets. By using Apache Spark in the backend, PyBDA scales to data sets beyond the size of current applications. It uses Snakemake to automatically schedule jobs to a high-performance computing cluster. We demonstrate the utility of the software by analyzing image-based RNA interference data of 150 million single cells. PyBDA allows automated, easy-to-use data analysis using common statistical methods and machine learning algorithms. It can be used entirely through simple command line calls, making it accessible to a broad user base. PyBDA is available at https://pybda.rtfd.io.
Faculties and Departments: 05 Faculty of Science > Departement Biozentrum > Infection Biology > Molecular Microbiology (Dehio)
UniBasel Contributors: Dehio, Christoph
Item Type: Article, refereed
Article Subtype: Research Article
Publisher: BioMed Central
e-ISSN: 1471-2105
Note: Publication type according to Uni Basel Research Database: Journal article
Language: English
Last Modified: 22 Jan 2020 11:10
Deposited On: 21 Jan 2020 08:05
