PyBDA: a command line tool for automated analysis of big biological data sets

Date Issued
2019-01-01
Author(s)
Dirmeier, Simon
Emmenlauer, Mario  
Dehio, Christoph  
Beerenwinkel, Niko
DOI
10.1186/s12859-019-3087-8
Abstract
Analysing large and high-dimensional biological data sets poses significant computational difficulties for bioinformaticians due to a lack of accessible tools that scale to hundreds of millions of data points. We developed a novel machine learning command line tool called PyBDA for automated, distributed analysis of big biological data sets. By using Apache Spark in the backend, PyBDA scales to data sets beyond the size of current applications. It uses Snakemake to automatically schedule jobs to a high-performance computing cluster. We demonstrate the utility of the software by analyzing image-based RNA interference data of 150 million single cells. PyBDA allows automated, easy-to-use data analysis using common statistical methods and machine learning algorithms. It can be used entirely through simple command line calls, making it accessible to a broad user base. PyBDA is available at https://pybda.rtfd.io.
File(s)
Name
s12859-019-3087-8
Size
1.15 MB
Format
Unknown
Checksum
(MD5): efb6a46ab23208e26a7560086f087b48

University of Basel

edoc
Open Access Repository University of Basel