The clinical decision support system can effectively overcome the limits of doctors' knowledge and reduce the possibility of misdiagnosis, thereby enhancing health care. Traditional genetic data storage and analysis methods, which are based on stand-alone environments, have limited scalability and struggle to meet the computational requirements of rapidly growing genetic data.

In this paper, we propose a distributed gene clinical decision support system named GCDSS, and we implement a prototype based on cloud computing technology. We also present CloudBWA, a novel distributed read mapping algorithm that uses a batch processing strategy to map reads on Apache Spark.

Results: Experiments show that the distributed gene clinical decision support system GCDSS and the distributed read mapping algorithm CloudBWA have outstanding performance and excellent scalability. Compared with state-of-the-art distributed algorithms, CloudBWA achieves up to 2.63 times speedup over SparkBWA. Compared with stand-alone algorithms, CloudBWA with 16 cores achieves up to 11.59 times speedup over BWA-MEM with 1 core.

GCDSS is a distributed gene clinical decision support system based on cloud computing techniques. In particular, we incorporated a distributed genetic data analysis pipeline framework into the proposed GCDSS system. To boost the data processing of GCDSS, we propose CloudBWA, a novel distributed read mapping algorithm that leverages a batch processing technique in the mapping stage on the Apache Spark platform.

A clinical decision support system (CDSS) provides clinicians, staff, patients, and other individuals with knowledge and person-specific information to enhance health and health care. CDSS can effectively overcome the limits of doctors' knowledge and reduce the possibility of misdiagnosis, guaranteeing the quality of medical care at a lower medical expense. Genetic diagnosis has the advantages of early detection, early discovery, early prevention and early treatment. With the development of next-generation sequencing (NGS) technology, the amount of newly sequenced data has increased exponentially in recent years. How to store and analyze this large amount of genetic data has become a huge challenge.

Therefore, faster genetic data storage and analysis technologies are urgently needed. The current best-practice genomic variant calling pipeline uses the Burrows-Wheeler Alignment tool (BWA) to map genetic sequencing data to a reference and the Genome Analysis Toolkit (GATK) to produce high-quality variant calls; it takes approximately 120 h to process a single, high-quality human genome on a single, beefy node. Even more time is needed when the sequencing depth is greater or the reads are longer. What is more, time is equal to life in the medical field, especially in emergencies. Therefore, it is important to accelerate the processing of genetic data for CDSS. However, traditional genetic data storage and analysis technologies based on stand-alone environments have limited scalability and can hardly meet the computational requirements of rapid data growth.

To solve the problems mentioned above, we propose GCDSS, a distributed gene clinical decision support system based on cloud computing technology. There are two main challenges in implementing GCDSS and improving its performance.

The first challenge is to design and implement a distributed genetic data analysis pipeline framework. Genetic data analysis usually involves a large amount of data, varied data formats and a complicated analysis process, so such a framework is difficult to design and implement.

The second challenge is the limited scalability of traditional read mapping algorithms. Read mapping is the first step in the whole genetic data analysis pipeline, and a time-consuming one. Read lengths generally range from several to thousands of bases, and a sample can typically produce billions of reads. Mapping these reads to the reference genome quickly and accurately is critical for subsequent analysis, and it is difficult.

To meet these challenges, we considered distributed storage, distributed computing frameworks and distributed algorithms.
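The batch processing idea mentioned for CloudBWA can be illustrated in general terms. The sketch below is not the authors' implementation and does not use Spark or BWA. It only shows, under simplifying assumptions, how reads might be grouped into fixed-size batches so that each batch is handed to a mapper in a single call. The names `batch_reads` and `map_batch`, the batch size, and the toy exact-match mapper standing in for a real aligner are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str]  # (read name, base sequence)


def batch_reads(reads: Iterable[Read], batch_size: int) -> Iterator[List[Read]]:
    """Yield consecutive batches of at most batch_size reads (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(reads)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def map_batch(batch: List[Read], reference: str) -> List[Tuple[str, int]]:
    """Toy stand-in for a real aligner: report the first exact match, or -1.

    A real read mapper such as BWA-MEM uses an index and tolerates mismatches;
    str.find is used here only to keep the sketch self-contained.
    """
    return [(name, reference.find(seq)) for name, seq in batch]


# Assumed toy data, for illustration only.
reference = "ACGTACGTTAGC"
reads = [("r1", "ACGT"), ("r2", "TTAG"), ("r3", "GGGG"), ("r4", "CGTA"), ("r5", "AGC")]

# Each batch is mapped with one call; in a distributed setting, each batch
# could be processed by a different worker.
results = [hit for batch in batch_reads(reads, 2) for hit in map_batch(batch, reference)]
```

Grouping reads this way amortizes the per-call overhead of invoking the mapper. How large a batch should be in practice depends on memory and on the cost of each call, which the source does not specify.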