Bioinformatics Pipeline Enables Analysis of Massive Genetics Datasets

A scalable, cloud-based solution lets data scientists analyze vast amounts of genetic variant data while ensuring accuracy, usability and speed.

The Challenge

A company that integrates genomics into comprehensive lifestyle plans, and uses genomic sequencing to diagnose genetic disorders, required a new software tool for variant discovery and interpretation in a clinical laboratory environment. The software would allow clinical scientists to process, analyze, interpret and report on personal genome files.

The Approach

The NewPage team knew that for the client to use the new solution effectively as a data tool for determining the genetic causes of disease, sequence alignment tools would be needed to handle large data sets containing thousands to millions of variants detected against the reference sequence. We had to determine how to convert FASTQ files containing gene sequencing data, each gigabytes in size, to VCF files. Our approach was to design a Genome Sequencing Pipeline that accepts uploaded genomic data as paired FASTQ files and generates a VCF file listing variants and alleles. The conversion process takes approximately 5 minutes.
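To make the FASTQ-to-VCF flow concrete, here is a minimal sketch of how the stages of such a pipeline could be laid out. The tools shown (bwa, samtools, bcftools) are commonly used open-source programs and are an assumption made for illustration; the case study does not name the tools used in the actual solution. The function name `plan_pipeline` and the file names are hypothetical. The sketch only builds the command strings and does not run them.

```python
from typing import List


def plan_pipeline(ref: str, fastq_1: str, fastq_2: str, out_dir: str) -> List[str]:
    """Return shell commands for a paired FASTQ -> VCF run (nothing is executed).

    Assumes the reference FASTA has already been indexed (for example with
    `bwa index`), and that bwa, samtools and bcftools are installed. These
    tool choices are illustrative assumptions, not the client's actual stack.
    """
    bam = f"{out_dir}/sample.sorted.bam"  # hypothetical intermediate file
    vcf = f"{out_dir}/sample.vcf"         # hypothetical final output
    return [
        # 1. Align the paired reads to the reference, then sort the alignments.
        f"bwa mem {ref} {fastq_1} {fastq_2} | samtools sort -o {bam} -",
        # 2. Index the sorted alignments for fast region lookup.
        f"samtools index {bam}",
        # 3. Pile up reads against the reference and call variants into a VCF.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -o {vcf}",
    ]


if __name__ == "__main__":
    for command in plan_pipeline("ref.fa", "reads_1.fastq", "reads_2.fastq", "out"):
        print(command)
```

Keeping the plan as plain data makes each stage easy to inspect, log or hand to a job scheduler before anything is run on multi-gigabyte inputs.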

The Results

NewPage developed a Genome Sequencing Pipeline, which the client uses in its sequencing services to quickly and easily annotate genomes, analyze variant data to identify the genetic causes of disease, and generate customized reports for individual patients. The intuitive design of the solution allows the end user to analyze large variant data sets directly through annotation, multiple sort and filter selections, and intersect and difference functions. The solution provides accuracy, usability and speed (5 minutes from FASTQ to PDF), all critical in the application of genetic interpretation. The client chose to partner with NewPage because of our demonstrated experience working with healthcare organizations to develop solutions that enable interpretation of large quantities of data by leveraging the latest emerging technologies.
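As an illustration of what intersect and difference functions on variant data can look like, the sketch below treats each variant as a (chromosome, position, reference allele, alternate allele) key and applies ordinary set operations. It assumes tab-separated VCF records with the standard leading columns; the names `Variant` and `read_variants` and the sample records are hypothetical and are not taken from the delivered solution.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def read_variants(vcf_text: str) -> Set[Variant]:
    """Collect variants from VCF text, one entry per alternate allele."""
    variants = set()
    for line in vcf_text.splitlines():
        # Skip blank lines and header lines (both "##" meta lines and "#CHROM").
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # A record may list several comma-separated alternate alleles.
        for allele in alt.split(","):
            variants.add(Variant(chrom, int(pos), ref, allele))
    return variants


# Made-up example records, for illustration only.
sample_a = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "chr1\t100\t.\tA\tG\n"
    "chr2\t200\t.\tC\tT,G\n"
)
sample_b = (
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "chr1\t100\t.\tA\tG\n"
    "chr2\t200\t.\tC\tG\n"
)

a = read_variants(sample_a)
b = read_variants(sample_b)

shared = a & b   # intersect: variants present in both samples
only_a = a - b   # difference: variants found only in sample A

print(sorted(shared))
print(sorted(only_a))
```

Because variants are hashable keys, both operations run in time proportional to the number of variants, which matters when a data set holds millions of them.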

