St Andrews Research Repository

SparkFlow: towards high-performance data analytics for Spark-based genome analysis

File
Filgueira_2022_CCGrid_SparkFlow_AAM.pdf (753.2Kb)
Date
19/07/2022
Author
Filgueira, Rosa
Awaysheh, Feras M.
Carter, Adam
White, Darren J.
Rana, Omar
Keywords
Big data
Scientific workflow
HPC
Genome analysis
Apache Spark
High-performance data analytics
QA75 Electronic computers. Computer science
QH301 Biology
QH426 Genetics
NS
NIS
Abstract
Recent advances in DNA sequencing technology have triggered full-scale next-generation sequencing (NGS) research. Big Data (BD) is becoming the main driver for analyzing these large-scale bioinformatic data. However, this complicated analysis process has become the system bottleneck, requiring an amalgamation of scalable approaches to deliver the needed performance and hide the deployment complexity. Cutting-edge scientific workflows can robustly address these challenges. This paper presents SparkFlow, a Spark-based alignment workflow for massive NGS analysis over Singularity containers. SparkFlow is highly scalable and reproducible, and it parallelizes computation using data-level parallelism and load-balancing techniques in HPC and Cloud environments. The proposed workflow builds on benchmarking two state-of-the-art NGS workflows, i.e., BaseRecalibrator and ApplyBQSR. SparkFlow can accelerate large-scale cancer genomic analysis by scaling vertically (Hyper-Threading) and horizontally (on-demand provisioning). Our results demonstrate an inevitable trade-off between the targeted applications and the processor architecture. SparkFlow achieves a decisive improvement in NGS computation performance, throughput, and scalability without increasing deployment complexity. The paper's findings aim to pave the way for a wide range of revolutionary enhancements and future trends within the High-performance Data Analytics (HPDA) genome analysis realm.
Citation
Filgueira, R, Awaysheh, F M, Carter, A, White, D J & Rana, O 2022, SparkFlow: towards high-performance data analytics for Spark-based genome analysis. in 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid). IEEE, pp. 1007-1016, Workshop on Clusters, Clouds and Grids for Life Sciences (CCGrid Life 2022), Taormina, Italy, 16/05/22. https://doi.org/10.1109/CCGrid54584.2022.00123
 
workshop
 
Publication
2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
DOI
https://doi.org/10.1109/CCGrid54584.2022.00123
Type
Conference item
Rights
Copyright © 2022 IEEE. This work has been made available online in accordance with publisher policies or with permission. Permission for further reuse of this content should be sought from the publisher or the rights holder. This is the author created accepted manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at https://doi.org/10.1109/CCGrid54584.2022.00123
Collections
  • University of St Andrews Research
URI
http://hdl.handle.net/10023/25076

Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.


© University of St Andrews Library

University of St Andrews is a charity registered in Scotland, No SC013532.
