St Andrews Research Repository

St Andrews University Home
View Item 
  •   St Andrews Research Repository
  • University of St Andrews Research
  • University of St Andrews Research
  • University of St Andrews Research
  • View Item
  •   St Andrews Research Repository
  • University of St Andrews Research
  • University of St Andrews Research
  • University of St Andrews Research
  • View Item
  •   St Andrews Research Repository
  • University of St Andrews Research
  • University of St Andrews Research
  • University of St Andrews Research
  • View Item
  • Login
JavaScript is disabled for your browser. Some features of this site may not work without it.

Automating fault tolerance in high-performance computational biological jobs using multi-agent approaches

Thumbnail
View/Open
revision.pdf (246.3Kb)
Date
01/05/2014
Author
Varghese, Blesson
McKee, Gerard
Alexandrov, Vassil
Keywords
High-performance computing
Fault tolerance
Biological jobs
Multi-agents
Seamless execution
Checkpoint
QA75 Electronic computers. Computer science
Metadata
Show full item record
Altmetrics Handle Statistics
Altmetrics DOI Statistics
Abstract
Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost on the time taken for reinstating the job and the risk of losing data and execution accomplished by the job before it failed. Approaches which can proactively detect computing core failures and take action to relocate the computing core׳s job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology both at the job and core level. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the approaches proposed are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which the fault tolerance is studied, centralised and decentralised checkpointing approaches on an average add 90% to the actual time for executing the job. On the other hand, in the same experiment the multi-agent approaches add only 10% to the overall execution time.
Citation
Varghese , B , McKee , G & Alexandrov , V 2014 , ' Automating fault tolerance in high-performance computational biological jobs using multi-agent approaches ' , Computers in Biology and Medicine , vol. 48 , pp. 28-41 . https://doi.org/10.1016/j.compbiomed.2014.02.005
Publication
Computers in Biology and Medicine
Status
Peer reviewed
DOI
https://doi.org/10.1016/j.compbiomed.2014.02.005
Type
Journal article
Rights
© 2014. Elsevier Ltd. All rights reserved. this is the author’s version of a work that was accepted for publication in Computers in Biology and Medicine. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computers in Biology and Medicine, 48, 1 May 2014, 28-41 DOI 10.1016/j.compbiomed.2014.02.005
Collections
  • University of St Andrews Research
URI
http://hdl.handle.net/10023/6252

Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

Advanced Search

Browse

All of RepositoryCommunities & CollectionsBy Issue DateNamesTitlesSubjectsClassificationTypeFunderThis CollectionBy Issue DateNamesTitlesSubjectsClassificationTypeFunder

My Account

Login

Open Access

To find out how you can benefit from open access to research, see our library web pages and Open Access blog. For open access help contact: openaccess@st-andrews.ac.uk.

Accessibility

Read our Accessibility statement.

How to submit research papers

The full text of research papers can be submitted to the repository via Pure, the University's research information system. For help see our guide: How to deposit in Pure.

Electronic thesis deposit

Help with deposit.

Repository help

For repository help contact: Digital-Repository@st-andrews.ac.uk.

Give Feedback

Cookie policy

This site may use cookies. Please see Terms and Conditions.

Usage statistics

COUNTER-compliant statistics on downloads from the repository are available from the IRUS-UK Service. Contact us for information.

© University of St Andrews Library

University of St Andrews is a charity registered in Scotland, No SC013532.

  • Facebook
  • Twitter