A grey-box approach to benchmarking and performance modelling of data-intensive applications

Ceesay, Sheriffo

Files in this item

Name: SheriffoCeesayPhDThesis.pdf
Size: 2.716Mb
Format: PDF

Item metadata

dc.contributor.advisor	Barker, Adam David
dc.contributor.author	Ceesay, Sheriffo
dc.coverage.spatial	xiii, 175 p.	en_US
dc.date.accessioned	2021-04-14T12:25:31Z
dc.date.available	2021-04-14T12:25:31Z
dc.date.issued	2021-06-30
dc.identifier.uri	https://hdl.handle.net/10023/23026
dc.description.abstract	The advent of big data about a decade ago, coupled with its processing and storage challenges, gave rise to the development of a multitude of data-intensive frameworks. These distributed parallel processing frameworks can be used to process petabytes of data stored in a cluster of computing nodes. Companies and organisations can now process massive amounts of data to drive innovation and gain a competitive advantage. However, these new paradigms have resulted in several research challenges due to their inherent differences from the more mature traditional data processing and storage systems. Firstly, they are comparatively more modern, supporting the execution of a wide variety of new data-intensive workloads with varying performance requirements. Therefore, there is a clear need to study and standardise ways to benchmark and compare them in order to identify and address performance bottlenecks. Secondly, they are highly configurable, giving users the freedom to tune the execution environment based on the application's performance requirements. However, this freedom and the abundance of configuration parameters present an additional challenge by shifting the responsibility for tuning and optimising these parameters to the users. To address these broad challenges, in this research we enabled a grey-box benchmarking and performance modelling framework focusing on two of the most common communication patterns for data-intensive applications. The use of communication patterns allowed us to classify and study varying but related data-intensive workloads using the same sets of requirements. 
Furthermore, we enabled a multi-objective performance prediction framework that can be used to answer various performance-related questions, such as the time it takes to execute an application, the best configuration parameters to satisfy constraints such as a deadline, and the optimal cloud instances to minimise monetary cost. To gauge the generality of this work, we validated the results on two internal clusters, and the results are consistent across both setups. We have also provided a REST API and web implementation for validation. The primary takeaway is that the research showcases a comprehensive approach that can be used to benchmark and model the performance of data-intensive applications.	en_US
dc.language.iso	en	en_US
dc.publisher	University of St Andrews
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Benchmarking	en_US
dc.subject	Performance modelling	en_US
dc.subject	MapReduce	en_US
dc.subject	Big data	en_US
dc.subject	Data-intensive	en_US
dc.subject	Communication patterns	en_US
dc.title	A grey-box approach to benchmarking and performance modelling of data-intensive applications	en_US
dc.type	Thesis	en_US
dc.contributor.sponsor	Islamic Development Bank	en_US
dc.contributor.sponsor	University of St Andrews. School of Computer Science	en_US
dc.type.qualificationlevel	Doctoral	en_US
dc.type.qualificationname	PhD Doctor of Philosophy	en_US
dc.publisher.institution	The University of St Andrews	en_US
dc.identifier.doi	https://doi.org/10.17630/sta/60

This item appears in the following Collection(s)

Computer Science Theses

Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International