Item metadata

dc.contributor.advisor: Barker, Adam David
dc.contributor.author: Ceesay, Sheriffo
dc.coverage.spatial: xiii, 175 p. [en_US]
dc.date.accessioned: 2021-04-14T12:25:31Z
dc.date.available: 2021-04-14T12:25:31Z
dc.date.issued: 2021-06-30
dc.identifier.uri: https://hdl.handle.net/10023/23026
dc.description.abstract: The advent of big data about a decade ago, coupled with its processing and storage challenges, gave rise to the development of a multitude of data-intensive frameworks. These distributed parallel processing frameworks can be used to process petabytes of data stored in a cluster of computing nodes. Companies and organisations can now process massive amounts of data to drive innovation and gain a competitive advantage. However, these new paradigms have resulted in several research challenges due to their inherent differences from the more mature traditional data processing and storage systems. Firstly, they are comparatively more modern, supporting the execution of a wide variety of new data-intensive workloads with varying performance requirements. Therefore, there is a clear need to study and standardise ways to benchmark and compare them to identify and improve performance bottlenecks. Secondly, they are highly configurable, giving users the freedom to tune the execution environment based on the application's performance requirements. However, this freedom and the ubiquity of the configuration parameters present an additional challenge by shifting the responsibility for tuning and optimising these numerous configuration parameters to the users. To address the above broad challenges, in this research, we enabled a grey-box benchmarking and performance modelling framework focusing on two of the most common communication patterns for data-intensive applications. The use of communication patterns allowed us to classify and study varying but related data-intensive workloads using the same sets of requirements.
Furthermore, we enabled a multi-objective performance prediction framework that can be used to answer various performance-related questions, such as the time it takes to execute an application, the best configuration parameters to satisfy constraints such as a deadline, and the recommendation of optimal cloud instances to minimise monetary cost. To gauge the generality of this work, we have validated the results on two internal clusters, and the results are consistent across both setups. We have also provided a REST API and web implementation for validation. The primary takeaway is that the research showcases a comprehensive approach that can be used to benchmark and model the performance of data-intensive applications. [en_US]
dc.language.iso: en [en_US]
dc.publisher: University of St Andrews
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Benchmarking [en_US]
dc.subject: Performance modelling [en_US]
dc.subject: MapReduce [en_US]
dc.subject: Big data [en_US]
dc.subject: Data-intensive [en_US]
dc.subject: Communication patterns [en_US]
dc.title: A grey-box approach to benchmarking and performance modelling of data-intensive applications [en_US]
dc.type: Thesis [en_US]
dc.contributor.sponsor: Islamic Development Bank [en_US]
dc.contributor.sponsor: University of St Andrews. School of Computer Science [en_US]
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD Doctor of Philosophy [en_US]
dc.publisher.institution: The University of St Andrews [en_US]
dc.identifier.doi: https://doi.org/10.17630/sta/60



    Except where otherwise noted within the work, this item's licence for re-use is described as Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International