Borg: the next generation
Abstract
This paper analyzes a newly published trace that covers 8 different Borg [35] clusters for the month of May 2019. The trace enables researchers to explore how scheduling works in large-scale production compute clusters. We highlight how Borg has evolved and perform a longitudinal comparison of the newly published 2019 trace against the 2011 trace, which has been highly cited within the research community. Our findings show that Borg features such as alloc sets are used for resource-heavy workloads; automatic vertical scaling is effective; job dependencies account for much of the high failure rates reported by prior studies; the workload arrival rate has increased, as has the use of resource over-commitment; the workload mix has changed, with jobs migrating from the free tier into the best-effort batch tier; the workload exhibits an extremely heavy-tailed distribution where the top 1% of jobs consume over 99% of resources; and there is a great deal of variation between different clusters.
Citation
Tirmazi, M, Barker, A, Deng, N, Haque, M E, Qin, Z G, Hand, S, Harchol-Balter, M & Wilkes, J 2020, Borg: the next generation. in Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys '20)., 30, ACM, New York, pp. 1-14, Fifteenth European Conference on Computer Systems (EuroSys '20), Heraklion, Greece, 27/04/20. https://doi.org/10.1145/3342195.3387517
Publication
Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys '20)
Type
Conference item