Item metadata

dc.contributor.author: Kristien, Martin
dc.contributor.author: Spink, Tom
dc.contributor.author: Campbell, Brian
dc.contributor.author: Sarkar, Susmit
dc.contributor.author: Stark, Ian
dc.contributor.author: Franke, Björn
dc.contributor.author: Böhm, Igor
dc.contributor.author: Topham, Nigel
dc.identifier.citation: Kristien, M, Spink, T, Campbell, B, Sarkar, S, Stark, I, Franke, B, Böhm, I & Topham, N 2020, Fast and correct load-link/store-conditional instruction handling in DBT systems. in CASES '20: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems. vol. Early Access, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, IEEE Computer Society, International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '20), 20/09/20.
dc.identifier.other: PURE: 270898567
dc.identifier.other: PURE UUID: 112b05cb-3b3f-4414-8ca3-7fdc6602340b
dc.identifier.other: Scopus: 85096033690
dc.identifier.other: WOS: 000587712700033
dc.identifier.other: ORCID: /0000-0002-7662-3146/work/103138170
dc.identifier.other: ORCID: /0000-0002-4259-9213/work/125727570
dc.description.abstract: Dynamic Binary Translation (DBT) requires the implementation of load-link/store-conditional (LL/SC) primitives for guest systems that rely on this form of synchronization. When targeting e.g. x86 host systems, LL/SC guest instructions are typically emulated using atomic Compare-and-Swap (CAS) instructions on the host. Whilst this direct mapping is efficient, this approach is problematic due to subtle differences between LL/SC and CAS semantics. In this paper, we demonstrate that this is a real problem, and we provide code examples that fail to execute correctly on QEMU and a commercial DBT system, which both use the CAS approach to LL/SC emulation. We then develop two novel and provably correct LL/SC emulation schemes: (1) a purely software-based scheme, which uses the DBT system’s page translation cache for correctly selecting between fast, but unsynchronized, and slow, but fully synchronized memory accesses, and (2) a hardware-accelerated scheme that leverages hardware transactional memory (HTM) provided by the host. We have implemented these two schemes in the Synopsys DesignWare® ARC® nSIM DBT system, and we evaluate our implementations against full applications and targeted micro-benchmarks. We demonstrate that our novel schemes are not only correct, but also deliver competitive performance on par with or better than the widely used, but broken CAS scheme.
dc.publisher: IEEE Computer Society
dc.relation.ispartof: CASES '20: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems [en]
dc.relation.ispartofseries: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems [en]
dc.rights: Copyright © 2020 IEEE. This work has been made available online in accordance with publisher policies or with permission. Permission for further reuse of this content should be sought from the publisher or the rights holder. This is the author created accepted manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at
dc.subject: QA75 Electronic computers. Computer science [en]
dc.title: Fast and correct load-link/store-conditional instruction handling in DBT systems [en]
dc.type: Conference item [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
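
The abstract refers to "subtle differences between LL/SC and CAS semantics" without giving an example. One well-known difference is the ABA case: a CAS succeeds whenever the current value matches the expected one, whereas a store-conditional should fail if the location was written at all since the load-link. The following is a minimal single-threaded sketch of that case only, not code from the paper. The `Memory` class, its `version` counter (standing in for an LL/SC monitor), and all function names are hypothetical and chosen for illustration.

```python
class Memory:
    """Toy single-word memory; hypothetical, for illustration only."""

    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every write; stands in for an LL/SC monitor

    def write(self, value):
        self.value = value
        self.version += 1

    def compare_and_swap(self, expected, new):
        # Succeeds whenever the current value equals the expected value,
        # regardless of how many writes happened in between.
        if self.value != expected:
            return False
        self.write(new)
        return True


def sc_via_cas(mem, linked_value, linked_version, new):
    # Store-conditional emulated with CAS: compares values only.
    return mem.compare_and_swap(linked_value, new)


def sc_with_monitor(mem, linked_value, linked_version, new):
    # Store-conditional that fails if any write occurred since the load-link.
    if mem.version != linked_version:
        return False
    mem.write(new)
    return True


def run_aba(store_conditional):
    mem = Memory(1)
    linked_value, linked_version = mem.value, mem.version  # load-link reads A
    mem.write(2)  # an interfering writer stores B ...
    mem.write(1)  # ... and then restores A
    ok = store_conditional(mem, linked_value, linked_version, 42)
    return ok, mem.value


cas_result = run_aba(sc_via_cas)        # store goes through despite interference
llsc_result = run_aba(sc_with_monitor)  # store is rejected, memory unchanged
```

In this sketch the CAS-based emulation lets the store through after the intervening writes, while the monitor-based version rejects it. The schemes described in the abstract address this kind of mismatch by different means (the page translation cache, or host HTM), which this toy model does not attempt to reproduce.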
