Item metadata

DC Field | Value | Language
dc.contributor.author | Li, Zihao |
dc.contributor.author | Filgueira, Rosa |
dc.contributor.editor | Papadopoulos, George Angelos |
dc.contributor.editor | Filgueira, Rosa |
dc.contributor.editor | Da Silva, Rafael Ferreira |
dc.date.accessioned | 2023-11-10T12:30:17Z |
dc.date.available | 2023-11-10T12:30:17Z |
dc.date.issued | 2023-09-25 |
dc.identifier | 290675757 |
dc.identifier | ebf12b01-0183-41d0-8d5d-d3432d5739e5 |
dc.identifier | 85174249454 |
dc.identifier.citation | Li, Z & Filgueira, R 2023, Mapping the repository landscape: harnessing similarity with RepoSim and RepoSnipy. in G A Papadopoulos, R Filgueira & R F Da Silva (eds), Proceedings: 2023 IEEE 19th international conference on e-science (e-science), 10254873, IEEE international conference on e-science, IEEE, Piscataway, NJ, 19th IEEE International Conference on eScience, Limassol, Cyprus, 9/10/23. https://doi.org/10.1109/e-Science58273.2023.10254873 | en
dc.identifier.citation | conference | en
dc.identifier.isbn | 9798350322248 |
dc.identifier.isbn | 9798350322231 |
dc.identifier.issn | 2325-372X |
dc.identifier.uri | https://hdl.handle.net/10023/28673 |
dc.description.abstract | The rapid growth of scientific software development has led to the emergence of large and complex codebases, making it challenging to search, find, and compare software repositories within the scientific research community. In this paper, we propose a solution by leveraging deep learning techniques to learn embeddings that capture semantic similarities among repositories. Our approach focuses on identifying repositories with similar semantics, even when their code fragments and documentation exhibit different syntax. To address this challenge, we introduce two complementary open-source tools: RepoSim and RepoSnipy. RepoSim is a command-line toolbox designed to represent repositories at both the source code and documentation levels. It utilizes the UniXcoder pre-trained language model, which has demonstrated remarkable performance in code-related understanding tasks. RepoSnipy is a web-based neural semantic search engine that utilizes the powerful capabilities of RepoSim and offers a user-friendly search interface, allowing researchers and practitioners to query public repositories hosted on GitHub and discover semantically similar repositories. RepoSim and RepoSnipy empower researchers, developers, and practitioners by facilitating the comparison and analysis of software repositories. They not only enable efficient collaboration and code reuse but also accelerate the development of scientific software. |
dc.format.extent | 10 |
dc.format.extent | 2708056 |
dc.language.iso | eng |
dc.publisher | IEEE |
dc.relation.ispartof | Proceedings | en
dc.relation.ispartofseries | IEEE international conference on e-science | en
dc.subject | Semantic similarity | en
dc.subject | Code search | en
dc.subject | Code understanding | en
dc.subject | Embeddings, pre-trained language models | en
dc.subject | GitHub | en
dc.subject | QA75 Electronic computers. Computer science | en
dc.subject | QA76 Computer software | en
dc.subject | 3rd-DAS | en
dc.subject | MCC | en
dc.subject.lcc | QA75 | en
dc.subject.lcc | QA76 | en
dc.title | Mapping the repository landscape: harnessing similarity with RepoSim and RepoSnipy | en
dc.type | Conference item | en
dc.contributor.institution | University of St Andrews. School of Computer Science | en
dc.identifier.doi | https://doi.org/10.1109/e-Science58273.2023.10254873 |
dc.date.embargoedUntil | 2023-09-25 |
dc.identifier.url | https://doi.org/10.1109/e-Science58273.2023 | en