St Andrews Research Repository

frances: a deep learning NLP and text mining web tool to unlock historical digital collections: a case study on the Encyclopaedia Britannica

View/Open
Filgueira_2022_eScience_frances_AAM.pdf (15.24Mb)
Date
14/12/2022
Author
Filgueira, Rosa
Keywords
Information extraction
Knowledge graph
Transfer learning
Natural language processing
Text mining
Web tools
Semantic web
Parallel computing
Digital tools
Digital textual collections
Deep learning
Metadata
Knowledge engineering
Information retrieval
QA75 Electronic computers. Computer science
Z665 Library Science. Information Science
Artificial Intelligence
Computer Science Applications
Information Systems
T-NDAS
MCC
NIS
Abstract
This work presents frances, an integrated text mining tool that combines information extraction, knowledge graphs, NLP, deep learning, parallel processing and Semantic Web techniques to unlock the full value of historical digital textual collections, offering new capabilities for researchers to use powerful analysis methods without being distracted by the technology and middleware details. To demonstrate these capabilities, we use the first eight editions of the Encyclopaedia Britannica offered by the National Library of Scotland (NLS) as an example digital collection to mine and analyse. We have developed novel parallel heuristics to extract terms from the original collection (alongside metadata), which provides a mix of unstructured and semi-structured input data, and populated a new knowledge graph with this information. Our Natural Language Processing models enable frances to perform advanced analyses that go significantly beyond simple search using the information stored in the knowledge graph. Furthermore, frances also allows for creating and running complex text mining analyses at scale. Our results show that the novel computational techniques developed within frances provide a vehicle for researchers to formalize and connect findings and insights derived from the analysis of large-scale digital corpora such as the Encyclopaedia Britannica.
Citation
Filgueira, R 2022, frances: a deep learning NLP and text mining web tool to unlock historical digital collections: a case study on the Encyclopaedia Britannica. In 2022 IEEE 18th International Conference on e-Science (e-Science), 9973695, IEEE international conference on e-science and grid computing, IEEE, pp. 246-255, 18th IEEE International eScience Conference (eScience 2022), Salt Lake City, Utah, United States, 10/10/22. https://doi.org/10.1109/eScience55777.2022.00038
 
 
Publication
2022 IEEE 18th International Conference on e-Science (e-Science)
DOI
https://doi.org/10.1109/eScience55777.2022.00038
Type
Conference item
Rights
Copyright © 2022 IEEE. This work has been made available online in accordance with publisher policies or with permission. Permission for further reuse of this content should be sought from the publisher or the rights holder. This is the author created accepted manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at https://doi.org/10.1109/eScience55777.2022.00038.
Description
Funding: This work was supported by the NLS Digital Fellowship and by the Google Cloud Platform research credit program.
Collections
  • University of St Andrews Research
URL
https://ieeexplore.ieee.org/xpl/conhome/9973400/proceeding
https://www.escience-conference.org/2022/
URI
http://hdl.handle.net/10023/27006

Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
