Item metadata

dc.contributor.author: Mahendra, Rahmad
dc.contributor.author: Aji, Alham Fikri
dc.contributor.author: Louvan, Samuel
dc.contributor.author: Rahman, Fahrurrozi
dc.contributor.author: Vania, Clara
dc.date.accessioned: 2024-04-04T15:30:04Z
dc.date.available: 2024-04-04T15:30:04Z
dc.date.issued: 2021-11-07
dc.identifier: 294047985
dc.identifier: ea54067c-9d75-4123-a77b-60c2fc166eff
dc.identifier.citation: Mahendra, R, Aji, A F, Louvan, S, Rahman, F & Vania, C 2021, IndoNLI: a Natural Language Inference Dataset for Indonesian. in IndoNLI: A Natural Language Inference Dataset for Indonesian. Association for Computational Linguistics, pp. 10511–10527. https://doi.org/10.18653/v1/2021.emnlp-main.821 [en]
dc.identifier.isbn: 9781955917094
dc.identifier.uri: https://hdl.handle.net/10023/29606
dc.description.abstract: We present IndoNLI, the first human-elicited NLI dataset for Indonesian. We adapt the data collection protocol for MNLI and collect ~18K sentence pairs annotated by crowd workers and experts. The expert-annotated data is used exclusively as a test set. It is designed to provide a challenging test-bed for Indonesian NLI by explicitly incorporating various linguistic phenomena such as numerical reasoning, structural changes, idioms, or temporal and spatial reasoning. Experiment results show that XLM-R outperforms other pre-trained models on our data. The best performance on the expert-annotated data is still far below human performance (13.4% accuracy gap), suggesting that this test set is especially challenging. Furthermore, our analysis shows that our expert-annotated data is more diverse and contains fewer annotation artifacts than the crowd-annotated data. We hope this dataset can help accelerate progress in Indonesian NLP research.
dc.format.extent: 17
dc.format.extent: 375650
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics
dc.relation.ispartof: IndoNLI [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: NS [en]
dc.subject.lcc: QA75 [en]
dc.title: IndoNLI: a Natural Language Inference Dataset for Indonesian [en]
dc.type: Conference item [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
dc.identifier.doi: 10.18653/v1/2021.emnlp-main.821

