Automatic classification of human translation and machine translation: a study from the perspective of lexical diversity

Fu, Yingxue; Nederhof, Mark Jan

Date

31/05/2021

Abstract

Using a trigram model and a pretrained BERT model fine-tuned for sequence classification, we show that machine translation and human translation can be distinguished with above-chance accuracy, which suggests that the two differ in a systematic way. Classification accuracy is much higher for machine translation than for human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation exhibits patterns that are independent of human translation, automatic metrics that measure the deviation of machine translation from human translation may conflate difference with quality. Our experiment with two different types of automatic metrics shows that their results correlate with the results of the classification task. We therefore suggest that the difference in lexical diversity between machine translation and human translation be given more attention in machine translation evaluation.
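As an illustration of what a lexical diversity comparison can look like, the sketch below computes a type-token ratio (TTR) and a fixed-segment variant that is less sensitive to text length. The abstract does not say which diversity measure the authors use, so TTR, the regex tokenizer, and the function names here are assumptions chosen for illustration only, not the paper's method.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens via a simple regex, not the paper's tokenizer.
    return re.findall(r"\w+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct word forms divided by total word tokens (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_segmental_ttr(text: str, segment_len: int = 50) -> float:
    """Average TTR over consecutive, non-overlapping segments of equal length.

    Plain TTR drops as texts get longer, so fixed-length segments make
    texts of different sizes more comparable. Falls back to plain TTR when
    the text is shorter than one segment.
    """
    tokens = tokenize(text)
    segments = [
        tokens[i:i + segment_len]
        for i in range(0, len(tokens) - segment_len + 1, segment_len)
    ]
    if not segments:
        return type_token_ratio(text)
    return sum(len(set(s)) / len(s) for s in segments) / len(segments)
```

Under these assumptions, a lower score for a machine-translated text than for a human translation of the same source would indicate less varied vocabulary, which is the kind of difference the abstract refers to.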

Citation

Fu, Y & Nederhof, M J 2021, Automatic classification of human translation and machine translation: a study from the perspective of lexical diversity. in Y Bizzoni, E Teich, C España-Bonet & J van Genabith (eds), Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age. NEALT Proceedings Series, Linköping University Electronic Press, pp. 91–99, Workshop on Modelling Translation, Online City, Iceland, 31/05/21. <https://aclanthology.org/previews/ingest-nodalida/2021.motra-1.10/>

Publication

Proceedings for the First Workshop on Modelling Translation

ISSN

1650-3686

Type

Conference item

Collections

University of St Andrews Research

URL

https://aclanthology.org/previews/ingest-nodalida/2021.motra-1.10/

URI

https://hdl.handle.net/10023/23304