Chapter published in: Multiword Units in Machine Translation and Translation Technology
Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
pp. 42–59
Analysing linguistic information about word combinations for a Spanish-Basque rule-based machine translation system
Uxoa Iñurrieta | IXA NLP group, University of the Basque Country
Itziar Aduriz | Department of Linguistics, University of Barcelona
Arantza Díaz Ilarraza | IXA NLP group, University of the Basque Country
Gorka Labaka | IXA NLP group, University of the Basque Country
Kepa Sarasola | IXA NLP group, University of the Basque Country
This paper describes an in-depth analysis of noun + verb combinations in Spanish-Basque translations. First, we examined noun + verb constructions in the dictionary and confirmed that this kind of MWU varies considerably from language to language, which justifies giving such units specific treatment in MT systems. We then searched for those combinations in a parallel corpus and selected the most frequently occurring ones for further analysis, classifying them according to their level of syntactic fixedness and semantic compositionality. We tested whether adding linguistic data relevant to MWUs improved the detection of Spanish combinations and found that the number of MWUs identified increased by 30.30%, with a precision of 97.61%. Finally, we evaluated how an RBMT system translated the MWUs we analysed and concluded that at least 44.44% needed to be corrected or improved.
Keywords: Basque, Spanish, Rule-Based Machine Translation, Multiword Units, morphosyntactic fixedness, semantic compositionality
Published online: 20 July 2018