Danielsson, Pernilla and Katarina Mühlenbock. 2000. Small but efficient: the misconception of high-frequency words in Scandinavian translation. In White, John S., ed. Envisioning Machine Translation in the Information Future (Lecture Notes in Artificial Intelligence 1934). Berlin: Springer. pp. 158–168.
Machine translation has proved easier between closely related languages, while more distant language pairs pose many more problems. The present study focuses on Swedish and Norwegian, two closely related languages. Despite their similarity, some differences make the translation phase much less straightforward than might be expected. Starting from sentence-aligned parallel texts, this study aims to highlight some of these differences and to formalize the results. To do so, the texts were aligned on smaller units using a simple cognate alignment method. Longer words were easier to align, while shorter, often high-frequency words proved problematic. Likewise, when aligning to a specific word sense in a dictionary, content words yielded better results. The authors therefore abandoned single-word units and searched for multi-word units whenever possible. This study reinforces the view that machine translation should rest upon methods based on multi-word unit searches.
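
The abstract does not say which similarity measure the cognate alignment used. As an illustration only, the sketch below assumes a longest common subsequence ratio (LCSR) with a greedy one-to-one match inside a sentence pair. The function names, the 0.7 threshold, the minimum word length of 4, and the example sentences are all hypothetical choices, not taken from the study.

```python
from typing import List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """LCS length divided by the length of the longer word (0.0 to 1.0)."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def align_cognates(
    src: List[str],
    tgt: List[str],
    threshold: float = 0.7,  # assumed cut-off, not from the study
    min_len: int = 4,        # assumed: skip short words entirely
) -> List[Tuple[str, str]]:
    """Greedily link each source word to its most similar unused target word."""
    links = []
    used = set()
    for s in src:
        if len(s) < min_len:
            continue
        best_j, best_score = None, 0.0
        for j, t in enumerate(tgt):
            if j in used or len(t) < min_len:
                continue
            score = lcsr(s, t)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            used.add(best_j)
            links.append((s, tgt[best_j]))
    return links


# Hypothetical sentence pair (Swedish, Norwegian Bokmål).
swedish = "jag läser en intressant bok".split()
norwegian = "jeg leser en interessant bok".split()
links = align_cognates(swedish, norwegian)
```

In this sketch the long pairs `läser`/`leser` and `intressant`/`interessant` are linked, while `jag`/`jeg` scores only 2/3 and would fall below the assumed threshold even without the length filter. With so few characters, one differing letter dominates the score, which is consistent with the abstract's observation that short, high-frequency words are harder to align than longer ones.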
