Chapter published in:
Multiword Units in Machine Translation and Translation Technology
Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
pp. 181–200
On identification of bilingual lexical bundles for translation purposes
The case of an English-Polish comparable corpus of patient information leaflets
Grounded in phraseology and corpus linguistics, this paper aims to explore the use of bilingual lexical bundles to
improve the degree of naturalness and textual fit of translated texts. More specifically, this study attempts to
identify lexical bundles, that is, recurrent sequences of 3–7 words with similar discursive functions in a
purpose-designed comparable corpus of English and Polish patient information leaflets, with 100 text samples in each
language. Because of cross-linguistic differences, we additionally apply a number of formal criteria in order to
filter out the bundles in each subcorpus. The results show that bilingual lexical bundles with overlapping discourse
functions in texts and extracted from comparable corpora hold unexplored potential for machine translation,
computer-assisted translation and bilingual lexicography.
Keywords: lexical bundles, comparable corpora, translation quality, translation universals, patient information leaflets
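The extraction step named in the abstract (recurrent sequences of 3–7 words across the text samples of one subcorpus) can be illustrated with a minimal sketch. It is not the chapter's actual procedure: the function names, the regex tokenizer, and the `min_freq` and `min_texts` thresholds are assumptions chosen for illustration, and the additional formal filtering criteria and the functional classification of bundles are not modelled here.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep runs of word characters.
    # In Python 3, \w is Unicode-aware, so Polish diacritics are preserved.
    return re.findall(r"\w+", text.lower())


def extract_bundles(texts, min_n=3, max_n=7, min_freq=2, min_texts=2):
    """Return {bundle: (frequency, number_of_texts)} for recurrent word sequences.

    min_freq and min_texts are hypothetical thresholds, not values from the chapter.
    """
    freq = Counter()            # total occurrences of each n-gram
    spread = defaultdict(set)   # indices of the texts in which each n-gram occurs
    for idx, text in enumerate(texts):
        tokens = tokenize(text)
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                freq[gram] += 1
                spread[gram].add(idx)
    # Keep only sequences that are frequent enough and spread over enough texts.
    return {
        " ".join(gram): (count, len(spread[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }


if __name__ == "__main__":
    # Toy sample sentences invented for this example, not taken from the corpus.
    sample = [
        "Keep out of the reach of children.",
        "Keep out of the reach and sight of children.",
        "Ask your doctor.",
    ]
    for bundle, (count, n_texts) in sorted(extract_bundles(sample).items()):
        print(bundle, count, n_texts)
```

Running the same function separately on the English and the Polish subcorpus would give two candidate lists; pairing bundles by discourse function across the two languages remains a manual or separately modelled step.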
Article outline
- 1. Introduction
- 2. Background and related work
- 3. Research material and methodology
- 4. Results
- 5. Discussion and conclusions
- Notes
- References
Published online: 20 July 2018
https://doi.org/10.1075/cilt.341.09gra