Part of
Investigating Wikipedia: Linguistic corpus building, exploration and analysis
Edited by Céline Poudat, Harald Lüngen and Laura Herzberg
[Studies in Corpus Linguistics 121] 2024
pp. 45–74
References
Adafre, Sisay F. & de Rijke, Maarten. 2006. Finding similar sentences across multiple languages in Wikipedia. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Diana McCarthy & Shuly Wintner (eds), 62–69. Stroudsburg PA: ACL.
Artetxe, Mikel & Schwenk, Holger. 2018. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. arXiv.1812.10464.
Bouamor, Dhouha. 2014. Constitution of Multilingual Linguistic Resources from Parallel and Comparable Text Corpora. PhD dissertation, Université Paris-Sud.
Brunette, Louise & Gagnon, Chantal. 2013. Enseigner la révision à l’ère des wikis: Là où l’on trouve la technologie alors qu’on ne l’attendait plus. JoSTrans. The Journal of Specialized Translation 19: 96–121.
Church, Kenneth W. 1993. Char-align: A program for aligning parallel texts at the character level. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Columbus OH, 22–26 June, 1–8. Stroudsburg PA: ACL.
Etchegoyhen, Thierry & Azpeitia, Andoni. 2016. A portable method for parallel and comparable document alignment. Baltic Journal of Modern Computing 4(2): 243–255.
Gabrilovich, Evgeniy & Markovitch, Shaul. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07), 1606–1611. Morgan Kaufmann Publishers.
Gupta, Rajdeep, Pal, Santanu & Bandyopadhyay, Sivaji. 2013. Improving MT system using extracted parallel fragments of text from comparable corpora. In Proceedings of the Sixth Workshop on Building and Using Comparable Corpora, Serge Sharoff, Pierre Zweigenbaum & Reinhard Rapp (eds), 69–76. Stroudsburg PA: ACL.
Johnson, Jeff, Douze, Matthijs & Jégou, Hervé. 2017. Billion-scale similarity search with GPUs. arXiv.1702.08734v1.
Lamraoui, Fethi & Langlais, Philippe. 2013. Yet another fast, robust and open source sentence aligner. Time to reconsider sentence alignment? In Proceedings of the Machine Translation Summit 2013. 〈[URL]〉 (1 June 2024).
McEnery, Anthony & Xiao, Zhonghua. 2007. Parallel and comparable corpora: What is happening? In Incorporating Corpora: The Linguist and the Translator, Gunilla Anderman & Margaret Rogers (eds). Clevedon: Multilingual Matters.
Mohammadi, Mehdi & Ghasem Aghaee, Naser. 2010. Building bilingual parallel corpora based on Wikipedia. In Proceedings of the Second International Conference on Computer Engineering and Applications (ICCEA 2010), Bali, Indonesia, 19–21 March. IEEE.
Moore, Robert C. 2002. Fast and accurate sentence alignment of bilingual corpora. In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas, 135–144. New York NY: Springer.
Morin, Emmanuel, Daille, Béatrice, Takeuchi, Koichi & Kageura, Kyo. 2007. Bilingual terminology mining – Using brain, not brawn comparable corpora. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL’07), 664–671. Stroudsburg PA: ACL.
Patry, Alexandre & Langlais, Philippe. 2011. Identifying parallel documents from a large bilingual collection of texts: Application to parallel article extraction in Wikipedia. In Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, Pierre Zweigenbaum, Reinhard Rapp & Serge Sharoff (eds), 87–95. Stroudsburg PA: ACL.
Plamadă, Magdalena & Volk, Martin. 2013. Mining for domain-specific parallel text from Wikipedia. In Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, Serge Sharoff, Pierre Zweigenbaum & Reinhard Rapp (eds), 112–120. Stroudsburg PA: ACL.
Prochasson, Emmanuel & Fung, Pascale. 2011. Rare word translation extraction from aligned comparable documents. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 1327–1335. Stroudsburg PA: ACL. 〈[URL]〉 (1 June 2024).
Rapp, Reinhard, Sharoff, Serge & Babych, Bogdan. 2012. Identifying word translations from comparable documents without a seed lexicon. In Proceedings of LREC 2012, Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk & Stelios Piperidis (eds). 〈[URL]
Schwenk, Holger, Chaudhary, Vishrav, Sun, Shuo, Gong, Hongyu & Guzmán, Francisco. 2019. WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. arXiv.1907.05791.
Semmar, Nasredine. 2021. Multilingualism and Automatic Processing of Well and Poorly Endowed Languages. HDR dissertation, Paris Saclay University.
Sharoff, Serge, Zweigenbaum, Pierre & Rapp, Reinhard. 2015. BUCC shared task: Cross-language document similarity. In Proceedings of the 8th Workshop on Building and Using Comparable Corpora, 74–78. Beijing, China, June.
Ştefănescu, Dan & Ion, Radu. 2013. Parallel-Wiki: A collection of parallel sentences extracted from Wikipedia. Research in Computing Science, Vol. 70: Advances in Computing Science. Greece.
Ştefănescu, Dan, Ion, Radu & Hunsicker, S. 2012. Hybrid parallel sentence mining from comparable corpora. In Proceedings of the 16th Conference of the European Association for Machine Translation, Trento, Italy, 28–30 May, Mauro Cettolo, Marcello Federico, Lucia Specia & Andy Way (eds), 137–144. Fondazione Bruno Kessler.
Trieu, Hai-Long & Ittoo, Ashwin. 2019. Generation of parallel corpus for low resource language translation. ORBi Open Repository and Bibliography, Liège. 〈[URL]〉 (1 June 2024).
Tufiş, Dan, Ion, Radu, Dumitrescu, Ştefan & Ştefănescu, Dan. 2014. Large SMT data-sets extracted from Wikipedia. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), 656–663. Reykjavik, Iceland.
Varga, Daniel, Németh, László, Halácsy, Peter, Kornai, András, Trón, Viktor & Nagy, Viktor. 2005. Parallel corpora for medium density languages. In Proceedings of the RANLP 2005, 590–596.
Wołk, Krzysztof & Marasek, Krzysztof. 2014. Building subject-aligned comparable corpora and mining it for truly parallel sentence pairs. Procedia Technology, 18, 126–132.
Yasuda, Keiji & Sumita, Eiichiro. 2008. Method for building sentence-aligned corpus from Wikipedia. In 2008 AAAI Workshop on Wikipedia and Artificial Intelligence (WikiAI08), 263–268.