Cross-lingual Named Entity Recognition

Steinberger, Ralf; Pouliquen, Bruno

doi:10.1075/li.30.1.09ste

Article published in:

Named Entities: Recognition, classification and use
Edited by Satoshi Sekine and Elisabete Ranchhod
[Lingvisticæ Investigationes 30:1] 2007
pp. 135–162

Ralf Steinberger | European Commission — Joint Research Centre, Italy

Bruno Pouliquen

Named Entity Recognition and Classification (NERC) is a well-explored text analysis task that has been applied to many languages. We present an automatic, highly multilingual news analysis system that fully integrates NERC for locations, persons and organisations with document clustering, multi-label categorisation, name attribute extraction, name variant merging and the calculation of social networks. The application goes beyond the state of the art by automatically merging the information found in news written in ten different languages, and by using the aggregated name information to link related news documents across languages for all 45 language pairs. While state-of-the-art approaches to cross-lingual name variant merging and document similarity calculation require bilingual resources, the methods proposed here are mostly language-independent and require only a minimal amount of monolingual, language-specific effort. Resource development for additional languages is therefore kept to a minimum, and new languages can be plugged into the system with little effort. The online news analysis application is fully functional and, at the end of 2006, averaged 600,000 hits per day.
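
The figure of 45 language pairs follows from counting unordered pairs among ten languages, and the same count illustrates why monolingual resources scale better than bilingual ones. The sketch below is only an arithmetic illustration, not part of the described system: the language codes are hypothetical placeholders (the abstract does not list the ten languages), and `resource_counts` is a helper name introduced here.

```python
from itertools import combinations
from math import comb

# Hypothetical placeholder codes: the abstract does not name the ten languages.
languages = [f"lang{i:02d}" for i in range(1, 11)]

# Unordered pairs: linking A to B is the same link as B to A.
pairs = list(combinations(languages, 2))
print(len(pairs))  # 45


def resource_counts(n_languages: int) -> tuple[int, int]:
    """Return (monolingual, bilingual) resource sets needed for n languages.

    Assumes one resource set per language in the monolingual case and
    one per unordered language pair in the bilingual case.
    """
    return n_languages, comb(n_languages, 2)


for n in (10, 11):
    mono, bi = resource_counts(n)
    print(f"{n} languages: {mono} monolingual vs {bi} bilingual resource sets")
```

Under these assumptions, adding an eleventh language requires one additional monolingual resource set, whereas a pairwise bilingual approach would need ten additional ones (55 instead of 45).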

Published online: 10 August 2007

https://doi.org/10.1075/li.30.1.09ste
