Publications

Publication details [#4318]

Hisamitsu, Toru and Yoshiki Niwa. 2001. Extracting useful terms from parenthetical expressions by combining simple rules and statistical measures: a comparative evaluation of bigram statistics. In Bourigault, Didier, Christian Jacquemin and Marie-Claude L'Homme, eds. Recent advances in computational terminology. Amsterdam: John Benjamins. pp. 209–224.
Publication type
Article in jnl/bk
Publication language
English

Abstract

One year’s worth of Japanese newspaper articles contains about 300,000 ‘parenthetical expressions (PEs)’: pairs of character strings A and B related to each other by parentheses, as in A(B). These expressions contain a large number of important terms, such as organization names, company names, and their abbreviations, and are easily extracted by pattern matching. The authors present a method that collects unregistered terms from PEs by identifying two types of PEs with pattern matching, bigram statistics, and entropy; the method collected about 17,000 terms with over 97% precision.
Source: Based on abstract in book
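The abstract notes that PEs are easily extracted by pattern matching. The following is a minimal Python sketch of that first extraction step only, assuming PEs take the form A(B) with either ASCII or full-width Japanese parentheses. The regular expression, the function name `extract_pes`, and the rule that A is the longest run of characters before the bracket containing no whitespace, brackets, or the punctuation 、。 are illustrative assumptions, not the authors' rules. The sketch does not attempt the classification into two PE types or the bigram-statistics and entropy measures the abstract describes.

```python
import re

# Hypothetical pattern: A = longest run of characters that are not whitespace,
# brackets, or the Japanese punctuation 、。; then optional whitespace; then
# B inside ASCII "( )" or full-width "（ ）" parentheses.
PE_PATTERN = re.compile(r"([^\s()（）、。]+)\s*[(（]([^()（）]+)[)）]")


def extract_pes(text):
    """Return a list of (A, B) string pairs for every A(B) match in text."""
    return [(m.group(1), m.group(2)) for m in PE_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "日本電信電話（NTT）と国際連合（国連）"
    for a, b in extract_pes(sample):
        print(a, "->", b)
```

Because Japanese text is not segmented by spaces, this naive left boundary can over-extend: in the sample above the second match returns `と国際連合` as A, including the particle と, so a practical system needs a further step to decide where A really begins.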