Extracting terms by a combination of term frequency and a measure of term representativeness

Hisamitsu, Toru; Niwa, Yoshiki; Nishioka, Shingo; Sakurai, Hirofumi; Imaichi, Osamu; Iwayama, Makoto; Takano, Akihiko

doi:10.1075/term.6.2.06his

Article published in:

Japanese Term Extraction
Edited by Kyo Kageura and Teruo Koyama
[Terminology 6:2] 2000
pp. 211–232

This article describes a method for extracting terms that combines term frequency with a novel measure of term representativeness (i.e., informativeness or domain specificity). The measure is defined as the normalized distance between the word distribution in the documents which contain the term and the word distribution in the whole corpus. The measure is particularly effective in discarding uninformative terms that frequently appear and has a well-defined threshold value for judging the representativeness of a term. We combined the new measure with term frequency and applied it to the extraction of terms from abstracts of artificial intelligence papers. This article introduces the measure and reports on its effectiveness in term extraction.
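The idea of the measure can be illustrated with a minimal sketch. The abstract does not name the distance function or the normalization, so both are assumptions here: the sketch uses KL divergence as the distance and divides by the mean distance of equally sized random document samples. The function names (`word_dist`, `kl_distance`, `representativeness`) are hypothetical and are not taken from the article.

```python
import math
import random
from collections import Counter


def word_dist(docs):
    """Relative word frequencies over a list of tokenized documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl_distance(p, q):
    """KL divergence D(p || q); assumes every word in p also occurs in q.

    Assumption: the article may use a different distance function.
    """
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())


def representativeness(term, corpus, trials=50, seed=0):
    """Distance between the word distribution of the documents containing
    `term` and that of the whole corpus, divided by the mean distance of
    random document sets of the same size (an assumed normalization)."""
    docs_with_term = [d for d in corpus if term in d]
    if not docs_with_term:
        return 0.0
    corpus_dist = word_dist(corpus)
    observed = kl_distance(word_dist(docs_with_term), corpus_dist)
    rng = random.Random(seed)
    baseline = sum(
        kl_distance(word_dist(rng.sample(corpus, len(docs_with_term))), corpus_dist)
        for _ in range(trials)
    ) / trials
    # A zero baseline means random samples are identical to the corpus.
    return observed / baseline if baseline > 0 else 0.0


# Toy corpus: "the" occurs everywhere, "neural" only in a distinct subset.
corpus = [["the", "cat", "sat"]] * 15 + [["the", "neural", "network"]] * 5
score_neural = representativeness("neural", corpus)
score_the = representativeness("the", corpus)
```

In this toy corpus, `the` occurs in every document, so its documents have exactly the corpus distribution and its score is 0, while `neural` selects a subset with a distinct vocabulary and scores above the random baseline. A score of this kind could then be combined with raw term frequency to rank candidate terms; how the article combines the two is not stated in the abstract.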

Published online: 1 October 2001

https://doi.org/10.1075/term.6.2.06his
