Constructing an ontology and database of Japanese lexical properties
Handling the orthographic complexity of the Japanese writing system
As a significant milestone in ongoing efforts to construct a comprehensive database, in the form of a lexical resource (LR), of Japanese Lexical Properties (JLP-LR), this paper outlines the initial construction of an Ontology of Japanese Lexical Properties (JLP-O) (Joyce & Hodošček 2014) and, in particular, describes the key aspects incorporated specifically to handle the orthographic complexity of the Japanese writing system (Joyce 2013, 2016; Joyce, Hodošček & Nishina 2012). Although motivated primarily by issues of orthographic representation for the Japanese lexicon, these features potentially have wider implications for the effective construction of integrated orthographic databases and lexicons.
Article outline
- 1. Introduction
- 2. Ontology of Japanese lexical properties (JLP-O)
- 3. Handling aspects of the Japanese writing system
- 3.1 Character LEs and character module
- 3.2 canonicalForm and orthographicForm
- 3.3 Forms of decomposition
- 3.3.1 Orthographic decomposition
- 3.3.2 Phonological decomposition
- 3.3.3 Morphological decomposition
- 4. Conclusion
- Notes
- References
References
Adelman, James S. (2012) Methodological issues with words. In James S. Adelman (Ed.), Visual word recognition volume 1: Models and methods, orthography and phonology (Current issues in the psychology of language) (pp. 116–138). London: Psychology Press.
Backhouse, A. E. (1984) Aspects of the graphological structure of Japanese. Visible Language 181: 219–228.
Bunkachō [Agency for Cultural Affairs] (2010) Jōyōkanjihyō [Jōyō kanji list]. Available at [URL] (13 November 2016).
Den, Yasuharu, Toshinobu Ogiso, Hideki Ogura, Atsushi Yamada, Nobuaki Minematsu, Kiyotaka Uchimoto & Hanae Koiso (2007) Kōpasu nihongogaku no tame no gengo shigen: Keitaisokaisekiyō denshijisho no kaihatsu to ōyō [The development of an electronic dictionary for morphological analysis and its application to Japanese corpus linguistics]. Nihongo Kagaku [Japanese Linguistics], 221: 101–122.
Guarino, Nicola (1998) Formal ontology in information systems. Proceedings of the first international conference on Formal Ontology in Information Systems (FOIS’98) (Vol. 461). IOS Press.
Guarino, Nicola, Daniel Oberle & Steffen Staab (2009) What is an ontology? In Steffen Staab & Rudi Studer (Eds.), Handbook on ontologies (Second edition; International handbooks on information systems) (pp. 1–17). Springer.
Huang, Chu-Ren, Nicoletta Calzolari, Aldo Gangemi, Alessandro Lenci, Alessandro Oltramari & Laurent Prévot (Eds.) (2010) Ontology and the lexicon: A natural language processing perspective (Studies in Natural Language Processing). Cambridge: Cambridge University Press.
Joyce, Terry (2005) Constructing a large-scale database of Japanese word associations. In Katsuo Tamaoka (Ed.), Corpus Studies on Japanese Kanji (Glottometrics 10) (pp. 82–98). Hituzi Syobo: Tokyo, Japan and RAM-Verlag: Lüdenscheid, Germany.
Joyce, Terry (2016) Writing systems and scripts. In Andrea Rocci & Louis de Saussure (Eds.), Verbal communication (Handbooks of Communication Science 3) (pp. 287–308). Berlin/Boston: De Gruyter Mouton.
Joyce, Terry & Bor Hodošček (2014) Constructing an ontology of Japanese lexical properties: Specifying its property structures and lexical entries. In Michael Zock, Reinhard Rapp & Chu-Ren Huang (Eds.), Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex4) (pp. 174–185). 23 August, 2014. Dublin, Ireland.
Joyce, Terry, Bor Hodošček & Hisashi Masuda (2014a) Constructing an ontology and database of Japanese lexical properties: Handling the orthographic complexity of the Japanese writing system. ‘Orthographic Databases and Lexicons’: 9th International Workshop on Writing Systems and Literacy, 4–5 September. University of Sussex, Brighton, UK.
Joyce, Terry, Bor Hodošček & Hisashi Masuda (2014b) Quantitative study of 3- and 4-kanji Japanese compound words: Database extraction and automatic analysis of word structures. 15th International Conference on the Processing of East Asian Languages, 24–26 October, 2014. Korean University, Seoul, Korea.
Joyce, Terry, Bor Hodošček & Kikuko Nishina
Joyce, Terry, Hisashi Masuda & Bor Hodošček (2016) Constructing a database of Japanese lexical properties: Outlining its basic framework and initial components. Tama University School of Global Studies Bulletin, 81, 35–60.
Joyce, Terry, Hisashi Masuda & Taeko Ogawa
Masuda, Hisashi (2014) Kanjinijihyōkigokan no imitekikankeisei ni kansuru dētabēsu no kōchiku [Constructing a database of the semantic relationships within two-kanji orthographic words]. Kagaku Kenkyūhi Josei Jigyō Kenkyū Seika Hōkokusho [Research Report for Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science].
Masuda, Hisashi & Terry Joyce (2005) A database of two-kanji compound words featuring morphological family, morphological structure, and semantic category data. In Katsuo Tamaoka (Ed.), Corpus Studies on Japanese Kanji (Glottometrics 10) (pp. 30–44). Hituzi Syobo: Tokyo, Japan and RAM-Verlag: Lüdenscheid, Germany.
Masuda, Hisashi, Terry Joyce, Taeko Ogawa, Masahiro Kawakami & Chikako Fujita (2014) A database of semantic transparency ratings for two-kanji Japanese compound words. Poster presentation given at ‘Orthographic Databases and Lexicons’: 9th International Workshop on Writing Systems and Literacy, 4–5 September, 2014. University of Sussex, Brighton, UK.
Maekawa, Kikuo, Makoto Yamazaki, Toshinobu Ogiso, Takehiko Maruyama, Hideki Ogura, Wakako Kashino, Hanae Koiso, Masaya Yamaguchi & Yasuharu Den (2013) Balanced corpus of contemporary written Japanese. Language Resources and Evaluation, 1–27.
Morohashi, Tetsuji (2000) Daikanwajiten [Comprehensive Chinese-Japanese dictionary] (Vols. 1–13). Tokyo: Taishukan.
Nation, I. S. P. (2013) Learning vocabulary in another language (Second edition) (Cambridge applied linguistics). Cambridge: Cambridge University Press.
Ogura, Hideki, Toshinobu Ogiso, Hanae Koiso, Yutaka Hara & Sayaka Miyauchi (2010, March) Keitaiso kaiseki jisho UniDic ni okeru goiso midashi no rikkō hōshin [Criteria for the lemmatization of UniDic]. Tokuteiryōiki kenkyū “nihongo kōpasu” heisei 21 nendo kōkai waakushoppu (Kenkyū seika hōkokukai) yokōshū [Priority-Area Research “Japanese Corpus”: Proceedings of the 2010 public workshop]. Tokyo: General Headquarters, Priority-Area Research “Japanese Corpus”.
Oltramari, Alessandro, Piek Vossen, Lu Qin & Eduard Hovy (2013) New trends of research in ontologies and lexical resources: Ideas, projects, systems. Springer.
Prévot, Laurent, Chu-Ren Huang, Nicoletta Calzolari, Aldo Gangemi, Alessandro Lenci & Alessandro Oltramari (2010) Ontology and the lexicon: A multidisciplinary perspective. In Chu-Ren Huang, Nicoletta Calzolari, Aldo Gangemi, Alessandro Lenci, Alessandro Oltramari & Laurent Prévot (Eds.), Ontology and the lexicon: A natural language processing perspective (Studies in Natural Language Processing) (pp. 3–24). Cambridge: Cambridge University Press.
Shinmura, Izuru (2008) Kōjien [Japanese dictionary] (6th edition). Tokyo: Iwanami Shoten.
Spohr, Dennis (2012) Towards a multifunctional lexical resource: Design and implementation of a graph-based lexicon model (Lexicographica, series major). Berlin/Boston: Walter de Gruyter.
Yamada, Tadao, Takeshi Shibata, Kenji Sakai, Yasuo Kuramochi, Akio Yamada, Zendō Ueno, Masahiro Ijima & Hiroyuki Sasahara (2011) Shinmeikai Kokugo Jiten [Shinmeikai Japanese-Japanese dictionary] (7th edition). Tokyo: Sanseido.