Some Linguistic Problems in Building a Korean Electronic Lexicon of Simple Verbs
Any system that aims to process a natural language automatically should first be equipped with a well-constructed electronic lexicon. An electronic lexicon is a large-scale database that contains all the linguistic information a machine requires, i.e. inflectional, derivational, syntactic and some reliable semantic information. To build a systematic lexicon of simple verbs, several linguistic criteria must be considered. This study constitutes a preliminary step in the construction of a syntactic lexicon of Korean verbs. In section 2, we consider the problem of hada (to do) sequences, which raises the question of how to distinguish between a simple verb and a verb phrase in Korean. In section 3, we discuss derivational entries and complex forms: how to handle these items in a machine-readable lexicon is not a simple question. Section 4 covers the treatment of some incomplete forms in the lexicon. Finally, in the last section, we outline the direction of future work. To construct a reliable electronic lexicon, the morphosyntactic characteristics of all lexical entries have to be described in a systematic and exhaustive way. Only then can we expect to expand the list by consulting large-scale corpora. The results obtained by lexicon-grammar studies will play a significant role in the construction of a systematic electronic database, which is indispensable in any area of computational application.