Eliciting specialized frames from corpora using argument-structure extraction techniques
Frame Semantics provides a powerful cross-lingual model to describe the conceptual structure underlying specialized language. Building specialized frames is challenging because of the complex nature of predicate-argument structures and because general-language predicates take on domain-specific uses. We present a semi-automatic method for eliciting semantic frames from specialized corpora. It aims to discover lexical patterns that reveal the structure of specialized frames and to populate those frames with corpus-based data. First, we automatically extracted noun-verb-noun triples from corpora, using bootstrapping to identify noun-verb-noun phraseological patterns. Second, we annotated each triple with the lexical domain of the verb and with the semantic class and semantic role of the noun filling each argument slot. We then used these annotations and patterns to classify similar triples. Thus, the structure of each specialized frame and the types of lexical units that belong to it were inferred. An analysis of specialized corpora of environmental science texts in English and Spanish illustrates our methodology.
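The annotate-then-classify step described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `Triple`, `Annotation`, `build_lexicons`, and `classify`, the seed triples, and the labels for lexical domains, semantic classes, and roles are all hypothetical and invented for this example. The sketch simply reuses the annotations of a few seed triples to label a new triple whose words and class pattern have already been seen.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    """A noun-verb-noun triple extracted from a corpus."""
    arg1: str
    verb: str
    arg2: str


class Annotation(NamedTuple):
    """Annotation layer for a triple (all label inventories are hypothetical)."""
    verb_domain: str
    arg1_class: str
    arg1_role: str
    arg2_class: str
    arg2_role: str


# Invented toy seeds standing in for manually annotated triples.
SEEDS: Dict[Triple, Annotation] = {
    Triple("hurricane", "destroy", "coast"): Annotation(
        "CHANGE", "NATURAL_EVENT", "AGENT", "LANDFORM", "PATIENT"),
    Triple("flood", "damage", "valley"): Annotation(
        "CHANGE", "NATURAL_EVENT", "AGENT", "LANDFORM", "PATIENT"),
}

Pattern = Tuple[str, str, str]  # (arg1 class, verb domain, arg2 class)


def build_lexicons(seeds: Dict[Triple, Annotation]):
    """Collect verb domains, noun classes, and role patterns from the seeds."""
    verb_domain: Dict[str, str] = {}
    noun_class: Dict[str, str] = {}
    patterns: Dict[Pattern, Tuple[str, str]] = {}
    for triple, ann in seeds.items():
        verb_domain[triple.verb] = ann.verb_domain
        noun_class[triple.arg1] = ann.arg1_class
        noun_class[triple.arg2] = ann.arg2_class
        key = (ann.arg1_class, ann.verb_domain, ann.arg2_class)
        patterns[key] = (ann.arg1_role, ann.arg2_role)
    return verb_domain, noun_class, patterns


def classify(triple: Triple, verb_domain, noun_class, patterns) -> Optional[Annotation]:
    """Annotate a new triple if its words and class pattern are already known."""
    try:
        key = (noun_class[triple.arg1],
               verb_domain[triple.verb],
               noun_class[triple.arg2])
    except KeyError:
        return None  # an unseen word: leave the triple for manual annotation
    roles = patterns.get(key)
    if roles is None:
        return None  # known words, but an unattested combination of classes
    return Annotation(key[1], key[0], roles[0], key[2], roles[1])


VERB_DOMAIN, NOUN_CLASS, PATTERNS = build_lexicons(SEEDS)
known = classify(Triple("flood", "destroy", "coast"),
                 VERB_DOMAIN, NOUN_CLASS, PATTERNS)
unknown = classify(Triple("farmer", "destroy", "coast"),
                   VERB_DOMAIN, NOUN_CLASS, PATTERNS)
```

In this sketch a triple containing an unseen word returns `None` rather than a guessed label, which mirrors the semi-automatic character of the method: such cases would go back to a human annotator.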