The Swedish FrameNet++ was designed to be several things. As a digital artifact, it is an integrated panchronic lexical macroresource, primarily for Swedish but including several other languages, intended as a basic infrastructural component in Swedish language technology research and in the development of natural language processing applications. As an activity, it is a long-term R&D initiative. It was initially aimed at building this macroresource. It is now aimed at maintaining and extending it, at promoting its use in language technology research and application development, and at ensuring that the results of this research and development are in turn incorporated into the macroresource. As a product of research, it reflects both computational and linguistic approaches to lexicology, lexical semantics, and lexical typology.
This chapter describes the development of Swedish FrameNet. A new framenet project typically follows one of two methodological approaches. The first is extension, where a framenet in another language, often English, is translated into the target language. The second is merging, where the resource is built from scratch in the target language. Both approaches have their pros and cons, which have been extensively discussed in the literature. Swedish FrameNet has mainly been developed through the extension approach, balanced with elements of the merging approach. Drawing on both approaches, we describe how integrated language resources and tools have been exploited to create and develop Swedish FrameNet: how it was constructed, what it contains, and the basic assumptions underlying the annotation of its contents.
One of the main goals of the Swedish FrameNet++ initiative is to recycle as many existing modern Swedish lexical resources as possible and integrate them into one unified lexical macroresource useful for automatic language processing. In this chapter we describe the structure of Saldo, the central resource of Swedish FrameNet++, and the design of the formal interlinking mechanism that holds the lexical macroresource together. We also describe our work on two further components of Swedish FrameNet++: Swesaurus, a Swedish wordnet, and a Swedish Roget-style thesaurus.
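To make the idea of a central pivot resource more concrete, here is a minimal Python sketch of pivot-based interlinking. It assumes, purely for illustration, two things. First, each component resource refers to Saldo senses through a sense identifier string, written here in a `lemma..n` style. Second, linking the resources amounts to grouping their entries by that identifier. The function `link_via_pivot`, the resource names, and all entries are hypothetical and were invented for this example. They are not taken from the actual Swedish FrameNet++ data or software.

```python
from collections import defaultdict

# Hypothetical toy data (invented for illustration): each component resource
# maps Saldo-style sense identifiers (assumed format "lemma..n") to its own
# information about that sense.
framenet_links = {
    "hus..1": "ExampleFrameA",
    "springa..1": "ExampleFrameB",
}
wordnet_links = {
    "hus..1": ["byggnad..1"],
}
thesaurus_links = {
    "springa..1": "ExampleClass7",
}


def link_via_pivot(resources: dict[str, dict]) -> dict[str, dict]:
    """Group the information from several resources by pivot sense ID.

    A sense appears in the output only if at least one resource links to it;
    a resource with nothing to say about a sense is absent from that entry.
    """
    merged: dict[str, dict] = defaultdict(dict)
    for resource_name, entries in resources.items():
        for sense_id, value in entries.items():
            merged[sense_id][resource_name] = value
    return dict(merged)


macro = link_via_pivot({
    "framenet": framenet_links,
    "wordnet": wordnet_links,
    "thesaurus": thesaurus_links,
})
print(macro["hus..1"])
```

In this sketch, no component resource needs to know about any other. Each one only points into the shared sense inventory, and the combined view is derived from those pointers.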
In this chapter we present the diachronic dimension of Swedish FrameNet++. We describe the historical lexical resources currently available for Swedish, which are linked to the contemporary Swedish lexicon Saldo. We present a case study showing how interlinking the dictionaries also allows us to study lexical change. We also present a method for linking text words to lexicon entries, which facilitates interactive exploration of historical texts. Diachronic language resources present a challenge because of their high degree of variation, seen from a wider language technology perspective, and they are also an interesting object of linguistic study. Several parts of the diachronic lexical macroresource still need improvement. Even so, the resource is invaluable for analysing and accessing historical texts, and for both synchronic historical and diachronic lexical studies.
In this chapter, we explore how to develop and encode the relationship between wordnets for different languages, using a number of Nordic and Baltic wordnets to illustrate the variety of approaches. We also briefly touch on how these wordnets have been enhanced with various types of lexical information, such as framenet frames, syntagmatic information, and sentiment information.
In this chapter we describe a multilingual extension of Swedish FrameNet++. It is intended to address broad comparative research questions in genealogical, areal and typological linguistics. The focus is on integrating so-called core vocabularies into Swedish FrameNet++. Such vocabularies are used in several linguistic subfields to conduct large-scale comparative studies involving many languages. Specifically, we describe the inclusion of two such lexical databases covering several hundred South Asian languages, with the aim of investigating areal and genealogical connections among these languages.
We evaluate several lexicon-based and corpus-based methods for automatically inducing new lexical units for Swedish FrameNet, and find that the best-performing setup combines both types of methods. A particular challenge for Swedish is the absence of a lexical resource like WordNet. However, we show that the semantic network Saldo is very useful for our purposes, even though it is organized according to lexicographical principles quite different from those of WordNet.
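As a rough illustration of what combining the two kinds of evidence can look like, the sketch below ranks candidate lexical units for a frame by a weighted sum of two scores. The first is a lexicon-based score, which could, for instance, be derived from proximity in the Saldo network. The second is a corpus-based score, which could, for instance, be distributional similarity to the frame's existing lexical units. The linear interpolation, the weight `alpha`, the function `rank_candidates`, the candidate names, and the scores are all assumptions made for this example. The methods and combination scheme evaluated in the chapter may differ.

```python
def rank_candidates(
    lexicon_scores: dict[str, float],
    corpus_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Rank candidate lexical units by interpolating two scores in [0, 1].

    Candidates missing from one source get a score of 0.0 for that source.
    alpha weights the lexicon-based evidence; 1 - alpha weights the corpus.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    candidates = set(lexicon_scores) | set(corpus_scores)
    combined = {
        c: alpha * lexicon_scores.get(c, 0.0)
        + (1 - alpha) * corpus_scores.get(c, 0.0)
        for c in candidates
    }
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Invented scores for three hypothetical candidate senses.
lexicon = {"cand_a": 0.9, "cand_b": 0.4}
corpus = {"cand_a": 0.3, "cand_b": 0.6, "cand_c": 0.7}
print([name for name, _ in rank_candidates(lexicon, corpus)])
```

With equal weights, the candidate that is strongly supported by the lexicon but only weakly by the corpus (`cand_a`) ranks first. With `alpha=0.0` the ranking reduces to the corpus evidence alone. This shows how the two sources can correct each other.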
The creation of framenets for languages other than English on the basis of Berkeley FrameNet has put to the test the hypothesis that semantic frames are, to a certain extent, language independent. This working hypothesis made it possible to reuse frames in new framenets. The language-specific work could then consist of defining frame-evoking lemmas and annotating sentences in the new language. The caveat is a bias towards capturing what is possible, rather than what is typical, in a language. Reusing frames allowed SweFN to be developed in a relatively short time. However, the goal of building a typical Swedish framenet, not merely a possible one, necessitated some frame modifications.
This chapter compares the English and Swedish framenets with respect to semantic annotation and representation, as well as socio-cultural factors. It also shows how the differences between them forced modifications of the original structure.
Multiword expressions have attracted much attention in language technology over the last two decades or so. In general linguistics, interest in phraseology, which includes the linguistic study of multiword expressions, goes back much further. In our work on the multilingual components of Swedish FrameNet++, we have strived to adopt a typologically informed view of multiword expressions. This raises a number of theoretical and methodological questions, some of which are discussed in this chapter.
We investigate the feasibility of automatic semantic role labeling (SRL) using Swedish FrameNet (SweFN). In the first part of the chapter, we describe a baseline system that uses a traditional division into segmentation and labeling steps. These subsystems are implemented as separate machine learning models, and we explore a wide range of syntactic and lexical features for them. In the second part, we investigate how the frame-to-frame relations defined in FrameNet allow us to use the annotated examples more effectively. The cross-frame generalization methods reduce the number of errors made by the labeling classifier by 27%. For previously unseen frames, the reduction is even larger, at 50%.
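The figures above can be read as relative reductions in the number of labeling errors, rather than as absolute percentage-point differences. The worked equation below assumes that reading. In it, $E_{\text{base}}$ is the number of errors made by the baseline labeling classifier and $E_{\text{gen}}$ is the number made with cross-frame generalization. The baseline count of 1000 errors is hypothetical and chosen only to make the arithmetic easy to follow. Note that the 27% and 50% figures refer to different evaluation settings.

```latex
\[
\Delta_{\text{rel}} = \frac{E_{\text{base}} - E_{\text{gen}}}{E_{\text{base}}}
\quad\Longrightarrow\quad
E_{\text{gen}} = (1 - \Delta_{\text{rel}})\, E_{\text{base}}
\]
% Hypothetical baseline of E_base = 1000 errors in a given evaluation setting:
\[
\Delta_{\text{rel}} = 0.27 \;\Rightarrow\; E_{\text{gen}} = 0.73 \times 1000 = 730,
\qquad
\Delta_{\text{rel}} = 0.50 \;\Rightarrow\; E_{\text{gen}} = 0.50 \times 1000 = 500.
\]
```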
Multilingual natural language generation is the process of producing written or spoken utterances in parallel languages from either structured or unstructured representations. It requires large amounts of syntactic and semantic information in order to generate expressions tailored to the target audience. This information is offered by FrameNet-like resources, which have been developed for a number of languages. In this chapter, we present a computational FrameNet grammar resource for multilingual natural language generation. We compare the English and Swedish framenets to illustrate how they can be unified under a shared computational representation using Grammatical Framework. We demonstrate how the grammar was exploited in two practical multilingual natural language generation applications. One facilitates tourist communication, and the other provides museum users with coherent artwork descriptions.
This chapter describes and discusses the use of resources connected to Swedish FrameNet++ (SweFN++) in the teaching and learning of Swedish, covering both language proficiency and grammatical analysis. We illustrate how different resources in the SweFN++ context can be useful for language pedagogy with two examples. The first is the Swedish Constructicon. The second is a semantic role exercise on Lärka, an intelligent computer-assisted language learning (ICALL) platform. These two resources make use of the infrastructure developed within SweFN++ in fundamentally different ways, which are discussed and compared. In addition, we discuss how the language pedagogical potential of SweFN++ could be developed further. This concerns both ICALL and other types of resources and descriptive databases, such as corpora, constructicons and framenets.