Classifying heuristic textual practices in academic discourse
A deep learning approach to pragmatics
In this paper, we investigate how deep learning techniques can be applied to discourse pragmatics. As a test case, we analyse heuristic textual practices, defined as linguistic implementations of decision routines in research processes in academic discourse. We develop a complex annotation scheme of pragmalinguistic categories on different levels of granularity and manually annotate a corpus of texts across various scientific disciplines. This is the basis for training recurrent neural networks to classify heuristic textual practices. Our experiments show that the annotation categories are robust enough to be recognised by our models, which learn similarities of the sentence surfaces represented as word embeddings. Our study aims at an iterative human-in-the-loop process in which manual-hermeneutic and algorithmic procedures mutually advance the insight process. It underlines that the interaction between manual and automated methods opens up a promising field for further research, allowing interpretative analyses of complex pragmatic phenomena in large corpora.
Article outline
- 1. Introduction
- 2. Investigating heuristic textual practices with digital methods
- 2.1 Pragmatic annotation studies and machine learning
- 2.2 Corpus approaches to academic discourse
- 2.3 Heuristics in academia
- 3. Data and annotation process
- 3.1 Corpus
- 3.2 Segmentation
- 3.3 Annotation scheme
- 3.4 Annotation process
- 3.5 Inter-annotator agreement
- 4. Distributions
- 4.1 General observations
- 4.2 Description of subject vs. discourse referencing
- 4.3 Modes of objective
- 5. Input and model architecture
- 5.1 Input
- 5.2 Model
- 5.3 Experimental setup
- 6. Experiments and results on the granularity of annotation categories
- 7. Conclusion
- Acknowledgements
- Note
- References