Classifying heuristic textual practices in academic discourse: A deep learning approach to pragmatics

Becker, Maria; Bender, Michael; Müller, Marcus

doi:10.1075/ijcl.19097.bec

Article published in:

International Journal of Corpus Linguistics
Vol. 25:4 (2020), pp. 426–460

Maria Becker | Heidelberg University

Michael Bender | TU Darmstadt

Marcus Müller | TU Darmstadt

In this paper, we investigate how deep learning techniques can be applied to discourse pragmatics. As a test case, we analyse heuristic textual practices, defined as linguistic implementations of decision routines in research processes in academic discourse. We develop a complex annotation scheme of pragmalinguistic categories at different levels of granularity and manually annotate a corpus of texts from various scientific disciplines. This annotated corpus is the basis for training recurrent neural networks to classify heuristic textual practices. Our experiments show that the annotation categories are robust enough to be recognised by our models, which learn similarities between sentence surfaces represented as word embeddings. Our study aims at an iterative human-in-the-loop process in which manual-hermeneutic and algorithmic procedures mutually advance the process of gaining insight. It underlines that the interaction between manual and automated methods opens up a promising field for further research, allowing interpretative analyses of complex pragmatic phenomena in large corpora.

Keywords: discourse pragmatics, textual practices, academic discourse, deep learning, annotation
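
To make the classification idea in the abstract more concrete, the following is a minimal sketch of a recurrent forward pass over word embeddings that ends in a softmax over sentence labels. It is not the authors' model: the label names, vocabulary, dimensions and the helper names (rand_matrix, matvec, classify) are hypothetical, the embeddings and weights are random rather than pretrained or trained, and a plain Elman-style recurrence stands in for whatever recurrent architecture the article actually uses.

```python
import math
import random

# Hypothetical label set; the article's actual annotation categories are not listed here.
LABELS = ["practice_a", "practice_b", "practice_c"]
EMB_DIM, HID_DIM = 4, 5

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy embedding table (assumption: a real system would load pretrained word vectors).
VOCAB = {"we": 0, "assume": 1, "that": 2, "the": 3, "model": 4, "<unk>": 5}
EMBEDDINGS = rand_matrix(len(VOCAB), EMB_DIM)

# Untrained parameters of a simple recurrent layer and an output layer.
W_xh = rand_matrix(HID_DIM, EMB_DIM)
W_hh = rand_matrix(HID_DIM, HID_DIM)
W_hy = rand_matrix(len(LABELS), HID_DIM)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence):
    """Read the sentence token by token and return (label, probabilities)."""
    hidden = [0.0] * HID_DIM
    for token in sentence.lower().split():
        x = EMBEDDINGS[VOCAB.get(token, VOCAB["<unk>"])]
        from_input = matvec(W_xh, x)
        from_state = matvec(W_hh, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    probs = softmax(matvec(W_hy, hidden))
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs


label, probs = classify("We assume that the model converges")
print(label, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted label carries no meaning; the sketch only shows how a sentence, represented as a sequence of embeddings, is reduced to a single hidden state and mapped to a distribution over categories. Training on manually annotated sentences would adjust the weights so that this distribution reflects the annotation scheme.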

Article outline

1. Introduction
2. Investigating heuristic textual practices with digital methods
- 2.1 Pragmatic annotation studies and machine learning
- 2.2 Corpus approaches to academic discourse
- 2.3 Heuristics in academia
3. Data and annotation process
- 3.1 Corpus
- 3.2 Segmentation
- 3.3 Annotation scheme
- 3.4 Annotation process
- 3.5 Inter-annotator agreement
4. Distributions
- 4.1 General observations
- 4.2 Description of subject vs. discourse referencing
- 4.3 Modes of objective
5. Input and model architecture
- 5.1 Input
- 5.2 Model
- 5.3 Experimental setup
6. Experiments and results on the granularity of annotation categories
7. Conclusion
Acknowledgements
Note
References

Published online: 11 November 2020

Cited by (7)

Bender, Michael & Katharina Jacob

2024. Korpushermeneutik. Zeitschrift für Literaturwissenschaft und Linguistik 54:2, pp. 145 ff.

Bender, Michael

2023. Pragmalinguistische Annotation und maschinelles Lernen. In Digitale Pragmatik [Digitale Linguistik, 1], pp. 267 ff.

Bender, Michael

2024. Korpusgestützte Theoriebildung als hermeneutischer Prozess – iterativ-inkrementelle Entwicklung eines Kategoriensystems am Beispiel einer Theorie des Kommentierens. Zeitschrift für Literaturwissenschaft und Linguistik 54:2, pp. 199 ff.

Meißner, Cordula

2023. Indikatormerkmale in metakommentierenden Sprechhandlungen thematisch strukturierter Interaktionen: Eine korpuspragmatische Untersuchung zur Beziehung zwischen Funktion und Form. In Digitale Pragmatik [Digitale Linguistik, 1], pp. 237 ff.

Müller, Marcus & Michael Bender

2023. Die dunkle Seite der Ansichtskarte. In Ansichten zur Ansichtskarte [Lettre, ], pp. 267 ff.

Müller, Marcus

2022. Discourse Lab – eine Forschungsplattform für die digitale Diskursanalyse. Mitteilungen des Deutschen Germanistenverbandes 69:2, pp. 152 ff.

Müller, Marcus

2024. Einsam oder gemeinsam? Zeitschrift für Literaturwissenschaft und Linguistik 54:2, pp. 151 ff.

This list is based on CrossRef data as of 4 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.