Edited by Minna Korhonen, Haidee Kotze and Jukka Tyrkkö
[Studies in Corpus Linguistics 111] 2023
pp. 54–88
This chapter examines editorial practice in the Australian Hansard using an aligned corpus of transcribed audio recordings and the corresponding Hansard records, covering the period 1946–2015. A more traditional, qualitative, bottom-up approach is taken by manually analysing the data to compile a list of differences between the two types of record. In addition, a deductive, quantitative approach is adopted, using the multidimensional analysis method of Biber (1988) to identify and interpret significant differences in the frequencies of features, and of clusters of features, between the oral transcripts and the written Hansard records. Our primary aim is to provide insight into methodological questions associated with working with big linguistic data. Alongside this, we report findings on how the written Hansard differs from the original speeches: a reduction in spoken-language processing features and in informality, greater conservatism, and greater density, although these differences decrease over time.