What is anonymisation? This paper addresses this question, its relationship to linguistic data and its potential importance to corpus builders and users. It examines attitudes towards anonymisation, such as hostility and disinterest, and investigates relevant rights, responsibilities, and obligations. The paper then overviews and critiques methods of anonymisation and seeks to assess which items should be anonymised and which maintained. Finally, some troublesome and noteworthy cases are presented as evidence of the need for sensitive, realistic consideration of this issue. The paper was developed through consultation with researchers from the international community of corpus builders and users and therefore reflects the diversity of attitude and practice currently at large. It addresses this variability by proposing methods for systematic assessment of the need for anonymisation within individual corpora.