Policy and Practice in the Anonymisation of Linguistic Data

Rock, Frances

doi:10.1075/ijcl.6.1.01roc

Article published in:

International Journal of Corpus Linguistics
Vol. 6:1 (2001) ► pp. 1–26

Frances Rock | The University of Birmingham

What is anonymisation? This paper addresses this question, its relationship to linguistic data and its potential importance to corpus builders and users. It examines attitudes towards anonymisation such as hostility and disinterest and investigates relevant rights, responsibilities, and obligations. The paper then overviews and critiques methods of anonymisation and seeks to assess which items should be anonymised and which maintained. Finally, some troublesome and noteworthy cases are presented as evidence of the need for sensitive, realistic consideration of this issue. The paper was developed through consultation with researchers from the international community of corpus builders and users and, therefore, reflects the diversity of attitude and practice currently at large. It addresses this variability by finally proposing methods for systematic assessment of the need for anonymisation within individual corpora.

Keywords: anonymisation, confidentiality, privacy, corpus building

Published online: 17 December 2001

https://doi.org/10.1075/ijcl.6.1.01roc

Cited by 15 other publications

España-Bonet, Cristina, Alberto Barrón-Cedeño & Lluís Màrquez

2023. Tailoring and evaluating the Wikipedia for in-domain comparable corpora extraction. Knowledge and Information Systems 65:3 ► pp. 1365 ff.

Zhang, Yanhui

2023. Modelling the lexical complexity of homogenous texts: a time series approach. Quality & Quantity 57:3 ► pp. 2033 ff.

Sharp, Elizabeth A. & Kelly Munly

2022. Reopening a can of words: Qualitative secondary data analysis. Journal of Family Theory & Review 14:1 ► pp. 44 ff.

Garbutt, Joanna

2018. The Use of No Comment by Suspects in Police Interviews. In Exploring Silence and Absence in Discourse, ► pp. 329 ff.

Corrigan, Karen P.

2017. Corpora for Regional and Social Analysis. In Language and a Sense of Place, ► pp. 107 ff.

Lijffijt, Jefrey, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki & Heikki Mannila

2016. Significance testing of word frequencies in corpora. Digital Scholarship in the Humanities 31:2 ► pp. 374 ff.

Forsyth, R. S. & S. Sharoff

2014. Document dissimilarity within and across languages: A benchmarking study. Literary and Linguistic Computing 29:1 ► pp. 6 ff.

Mondada, Lorenza

2014. Ethics in Action: Anonymization as a Participant’s Concern and a Participant’s Practice. Human Studies 37:2 ► pp. 179 ff.

Kageura, Kyo & Takeshi Abekawa

2013. The Place of Comparable Corpora in Providing Terminological Reference Information to Online Translators: A Strategic Framework. In Building and Using Comparable Corpora, ► pp. 285 ff.

Sharoff, Serge

2013. Measuring the Distance Between Comparable Corpora Between Languages. In Building and Using Comparable Corpora, ► pp. 113 ff.

Sharoff, Serge, Reinhard Rapp & Pierre Zweigenbaum

2013. Overviewing Important Aspects of the Last Twenty Years of Research in Comparable Corpora. In Building and Using Comparable Corpora, ► pp. 1 ff.

Wang, J.

2013. To Divorce or not to Divorce: A Critical Discourse Analysis of Court-ordered Divorce Mediation in China. International Journal of Law, Policy and the Family 27:1 ► pp. 74 ff.

Zeitlyn, David

2012. Anthropology in and of the Archives: Possible Futures and Contingent Pasts. Archives as Anthropological Surrogates. Annual Review of Anthropology 41:1 ► pp. 461 ff.

Sahlgren, Magnus & Jussi Karlgren

2005. Counting Lumps in Word Space: Density as a Measure of Corpus Homogeneity. In String Processing and Information Retrieval [Lecture Notes in Computer Science, 3772], ► pp. 151 ff.

[no author supplied]

2022. QUEST: Guidelines and Specifications for the Assessment of Audiovisual, Annotated Language Data [Working Papers in Corpus Linguistics and Digital Technologies: Analyses and Methodology, 8].

This list is based on CrossRef data as of 4 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.