Anonymising a French SMS corpus using natural language processing techniques: Seek&Hide

Accorsi, Pierre; Patel, Namrata; Lopez, Cédric; Panckhurst, Rachel; Roche, Mathieu

doi:10.1075/bct.61.03acc

Part of

SMS Communication: A linguistic approach
Edited by Louise-Amélie Cougnon and Cédrick Fairon
[Benjamins Current Topics 61] 2014
pp. 11–28

Pierre Accorsi | Université Montpellier 2, LIRMM, CNRS

Namrata Patel | Université Montpellier 2, LIRMM, CNRS

Cédric Lopez | Objet Direct - VISEO

Rachel Panckhurst | Praxiling UMR 5267 CNRS & Université Paul-Valéry Montpellier 3

Mathieu Roche | TETIS, Cirad, Irstea, AgroParisTech and LIRMM, CNRS, Université Montpellier 2

This article presents Seek&Hide, a text-message processing tool developed for the sud4science LR project (http://www.sud4science.org/). The tool anonymises (de-identifies) a corpus, and has so far been applied to the sud4science LR corpus of French text messages collected during the project. Anonymisation proceeds in two phases. The first phase automatically processes over 70% of the corpus. The remainder is processed in the second phase with the help of an expert annotator, who works through a web interface specifically designed to simplify the task.
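
The two-phase workflow can be illustrated with a minimal Python sketch. It is not the Seek&Hide implementation: the abstract does not say how the automatic phase decides what to hide, so the sketch assumes a simple dictionary lookup. The word lists, the `<NAME>` placeholder, and the `decide` callback (standing in for the annotator's web interface) are all hypothetical.

```python
import re

# Hypothetical resources: the abstract does not describe the lexicons actually used.
FIRST_NAMES = {"marie", "pierre", "rose"}
COMMON_WORDS = {"rose", "la", "est", "belle", "salut", "demain"}
PLACEHOLDER = "<NAME>"  # hypothetical replacement tag

TOKEN = re.compile(r"\w+")


def classify(token):
    """Label a token 'hide', 'review' (ambiguous), or 'keep'."""
    word = token.lower()
    if word in FIRST_NAMES:
        # A first name that is also an ordinary word cannot be decided automatically.
        return "review" if word in COMMON_WORDS else "hide"
    return "keep"


def anonymise(message, should_hide):
    """Replace every token for which should_hide(token) is true."""
    return TOKEN.sub(
        lambda m: PLACEHOLDER if should_hide(m.group(0)) else m.group(0),
        message,
    )


def phase_one(messages):
    """Automatic pass: finish unambiguous messages, queue the others."""
    done, queue = [], []
    for message in messages:
        labels = [classify(t) for t in TOKEN.findall(message)]
        if "review" in labels:
            queue.append(message)
        else:
            done.append(anonymise(message, lambda t: classify(t) == "hide"))
    return done, queue


def phase_two(queue, decide):
    """Assisted pass: decide(message, token) supplies the annotator's choice."""
    def hide_in(message):
        return lambda t: classify(t) == "hide" or (
            classify(t) == "review" and decide(message, t)
        )
    return [anonymise(message, hide_in(message)) for message in queue]
```

In this sketch, a message such as "Salut Marie" is handled entirely by `phase_one`, whereas "la rose est belle" is queued because "rose" is both a first name and a common noun; `phase_two` then applies whatever decision the annotator returns for that token.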

Published online: 8 July 2014

