Seek&Hide
Anonymising a French SMS corpus using natural language processing techniques
Cédric Lopez | Praxiling UMR 5267 CNRS & Université Paul-Valéry Montpellier 3
This article presents Seek&Hide, a text message processing tool developed for the sud4science LR project (http://www.sud4science.org/). The system anonymises (de-identifies) a corpus; so far it has been used on the sud4science LR corpus of French text messages collected during the project. Anonymisation proceeds in two phases. In the first phase, the system automatically processes over 70% of the corpus. In the second phase, the rest of the corpus is processed with the help of an expert annotator, via a web interface specifically designed to simplify the task.
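The two-phase workflow can be illustrated with a minimal Python sketch. It is not Seek&Hide's actual implementation, which the abstract does not describe. It assumes, purely for illustration, that the automatic phase relies on a list of first names and a list of ordinary words. The word lists, the `<NAME>` placeholder, and the function names `phase_one` and `phase_two` are all hypothetical.

```python
import re

# Hypothetical resources, invented for this illustration only.
FIRST_NAMES = {"cédric", "rachel", "pierre", "rose"}
COMMON_WORDS = {"pierre", "rose", "salut", "demain"}  # also ordinary French words
PLACEHOLDER = "<NAME>"

# Split into words, punctuation runs, and whitespace runs so that the
# message can be rebuilt exactly by joining the tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+|\s+")


def phase_one(message):
    """Automatic pass: hide unambiguous names, flag ambiguous tokens."""
    tokens = TOKEN_RE.findall(message)
    pending = []  # indices of tokens left for the human annotator
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in FIRST_NAMES:
            if low in COMMON_WORDS:
                pending.append(i)        # ambiguous: name or ordinary word?
            else:
                tokens[i] = PLACEHOLDER  # unambiguous: hide automatically
    return tokens, pending


def phase_two(tokens, decisions):
    """Manual pass: apply annotator decisions {token index: True to hide}."""
    out = list(tokens)
    for i, hide in decisions.items():
        if hide:
            out[i] = PLACEHOLDER
    return "".join(out)


tokens, pending = phase_one("Salut Cédric, tu vois Rose demain ?")
# `pending` lists the tokens the annotator must decide on (here, "Rose").
anonymised = phase_two(tokens, {i: True for i in pending})
```

In this sketch, only tokens that could be either a name or an ordinary word reach the annotator, which is one plausible way for an automatic phase to settle most of a corpus before manual review begins.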
Cited by (3)
Efraim, Octavia, Vladislav Maraev & João Rodrigues. 2018. Boosting a Rule-Based Chatbot Using Statistics and User Satisfaction Ratings. In Artificial Intelligence and Natural Language [Communications in Computer and Information Science, 789], pp. 27 ff.
Panckhurst, Rachel. 2013. A Large SMS Corpus in French: From Design and Collation to Anonymisation, Transcoding and Analysis. Procedia - Social and Behavioral Sciences 95, pp. 96 ff.
Patel, Namrata, Pierre Accorsi, Diana Inkpen, Cédric Lopez & Mathieu Roche. 2013. Approaches of Anonymisation of an SMS Corpus. In Computational Linguistics and Intelligent Text Processing [Lecture Notes in Computer Science, 7816], pp. 77 ff.
This list is based on CrossRef data as of 5 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.