Natural Language Processing for Online Applications

Text retrieval, extraction and categorization

Second revised edition

Thomson Corporation
Hardbound: Available
ISBN 9789027249920 | EUR 105.00 | USD 158.00

Paperback: Available
ISBN 9789027249937 | EUR 33.00 | USD 49.95

e-Book
ISBN 9789027292445 | EUR 105.00/33.00* | USD 158.00/49.95*

This text covers the technologies of document retrieval, information extraction, and text categorization in a way that highlights their commonalities in terms of both general principles and practical concerns. It assumes some mathematical background on the part of the reader, but the chapters typically begin with a non-mathematical account of the key issues. Current research topics are covered only to the extent that they inform current applications; detailed coverage of longer-term research and more theoretical treatments should be sought elsewhere. Many pointers at the end of each chapter help the reader explore the literature. Throughout, the book maintains a strong emphasis on evaluation, in terms of both methodology and the results of controlled experimentation.

This title replaces Natural Language Processing for Online Applications: Text retrieval, extraction and categorization (2002)

[Natural Language Processing, 5]  2007.  x, 232 pp.
Publishing status: Available
Table of Contents
Preface to the 2nd edition (p. ix)
Chapter 1. Natural language processing (p. 1)
1.1 What is NLP? (p. 2)
1.2 NLP and linguistics (p. 5)
1.3 Linguistic tools (p. 11)
1.4 Plan of the book (p. 20)
Chapter 2. Document retrieval (p. 23)
2.1 Information retrieval (p. 24)
2.2 Indexing technology (p. 25)
2.3 Query processing (p. 27)
2.4 Evaluating search engines (p. 45)
2.5 Attempts to enhance search performance (p. 52)
2.6 The future of Web searching (p. 59)
Chapter 3. Information extraction (p. 69)
3.1 The message understanding conferences (p. 70)
3.2 Regular expressions (p. 73)
3.3 Finite automata in FASTUS (p. 75)
3.4 Context-free grammars (p. 92)
3.5 Limitations of current technology and future research (p. 104)
3.6 Summary of information extraction (p. 110)
Chapter 4. Text categorization (p. 113)
4.1 Overview of categorization tasks (p. 115)
4.2 Handcrafted rule based methods (p. 120)
4.3 Inductive learning for text classification (p. 122)
4.4 Nearest neighbor algorithms (p. 144)
4.5 Combining classifiers (p. 147)
4.6 Evaluation of text categorization systems (p. 154)
Chapter 5. Text mining (p. 163)
5.1 What is text mining? (p. 164)
5.2 Resolving reference and coreference (p. 168)
5.3 Automatic summarization (p. 183)
5.4 Testing of automatic summarization programs (p. 204)
5.5 Prospects for text mining and NLP (p. 210)
References (p. 215)
Index (p. 227)
Cited by other publications

No author info given
2011. In Data Mining, pp. 510 ff.
No author info given
2019. In Data Mining, pp. 607 ff.
Aboalnaser, Sara A.
2019. In 2019 12th International Conference on Developments in eSystems Engineering (DeSE), pp. 290 ff.
Anzalone, Salvatore M., Yuichiro Yoshikawa, Hiroshi Ishiguro, Emanuele Menegatti, Enrico Pagello & Rosario Sorbello
2012. In Simulation, Modeling, and Programming for Autonomous Robots [Lecture Notes in Computer Science, 7628], pp. 4 ff.
Anzalone, Salvatore Maria, Y. Yoshikawa, Hiroshi Ishiguro, Emanuele Menegatti, Enrico Pagello & Rosario Sorbello
2013. In Intelligent Autonomous Systems 12 [Advances in Intelligent Systems and Computing, 194], pp. 383 ff.
Arora, Chetan, Mehrdad Sabetzadeh, Shiva Nejati & Lionel Briand
2019. An Active Learning Approach for Improving the Accuracy of Automated Domain Model Extraction. ACM Transactions on Software Engineering and Methodology 28:1 pp. 1 ff.
Ashley, Kevin D. & Stefanie Brüninghaus
2009. Automatically classifying case texts and predicting outcomes. Artificial Intelligence and Law 17:2 pp. 125 ff.
Banchs, Rafael E. & Carlos G. Rodríguez Penagos
2013. In Emerging Applications of Natural Language Processing, pp. 230 ff.
Banchs, Rafael E. & Carlos G. Rodríguez Penagos
2013. In Small and Medium Enterprises, pp. 1945 ff.
Blackburn, Timothy D., Thomas A. Mazzuchi & Shahram Sarkani
2011. Overcoming Inherent Limits to Pharmaceutical Manufacturing Quality Performance with QbD (Quality by Design). Journal of Pharmaceutical Innovation 6:2 pp. 69 ff.
Bobicev, Victoria, Marina Sokolova, Khaled El Emam, Yasser Jafer, Brian Dewar, Elizabeth Jonker & Stan Matwin
2013. Can Anonymous Posters on Medical Forums be Reidentified? Journal of Medical Internet Research 15:10 pp. e215 ff.
Bonino, Dario, Alberto Ciaramella & Fulvio Corno
2010. Review of the state-of-the-art in patent information and forthcoming evolutions in intelligent patent informatics. World Patent Information 32:1 pp. 30 ff.
Cahill, Maria, Soohyung Joo & Kathleen Campana
2018. Language investigations of children's information sources: A research agenda. Proceedings of the Association for Information Science and Technology 55:1 pp. 56 ff.
Cahill, Maria, Soohyung Joo & Kathleen Campana
2020. Analysis of language use in public library storytimes. Journal of Librarianship and Information Science 52:2 pp. 476 ff.
Carvalho, Joao P., Fernando Batista & Luisa Coheur
2012. In 2012 IEEE International Conference on Fuzzy Systems, pp. 1 ff.
Chantar, Hamouda, Majdi Mafarja, Hamad Alsawalqah, Ali Asghar Heidari, Ibrahim Aljarah & Hossam Faris
2020. Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification. Neural Computing and Applications 32:16 pp. 12201 ff.
Cheng, Li & Alei Liang
2013. In Proceedings of 2013 3rd International Conference on Computer Science and Network Technology, pp. 174 ff.
Chukharev-Hudilainen, Evgeny & Aysel Saricaoglu
2016. Causal discourse analyzer: improving automated feedback on academic ESL writing. Computer Assisted Language Learning 29:3 pp. 494 ff.
Cohen, K. Bretonnel & Lawrence Hunter
2008. Getting Started in Text Mining. PLoS Computational Biology 4:1 pp. e20 ff.
Daniel, Gwendal, Jordi Cabot, Laurent Deruelle & Mustapha Derras
2019. In Advanced Information Systems Engineering [Lecture Notes in Computer Science, 11483], pp. 177 ff.
Farrell, Treasa & Nick Rushby
2016. Assessment and learning technologies: An overview. British Journal of Educational Technology 47:1 pp. 106 ff.
Gardoň, Andrej & Aleš Horák
2011. In Text, Speech and Dialogue [Lecture Notes in Computer Science, 6836], pp. 323 ff.
Geist, Anton
2009. Using Citation Analysis Techniques for Computer-Assisted Legal Research in Continental Jurisdictions. SSRN Electronic Journal
Gibert, Marcin
2015. In Computational Collective Intelligence [Lecture Notes in Computer Science, 9330], pp. 648 ff.
Huijnen, Pim, Fons Laan, Maarten de Rijke & Toine Pieters
2014. In Social Informatics [Lecture Notes in Computer Science, 8359], pp. 71 ff.
Itahriouan, Zakaria, Nisserine El Bahri, Samir Brahim Belhaouari, Hajji Tarik & Mohamed Ouazzani Jamil
2021. In Artificial Intelligence and Industrial Applications [Lecture Notes in Networks and Systems, 144], pp. 110 ff.
Kang, Jingjing, Tao Liu, He Hu & Xiaoyong Du
2011. In 2011 Sixth Annual Chinagrid Conference, pp. 60 ff.
Kejriwal, Mayank, Daniel Gilley, Pedro Szekely & Jill Crisman
2018. In Companion of the The Web Conference 2018 on The Web Conference 2018 - WWW '18, pp. 147 ff.
Krallinger, Martin, Obdulia Rabal, Anália Lourenço, Julen Oyarzabal & Alfonso Valencia
2017. Information Retrieval and Text Mining Technologies for Chemistry. Chemical Reviews 117:12 pp. 7673 ff.
Kucuk, Dilek & Adnan Yazici
2008. In 2008 23rd International Symposium on Computer and Information Sciences, pp. 1 ff.
Kusumadewi, Sri, Chanifah Indah Ratnasari & Linda Rosita
2015. In 2015 International Conference on Science and Technology (TICST), pp. 292 ff.
Küçük, Dilek & Adnan Yazıcı
2011. Exploiting information extraction techniques for automatic semantic video indexing with an application to Turkish news videos. Knowledge-Based Systems 24:6 pp. 844 ff.
Lai, Kaitao, Natalie Twine, Aidan O’Brien, Yi Guo & Denis Bauer
2019. In Encyclopedia of Bioinformatics and Computational Biology, pp. 272 ff.
Liszka, Kathy J., Chien-Chung Chan & Chandra Shekar
2012. In Social Network Mining, Analysis, and Research Trends, pp. 101 ff.
Liszka, Kathy J., Chien-Chung Chan & Chandra Shekar
2013. In Data Mining, pp. 1407 ff.
More, Joaquim, David Baneres, Jordi Conesa & Montse Junyent
2014. In 2014 International Conference on Intelligent Networking and Collaborative Systems, pp. 480 ff.
Oleshchuk, Vladimir & Vitaly Klyuev
2009. In 2009 IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, pp. 561 ff.
O’Shea, James, Zuhair Bandar & Keeley Crockett
2011. In Intelligence-Based Systems Engineering [Intelligent Systems Reference Library, 10], pp. 201 ff.
Pérez-Soler, Sara, Gwendal Daniel, Jordi Cabot, Esther Guerra & Juan de Lara
2020. In Enterprise, Business-Process and Information Systems Modeling [Lecture Notes in Business Information Processing, 387], pp. 257 ff.
Rebelo, Francisco, Carlos Soares & Rosaldo J. F. Rossetti
2015. In 2015 IEEE First International Smart Cities Conference (ISC2), pp. 1 ff.
Santos, Carolina Leana, Paulo Rita & João Guerreiro
2018. Improving international attractiveness of higher education institutions based on text mining and sentiment analysis. International Journal of Educational Management 32:3 pp. 431 ff.
Scheurwegs, Elyne, Kim Luyckx, Léon Luyten, Walter Daelemans & Tim Van den Bulcke
2016. Data integration of structured and unstructured sources for assigning clinical codes to patient stays. Journal of the American Medical Informatics Association 23:e1 pp. e11 ff.
Seki, Kazuhiro & Javed Mostafa
2008. Gene ontology annotation as text categorization: An empirical study. Information Processing & Management 44:5 pp. 1754 ff.
Shin, Teo Yon, Yuan Zihong, Ng Wee Siong, Zhang Yangfan & Valerie Phangt
2017. In 2017 International Conference on Asian Language Processing (IALP), pp. 99 ff.
Stanković, Ranka, Cvetana Krstev, Ivan Obradović & Olivera Kitanović
2015. In Semantic Keyword-based Search on Structured Data Sources [Lecture Notes in Computer Science, 9398], pp. 167 ff.
Stanković, Ranka, Cvetana Krstev, Ivan Obradović & Olivera Kitanović
2017. In Transactions on Computational Collective Intelligence XXVI [Lecture Notes in Computer Science, 10190], pp. 162 ff.
Sulieman, Lina, David Gilmore, Christi French, Robert M. Cronin, Gretchen Purcell Jackson, Matthew Russell & Daniel Fabbri
2017. Classifying patient portal messages using Convolutional Neural Networks. Journal of Biomedical Informatics 74 pp. 59 ff.
Sánchez-Cervantes, José Luis, Giner Alor-Hernández, Mario Andrés Paredes-Valverde, Lisbeth Rodríguez-Mazahua & Rafael Valencia-García
2020. NaLa-Search: A multimodal, interaction-based architecture for faceted search on linked open data. Journal of Information Science pp. 016555152093091 ff.
Takemiya, Makoto, Kei Majima, Mitsuaki Tsukamoto & Yukiyasu Kamitani
2016. BrainLiner: A Neuroinformatics Platform for Sharing Time-Aligned Brain-Behavior Data. Frontiers in Neuroinformatics 10
Thessen, Anne E., Cynthia Sims Parr & Luis M. Rocha
2014. Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life. PLoS ONE 9:3 pp. e89550 ff.
Tomašev, Nenad
2017. Extracting the patterns of truthfulness from political information systems in Serbia. Information Systems Frontiers 19:1 pp. 109 ff.
Vollero, Agostino, Alfonso Siano & Domenico Sardanelli
2020. In Advances in Digital Marketing and eCommerce [Springer Proceedings in Business and Economics, ], pp. 188 ff.
Yoon, Sunmoo, Noémie Elhadad & Suzanne Bakken
2013. A Practical Approach for Content Mining of Tweets. American Journal of Preventive Medicine 45:1 pp. 122 ff.
Zhang, Lishan & Kurt VanLehn
2017. Adaptively selecting biology questions generated from a semantic network. Interactive Learning Environments 25:7 pp. 828 ff.
Zhao, Qianqian, Kai Chen, Tongxin Li, Yi Yang & XiaoFeng Wang
2018. Detecting telecommunication fraud by understanding the contents of a call. Cybersecurity 1:1

This list is based on CrossRef data as of 5 August 2020. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers; any errors therein should be reported to them.

Erratum

For errata please go to http://www.jacksonpeter.com/nlp

Subjects
BIC Subject: UYQL – Natural language & machine translation
BISAC Subject: COM042000 – COMPUTERS / Natural Language Processing
U.S. Library of Congress Control Number:  2007010559