Natural Language Processing for Online Applications

Text retrieval, extraction and categorization

Hardbound: Replaced by new edition
ISBN 9789027249883 (Eur)
ISBN 9781588112491 (USA)
Paperback: Replaced by new edition
ISBN 9789027249890 (Eur)
ISBN 9781588112507 (USA)
Netlibrary e-Book: Replaced by new edition
ISBN 9780585462530
This text covers the emerging technologies of document retrieval, information extraction, and text categorization in a way that highlights commonalities in terms of both general principles and practical issues. It seeks to satisfy a need on the part of technology practitioners in the Internet space, who face difficult decisions about what research has been done and what the best practices are. It is not intended as a vendor guide (such things are quickly out of date) or as a recipe for building applications (such recipes are very context-dependent). But it does identify the key technologies, the issues involved, and their strengths and weaknesses, with an emphasis on evaluation in every chapter, both in terms of methodology (how to evaluate) and in terms of what controlled experimentation and industrial experience have to tell us.
“In general, the book is a very good, concise reference book filled with many theoretical principles and practical guidelines. I recommend this book to anyone who wants to build applications related to text retrieval, information extraction and categorization.”
“The authors had the good idea of not making this book a vendor guide but rather an overview of methodologies and technologies available and the evaluation criteria for the techniques described. I do not believe their goal was to publish a detailed overview but an introduction to the various technologies available. In that regard, the book is very successful and I much appreciate it because key concepts are clearly outlined, which makes it easier to follow the authors through the more complex parts of the book. I would recommend it to anyone who is interested in NLP and its applications to the new challenges brought out by the arrival of the information age.”
“In my view, the book is very practical: certainly, since it is pretty comprehensible and does not go into too profound details, it could serve well as a textbook for an introductory course. However, the book is not intended exclusively as an academic text. It is also aimed at software engineers, project managers, and technology executives who want or need to understand the technology at some level. I think that such people may find it useful, and that it may provoke ideas, discussions, and action in the field of applied research and development.”
“Some special features of the book include solid coverage of evaluation techniques in every chapter, excellent endnotes, and references to exactly the right stuff. However, the most salient feature of this book is the clear and cogent writing. It reads much like a series of well-written review articles and is actually enjoyable to read while not skimping at all on technical detail.”
Cited by

Cited by 87 other publications

No author info given
2014. In Biomedical Natural Language Processing [Natural Language Processing, 11].
Amolochitis, Emmanouil, Ioannis T. Christou, Zheng-Hua Tan & Ramjee Prasad
2013. A heuristic hierarchical scheme for academic search and retrieval. Information Processing & Management 49:6, pp. 1326 ff.
Anchieta, Rafael T., Rogerio F. de Sousa & Raimundo S. Moura
2012. In 2012 XXXVIII Conferencia Latinoamericana En Informatica (CLEI), pp. 1 ff.
Antunes, Bruno, Nuno Seco & Paulo Gomes
2007. In Progress in Artificial Intelligence [Lecture Notes in Computer Science, 4874], pp. 357 ff.
Banerjee, Binayak, Tania Sarkar, Pratap Chakraborty & Alok Ranjan Pal
2017. In 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), pp. 768 ff.
Banville, Debra L.
2006. Mining chemical structural information from the drug literature. Drug Discovery Today 11:1-2, pp. 35 ff.
Behal, Amit, Ying Chen, Cheryl Kieliszewski, Ana Lelescu, Bin He, Jie Cui, Jeffrey Kreulen, James Rhodes & W. Scott Spangler
2007. In Human Interface and the Management of Information. Interacting in Information Environments [Lecture Notes in Computer Science, 4558], pp. 834 ff.
Boella, Marco, Francesca Romana Romani, Anjela Al-Raies, Cristina Solimando & Giuliano Lancioni
2011. In Information Retrieval Technology [Lecture Notes in Computer Science, 7097], pp. 538 ff.
Byalik, Antuan, Sanchit Chadha & Eli Tilevich
2015. In Proceedings of the 2015 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pp. 99 ff.
Byalik, Antuan, Sanchit Chadha & Eli Tilevich
2016. Native-2-native: automated cross-platform code synthesis from web-based programming resources. ACM SIGPLAN Notices 51:3, pp. 99 ff.
2007. Rapid pattern development for concept recognition systems: application to point mutations. Journal of Bioinformatics and Computational Biology 05:06, pp. 1233 ff.
Chadha, Sanchit, Antuan Byalik, Eli Tilevich & Alla Rozovskaya
2017. Facilitating the development of cross-platform software via automated code synthesis from web-based programming resources. Computer Languages, Systems & Structures 48, pp. 3 ff.
Chaudiron, Stéphane
2005. Terminologie, ingénierie linguistique et gestion de l'information. Langages n° 157:1, pp. 25 ff.
Chen, Ying, Scott Spangler, Jeffrey Kreulen, Stephen Boyer, Thomas D. Griffin, Alfredo Alba, Amit Behal, Bin He, Linda Kato, Ana Lelescu, Cheryl Kieliszewski, Xian Wu & Li Zhang
2009. In 2009 IEEE International Conference on Data Mining Workshops, pp. 270 ff.
Cohen, Kevin Bretonnel, Benjamin Glass, Hansel M. Greiner, Katherine Holland-Bouley, Shannon Standridge, Ravindra Arya, Robert Faist, Diego Morita, Francesco Mangano, Brian Connolly, Tracy Glauser & John Pestian
2016. Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates through Natural Language Processing and Machine Learning. Biomedical Informatics Insights 8, pp. BII.S38308 ff.
Conrad, Jack G. & Cindy P. Schriber
2006. Managing déjà vu: Collection building for the identification of nonidentical duplicate documents. Journal of the American Society for Information Science and Technology 57:7, pp. 921 ff.
Cosh, Kenneth, Sakgasit Ramingwong, Narissara Eiamkanitchat & Lachana Ramingwong
2018. In 2018 10th International Conference on Knowledge and Smart Technology (KST), pp. 106 ff.
Dag, J.N., V. Gervasi, S. Brinkkemper & B. Regnell
2004. In Proceedings. 12th IEEE International Requirements Engineering Conference, 2004, pp. 265 ff.
Dale, R., Li Lei, H. de Vries, M. Gardiner & M. Tilbrook
2005. In 2005 International Conference on Natural Language Processing and Knowledge Engineering, pp. 651 ff.
Dale, Robert, Rafael Calvo & Marc Tilbrook
2004. In AI 2004: Advances in Artificial Intelligence [Lecture Notes in Computer Science, 3339], pp. 438 ff.
Dale, Robert, Cecile Paris & Marc Tilbrook
2003. In AI 2003: Advances in Artificial Intelligence [Lecture Notes in Computer Science, 2903], pp. 150 ff.
Davies, John, Marko Grobelnik & Dunja Mladenić
2005. Automated knowledge discovery in advanced knowledge management. Journal of Knowledge Management 9:5, pp. 132 ff.
Dozier, C. & P. Jackson
2005. Mining Text for Expert Witnesses. IEEE Software 22:3, pp. 94 ff.
Fosdick, Howard
2006. Programming languages for library and textual processing. Bulletin of the American Society for Information Science and Technology 31:6, pp. 21 ff.
Funkner, Anastasia A. & Sergey V. Kovalchuk
2020. In Computational Science – ICCS 2020 [Lecture Notes in Computer Science, 12140], pp. 591 ff.
Gallo, Ignazio & Elisabetta Binaghi
2008. In Progress in Pattern Recognition, Image Analysis and Applications [Lecture Notes in Computer Science, 4756], pp. 921 ff.
Gao, Xiangzhu, San Murugesan & B. Lo
2004. In IEEE/WIC/ACM International Conference on Web Intelligence (WI'04), pp. 192 ff.
Hartley, James, Eric Sotto & Claire Fox
2004. Clarity Across the Disciplines. Science Communication 26:2, pp. 188 ff.
2011. Automatic assessment of students' free-text answers with different levels. International Journal on Artificial Intelligence Tools 20:02, pp. 327 ff.
Hunter, Lawrence & K. Bretonnel Cohen
2006. Biomedical Language Processing: What's Beyond PubMed? Molecular Cell 21:5, pp. 589 ff.
Ibañez, Marilyn Minicucci, Reinaldo Roberto Rosa & Lamartine Nogueira Frutuoso Guimarães
2022. In Handbook of Research on Opinion Mining and Text Analytics on Literary Works and Social Media [Advances in Web Technologies and Engineering], pp. 293 ff.
Indu, M. & K. V. Kavitha
2016. In 2016 International Conference on Research Advances in Integrated Navigation Systems (RAINS), pp. 1 ff.
Irmak, Utku, Vadim von Brzeski & Reiner Kraft
2009. In 2009 IEEE 25th International Conference on Data Engineering, pp. 457 ff.
Jackson, P. & F. Schilder
2006. In Encyclopedia of Language & Linguistics, pp. 503 ff.
Jackson, Peter, Khalid Al-Kofahi, Alex Tyrrell & Arun Vachher
2003. Information extraction from case law and retrieval of prior cases. Artificial Intelligence 150:1-2, pp. 239 ff.
Jankowski, Andrzej & Andrzej Skowron
2007. In Transactions on Rough Sets VI [Lecture Notes in Computer Science, 4374], pp. 94 ff.
Jo, Taeho & Malrey Lee
2007. In 5th ACIS International Conference on Software Engineering Research, Management & Applications (SERA 2007), pp. 289 ff.
Kim, KyungTae, Sungahn Ko, Niklas Elmqvist & David S. Ebert
2011. In 2011 44th Hawaii International Conference on System Sciences, pp. 1 ff.
Kimbrough, Steven O., Thomas Y. Lee & Ulku Oktem
2012. In Modeling for Decision Support in Network-Based Services [Lecture Notes in Business Information Processing, 42], pp. 196 ff.
Larson, Martha, Eamonn Newman & Gareth J. F. Jones
2009. In Evaluating Systems for Multilingual and Multimodal Information Access [Lecture Notes in Computer Science, 5706], pp. 906 ff.
Leopold, Henrik, Sergey Smirnov & Jan Mendling
2012. On the refactoring of activity labels in business process models. Information Systems 37:5, pp. 443 ff.
Lesmo, Leonardo, Alessandro Mazzei, Monica Palmirani & Daniele P. Radicioni
2013. TULSI: an NLP system for extracting legal modificatory provisions. Artificial Intelligence and Law 21:2, pp. 139 ff.
Li, Simon, Kamrun Nahar & Benjamin C. M. Fung
2015. Product customization of tablet computers based on the information of online reviews by customers. Journal of Intelligent Manufacturing 26:1, pp. 97 ff.
Loutsaris, Michalis Avgerinos & Yannis Charalabidis
2020. In Proceedings of the 13th International Conference on Theory and Practice of Electronic Governance, pp. 731 ff.
MacFarlane, Katrinna & Violeta Holmes
2009. In 2009 International Conference on Management and Service Science, pp. 1 ff.
Mala, Piotr
2010. Rozwój badań nad przetwarzaniem języka naturalnego. Zagadnienia Informacji Naukowej - Studia Informacyjne 48:2(96), pp. 21 ff.
Meng, Xian-Jun, Qing-Cai Chen, Xiao-Long Wang & Xiao-Hong Yang
2007. In 2007 IEEE International Conference on Systems, Man and Cybernetics, pp. 3075 ff.
Mikeal, Adam, Cody Green, Alexey Maslov, Scott Phillips & John Leggett
2006. In 2006 Fourth Latin American Web Congress, pp. 162 ff.
Mladenić, Dunja & Marko Grobelnik
2003. In Data Mining and Decision Support, pp. 15 ff.
Morioka, Nobuyuki & Ashesh Mahidadia
2006. In Advances in Knowledge Acquisition and Management [Lecture Notes in Computer Science, 4303], pp. 244 ff.
Nabhan, Rabih Joseph
2017. Stylistic Awareness to Analyze and Comprehend Authentic Discourse in Language Classrooms. Open Journal of Modern Linguistics 07:03, pp. 185 ff.
Nakayama, Minoru, Kouichi Mutsuura & Hiroh Yamamoto
2021. In Note Taking Activities in E-Learning Environments [Behaviormetrics: Quantitative Approaches to Human Behavior, 11], pp. 51 ff.
Nakayama, Minoru & Yosiyuki Takahasi
2008. Estimation of certainty for responses to multiple-choice questionnaires using eye movements. ACM Transactions on Multimedia Computing, Communications, and Applications 5:2, pp. 1 ff.
Natarajan, J., D. Berrar, C. J. Hack & W. Dubitzky
2005. Knowledge Discovery in Biology and Biotechnology Texts: A Review of Techniques, Evaluation Strategies, and Applications. Critical Reviews in Biotechnology 25:1-2, pp. 31 ff.
Natt och Dag, Johan & Vincenzo Gervasi
2005. In Engineering and Managing Software Requirements, pp. 219 ff.
Nau, Dana S.
2009. In Springer Handbook of Automation, pp. 249 ff.
Netisopakul, Ponrudee & Norapan Siriumpunkul
2007. In Advanced Intelligent Computing Theories and Applications. With Aspects of Contemporary Intelligent Computing Techniques [Communications in Computer and Information Science, 2], pp. 479 ff.
Norouzzadeh, Mohammad S., Ayoub Bagheri & Mohammad H. Saraee
2009. In 2009 2nd IEEE International Conference on Computer Science and Information Technology, pp. 143 ff.
Pablo, Zelinna Cynthia, Nathaniel Oco, Ma. Divina Gracia Roldan, Charibeth Cheng & Rachel Edita Roxas
2014. Toward an enriched understanding of factors influencing Filipino behavior during elections through the analysis of Twitter data. Philippine Political Science Journal 35:2, pp. 203 ff.
Paz-Trillo, Christian, Renata Wassermann & Paula P. Braga
2005. An information retrieval application using ontologies. Journal of the Brazilian Computer Society 11:2, pp. 17 ff.
Perea-Ortega, José M., Arturo Montejo-Ráez, M. Teresa Martín-Valdivia & L. Alfonso Ureña-López
2013. Semantic tagging of video ASR transcripts using the web as a source of knowledge. Computer Standards & Interfaces 35:5, pp. 519 ff.
Perea-Ortega, José M., Arturo Montejo-Ráez, M. Teresa Martín-Valdivia & L. Alfonso Ureña-López
2013. Generating web-based corpora for video transcripts categorization. Expert Systems with Applications 40:1, pp. 337 ff.
Pestian, John P., Pawel Matykiewicz, Michelle Linn-Gust, Brett South, Ozlem Uzuner, Jan Wiebe, K. Bretonnel Cohen, John Hurdle & Christopher Brew
2012. Sentiment Analysis of Suicide Notes: A Shared Task. Biomedical Informatics Insights 5s1, pp. BII.S9042 ff.
Portscher, Edwin, James Geller & Richard Scherl
2003. In E-Commerce and Web Technologies [Lecture Notes in Computer Science, 2738], pp. 248 ff.
Radovanović, Miloš & Mirjana Ivanović
2006. In Advances in Web Intelligence and Data Mining [Studies in Computational Intelligence, 23], pp. 191 ff.
Rahab, Hichem, Abdelhafid Zitouni & Mahieddine Djoudi
2018. In Applied Computational Intelligence and Mathematical Methods [Advances in Intelligent Systems and Computing, 662], pp. 139 ff.
Rasmussen, Steen, Diana Mangalagiu, Hans Ziock, Johan Bollen & Gordon Keating
2007. In 2007 IEEE Symposium on Artificial Life, pp. 468 ff.
Rattanyu, Kanlaya & Makoto Mizukawa
2011. Emotion Recognition Based on ECG Signals for Service Robots in the Intelligent Space During Daily Life. Journal of Advanced Computational Intelligence and Intelligent Informatics 15:5, pp. 582 ff.
Rivolli, Adriano, Larissa C. Parker & André C. P. L. F. de Carvalho
2017. In Progress in Artificial Intelligence [Lecture Notes in Computer Science, 10423], pp. 585 ff.
Rivolli, Adriano, Jesse Read, Carlos Soares, Bernhard Pfahringer & André C. P. L. F. de Carvalho
2020. An empirical analysis of binary transformation strategies and base algorithms for multi-label learning. Machine Learning 109:8, pp. 1509 ff.
Rivolli, Adriano, Carlos Soares & André C. P. L. F. de Carvalho
2018. In 2018 7th Brazilian Conference on Intelligent Systems (BRACIS), pp. 414 ff.
Rivolli, Adriano, Carlos Soares & André C. P. L. F. de Carvalho
2018. Enhancing multilabel classification for food truck recommendation. Expert Systems 35:4, pp. e12304 ff.
Saric, F., J. Snajder, B.D. Basic & H. Eklic
2005. In 27th International Conference on Information Technology Interfaces, 2005, pp. 214 ff.
Schulze-Kremer, Steffen & Barry Smith
2005. In Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics.
Soni, Ankit, Nees Jan van Eck & Uzay Kaymak
2007. In 2007 IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making, pp. 205 ff.
Spangler, Scott, Larry Proctor & Ying Chen
2008. In 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp. 258 ff.
Spangler, W. S., J. T. Kreulen, Y. Chen, L. Proctor, A. Alba, A. Lelescu & A. Behal
2010. A smarter process for sensing the information space. IBM Journal of Research and Development 54:4, pp. 1 ff.
Stasko, John, Carsten Görg, Zhicheng Liu & Kanupriya Singhal
2007. In 2007 IEEE Symposium on Visual Analytics Science and Technology, pp. 131 ff.
Stasko, John, Carsten Görg & Zhicheng Liu
2008. Jigsaw: Supporting Investigative Analysis through Interactive Visualization. Information Visualization 7:2, pp. 118 ff.
Tahir, Muhammad Atif, Emdad Khan & Adel Al Salem
2015. In 2015 2nd World Symposium on Web Applications and Networking (WSWAN), pp. 1 ff.
van Diggelen, Jurriaan, Robbert-Jan Beun, Frank Dignum, Rogier M. van Eijk & John-Jules Meyer
2006. In Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, pp. 899 ff.
Voloshynovska, Iryna & Nadiya Andreychuk
2007. In 2007 9th International Conference - The Experience of Designing and Applications of CAD Systems in Microelectronics, pp. 583 ff.
von Brzeski, Vadim, Utku Irmak & Reiner Kraft
2007. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 691 ff.
Wang, Wei, Diep Bich Do & Xuemin Lin
2005. In Advanced Data Mining and Applications [Lecture Notes in Computer Science, 3584], pp. 19 ff.
Wang, Xiaoting, Peng Zhu, Giovanni Felici & Evangelos Triantaphyllou
2006. In Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques [Massive Computing, 6], pp. 695 ff.
Wnuk, Krzysztof, Martin Höst & Björn Regnell
2012. Replication of an experiment on linguistic tool support for consolidation of requirements from multiple sources. Empirical Software Engineering 17:3, pp. 305 ff.
Wu, Qin, Eddie Fuller & Cun-Quan Zhang
2010. In Mining and Analyzing Social Networks [Studies in Computational Intelligence, 288], pp. 1 ff.

This list is based on CrossRef data as of 8 January 2023. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.

Subjects & Metadata
BIC Subject: CF – Linguistics
BISAC Subject: LAN009000 – LANGUAGE ARTS & DISCIPLINES / Linguistics / General
U.S. Library of Congress Control Number: 2002066539