Vol. 10:4 (2005) ► pp.517–541
Creating and using Web corpora
The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engine and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.
https://doi.org/10.1075/ijcl.10.4.07the
Cited by
Cited by 5 other publications
This list is based on CrossRef data as of 29 february 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.