The (very) long history of corpora, concordances, collocations and all that
In the development of academic disciplines, important ideas are often proposed, forgotten, and then rediscovered much later, when they are connected to other ideas in a way which reveals their significance. I give examples of ideas which are often thought of as quite modern, although they have a very long history:
- using corpora in constructing dictionaries and language teaching materials
- using concordances as data for textual exegesis and information retrieval
- using collocations as evidence of word meaning.
In all three cases the theoretical significance of the ideas became clear only after improved techniques of visualisation allowed patterns to be seen in complex non-numerical data.
Article outline
- 1. Overview
- 2. Previous work
- 3. Concordancing content
- 4. Index, verbal concordance and “real” concordance
- 5. Concordancing form
- 6. Meaning and use
- 7. Practice and theory
- 8. Digital corpora
- 9. Collocation and phraseology
- 10. Meaningful quantification
- 11. KWIC (Key word in context) concordances
- 12. Concordance packages and programming languages
- 13. Conclusion
- Acknowledgements
- Notes
- References