Register-based humour consists of texts in which most of the language is in a particular style or tone, except for one or two words which differ radically in tone (or register) from the rest. It is not obvious how to define register formally in terms of constructs such as literariness, archaism or formality. We have adopted a perspective in which words are located in a multi-dimensional space, and incongruity between words should correspond to a relatively large distance between those words within this space. To construct this space in a way that reveals differences relevant to register, we base each dimension on a word’s frequency of occurrence in a particular corpus of texts. We have assembled a number of corpora between which differences of tone or register are likely, and for each word in a text we compute its frequency within every corpus. These frequencies then determine the word’s position in the abstract space. The most successful combination of techniques, both for building the space and for computing outliers, was tested on the task of distinguishing humorous texts from plain newspaper sentences, where it performed quite well.
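The frequency-based space described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions not taken from the abstract: the corpora are toy token lists, each coordinate is the log of an add-one-smoothed relative frequency, distances are Euclidean, and the outlier is simply the word with the largest mean distance to the other words in the text. All function and corpus names (`build_frequency_tables`, `frequency_vector`, `most_incongruous_word`, `news`, `archaic`) are hypothetical, and the authors' actual normalisation and outlier method may differ.

```python
import math
from collections import Counter


def build_frequency_tables(corpora):
    """Map each corpus name to (word counts, total token count)."""
    return {name: (Counter(tokens), len(tokens)) for name, tokens in corpora.items()}


def frequency_vector(word, tables):
    """Position of `word` in the register space: one coordinate per corpus.

    Assumption (not from the source): log of add-one-smoothed relative
    frequency, so a word unseen in a corpus gets a finite, very low value.
    """
    vec = []
    for name in sorted(tables):  # fixed corpus order = fixed dimension order
        counts, total = tables[name]
        vec.append(math.log((counts[word] + 1) / (total + len(counts))))
    return vec


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_incongruous_word(text_tokens, tables):
    """Return (word, score) for the word with the largest mean distance
    to the other distinct words in the text (a simple outlier score)."""
    vectors = {w: frequency_vector(w, tables) for w in set(text_tokens)}
    best = None
    for w in sorted(vectors):
        others = [v for o, v in vectors.items() if o != w]
        if not others:
            continue
        score = sum(euclidean(vectors[w], v) for v in others) / len(others)
        if best is None or score > best[1]:
            best = (w, score)
    return best


# Hypothetical toy corpora standing in for register-distinct collections.
news = "the minister said the economy grew the minister said rates rose".split()
archaic = "thou art fair thou shalt hath spoken hath thou".split()
tables = build_frequency_tables({"news": news, "archaic": archaic})

word, score = most_incongruous_word(["minister", "said", "hath"], tables)
print(word)  # prints: hath
```

Because `minister` and `said` occur only in the newspaper-style toy corpus while `hath` occurs only in the archaic one, `hath` ends up farthest from the rest of the sentence. The log scaling keeps raw frequency differences from dominating the distances; a realistic version would use far larger corpora.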