Article published in: Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian
Edited by Gisle Andersen
[Studies in Corpus Linguistics 49] 2012
pp. 29–50
Corpuscle – a new corpus management platform for annotated corpora
Corpuscle is a new corpus query engine and Web-based corpus management system. The main design goals were the ability to handle very large corpora, support for structured data (XML), and seamless integration of manual corpus annotation and editing. New algorithms have been developed, among them a technique for running finite state automata starting from the edges with the lowest corpus counts, and an implementation of regular expressions on suffix arrays for fast reverse index lookup. These algorithms allow for a clean and elegant implementation of multi-valued and set-valued attributes. The web interface offers rich functionality for concordancing, collocations, distribution statistics, and more. Queries can be entered in a graphical, menu-driven way, freeing the user from dealing with the complexities of the query language.
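The abstract does not spell out how regular expressions are evaluated on suffix arrays. The following is a minimal, generic Python sketch of one way a suffix array over a word list (lexicon) can narrow a regex search before full matching. It assumes the caller supplies a non-empty literal substring that every match must contain; the class `LexiconSuffixIndex` and its methods are hypothetical names, and this is not Corpuscle's implementation.

```python
import bisect
import re


class LexiconSuffixIndex:
    """Hypothetical sketch: a suffix array over a word list (not Corpuscle's code)."""

    SEP = "\x00"  # separator assumed never to occur inside a word

    def __init__(self, words):
        self.words = list(words)
        if any(self.SEP in w for w in self.words):
            raise ValueError("words must not contain the separator")
        # Concatenate "\x00w0\x00w1\x00...\x00" so every word is delimited.
        self.text = self.SEP + self.SEP.join(self.words) + self.SEP
        # Record the offset at which each word starts in self.text.
        self.starts = []
        pos = 1
        for w in self.words:
            self.starts.append(pos)
            pos += len(w) + 1
        # Naive suffix array: suffix start offsets sorted by suffix text.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])

    def _prefix(self, i, k):
        # First k characters of the suffix starting at offset i.
        return self.text[i:i + k]

    def _range(self, literal):
        """Half-open range [lo, hi) of suffix-array entries starting with `literal`."""
        k = len(literal)
        lo, hi = 0, len(self.sa)
        while lo < hi:  # first suffix whose k-prefix is >= literal
            mid = (lo + hi) // 2
            if self._prefix(self.sa[mid], k) < literal:
                lo = mid + 1
            else:
                hi = mid
        start, hi = lo, len(self.sa)
        while lo < hi:  # first suffix whose k-prefix is > literal
            mid = (lo + hi) // 2
            if self._prefix(self.sa[mid], k) <= literal:
                lo = mid + 1
            else:
                hi = mid
        return start, lo

    def lookup(self, pattern, literal):
        """Words fully matching `pattern`, restricted to words containing `literal`.

        Assumption: `literal` is a non-empty substring that every match of
        `pattern` must contain; a real engine would derive it from the regex.
        """
        if not literal or self.SEP in literal:
            raise ValueError("literal must be non-empty and separator-free")
        lo, hi = self._range(literal)
        # Map each matching suffix offset back to the word that contains it.
        ids = {bisect.bisect_right(self.starts, self.sa[j]) - 1 for j in range(lo, hi)}
        # Verify the surviving candidates against the full regular expression.
        return [self.words[i] for i in sorted(ids) if re.fullmatch(pattern, self.words[i])]
```

For example, with the lexicon `walking, talk, king, ring, sing, walk`, `lookup(r"k.*", "k")` first narrows the search to the four words containing `k` and then keeps only `king`. The naive construction here sorts full suffix strings, which is fine for illustration but far too slow and memory-hungry for a large corpus; a production index would use a dedicated suffix-array construction algorithm and store integer offsets compactly.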
Published online: 23 March 2012