Chapter 10
Query a corpus in near-natural language
A human-friendly corpus query language not only for linguists
This paper addresses the pressing issue of making corpora accessible to users who are unable or unwilling to
learn a formal query language. It introduces a working online translator that automatically converts near-natural language into
the Corpus Query Language (CQL), as used in SketchEngine, the Czech National Corpus web
applications, and other services. The translator does not require strict syntactic patterns and tolerates a certain
number of typing errors by exploiting the redundancy inherent in natural language. It allows querying corpora in 35
languages hosted by the Czech National Corpus infrastructure, all of them annotated in the Universal
Dependencies formalism. Alternatively, the translated CQL code can be used in other compatible systems. The paper
presents the theoretical assumptions behind our solution and outlines the details of its implementation, including
examples of use.
Article outline
- 1. Introduction
- 2. Methodology
- 3. Example queries
- 4. Examples of use outside linguistics
- 5. Testing
- 6. Conclusion
- Notes
- References