This paper reviews the gap between current methods of text visualization and the needs of corpus-linguistic research, and introduces a tool that takes a step towards bridging that gap. Current text visualization methods tend to treat the problem purely as one of data encoding, and do not aim for interactive, tightly coupled representations of text that would foster discovery. The paper argues that such visualizations should always be linked to the text, so that users can move effortlessly between the text and its visualization, and that the visualization controls should provide continuous and immediate feedback to facilitate exploration. We introduce a tool, Text Variation Explorer (TVE), that demonstrates these requirements. TVE lets users examine, visually and interactively, how linguistic parameters behave as the text window size and overlap are varied; in addition, it performs interactive principal component analysis (PCA) based on a user-given set of words.
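To make the two operations described above more concrete, the following is a minimal Python sketch, not TVE's actual implementation. It assumes that window size and overlap are both counted in tokens. It uses the type-token ratio as a stand-in for whichever linguistic parameters TVE tracks, and it represents each window by the relative frequencies of the user-given words before running PCA. All function names (`tokenize`, `sliding_windows`, `type_token_ratio`, `word_frequency_matrix`, `pca_scores`) are hypothetical. PCA is computed here by power iteration with deflation on the covariance matrix, only to keep the example dependency-free.

```python
import math
import re
from typing import List, Sequence

Matrix = List[List[float]]


def tokenize(text: str) -> List[str]:
    """Crude lower-case word tokenizer (hypothetical, for illustration only)."""
    return re.findall(r"[a-z']+", text.lower())


def sliding_windows(tokens: Sequence[str], size: int, overlap: int) -> List[Sequence[str]]:
    """Cut the token stream into windows of `size` tokens; neighbours share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, step)]


def type_token_ratio(window: Sequence[str]) -> float:
    """Distinct word forms divided by running words in one window (assumed example parameter)."""
    return len(set(window)) / len(window)


def word_frequency_matrix(windows: Sequence[Sequence[str]], words: Sequence[str]) -> Matrix:
    """One row per window, one column per user-given word (relative frequency)."""
    return [[list(w).count(word) / len(w) for word in words] for w in windows]


def _mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pca_scores(rows: Matrix, k: int = 2, iterations: int = 500) -> Matrix:
    """Project rows onto their top-k principal components.

    Eigenvectors of the sample covariance matrix are found by power iteration
    with deflation; components with (numerically) zero variance are dropped.
    """
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two windows")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_variance = sum(cov[j][j] for j in range(d))
    components: Matrix = []
    for _ in range(min(k, d)):
        # Fixed, non-symmetric start vector so results are reproducible.
        v = [1.0 / (j + 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = _mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, _mat_vec(cov, v)))
        if eigenvalue <= 1e-12 * total_variance:
            break  # no meaningful variance left
        components.append(v)
        # Deflation: subtract the found component so the next pass finds the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [[sum(x * c for x, c in zip(r, comp)) for comp in components]
            for r in centred]


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the log " * 20
    wins = sliding_windows(tokenize(sample), size=40, overlap=20)
    print([round(type_token_ratio(w), 3) for w in wins])
    print(pca_scores(word_frequency_matrix(wins, ["the", "sat", "and"]), k=2))
```

Changing `size` or `overlap` and recomputing is the kind of adjustment the paper argues should give immediate feedback. Larger overlaps give smoother curves at the cost of less independent windows, and smaller windows expose local variation but make ratio-based measures noisier.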