Chapter published in: Corpus Approaches to Social Media
Edited by Sofia Rüdiger and Daria Dayter
[Studies in Corpus Linguistics 98] 2020
pp. 149–174
Constructing corpora from images and text
An introduction to Visual Constituent Analysis
The absence of visual analysis is a significant oversight in the corpus literature, and one that may lead to unintended omissions, particularly when analysing social media. In this chapter we introduce Visual Constituent Analysis (VCA), a method of multimodal corpus construction that allows researchers to represent and analyse the visual aspects of online media in large-scale corpora. The chapter addresses the shortcomings of a purely textual approach to discourse analysis when dealing with social media texts and offers a solution using computer ‘Vision’-based image annotation (in our case, Google Cloud Vision). Finally, we demonstrate how our approach can be used to analyse a sample of 150,000 micro-blog posts from Twitter and show how levels of user interaction differ between combined image/text posts and language-only posts.
Keywords: corpus construction, multimodality, images, Twitter, information operations
Published online: 04 November 2020
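The general idea described in the abstract, storing machine-generated image labels alongside a post's text and then comparing interaction for image/text posts against text-only posts, can be illustrated with a minimal sketch. This is not the authors' VCA implementation. The `Post` fields, the `<img:...>` tag format, and the definition of interaction as retweets plus likes are all assumptions made for illustration, and the image labels are presumed to have been produced beforehand by an annotation service such as Google Cloud Vision.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Post:
    """One micro-blog post. All field names are hypothetical."""
    text: str
    # Labels assumed to come from an external image annotator.
    image_labels: list[str] = field(default_factory=list)
    retweets: int = 0
    likes: int = 0


def to_corpus_line(post: Post) -> str:
    """Append image labels to the text as inline tags (assumed format)."""
    tags = " ".join(
        f"<img:{label.lower().replace(' ', '_')}>" for label in post.image_labels
    )
    return f"{post.text} {tags}".strip()


def mean_interaction(posts: list[Post]) -> float:
    """Average of retweets plus likes per post; 0.0 for an empty list."""
    if not posts:
        return 0.0
    return float(mean(p.retweets + p.likes for p in posts))


def compare_interaction(posts: list[Post]) -> dict[str, float]:
    """Split posts by whether they carry image labels and compare the means."""
    with_images = [p for p in posts if p.image_labels]
    text_only = [p for p in posts if not p.image_labels]
    return {
        "image_and_text": mean_interaction(with_images),
        "text_only": mean_interaction(text_only),
    }


# Toy data invented for this example; not drawn from the chapter's sample.
posts = [
    Post("Vote today", ["Flag", "Crowd"], retweets=10, likes=20),
    Post("Vote today", [], retweets=2, likes=4),
    Post("Look at this", ["Dog"], retweets=5, likes=5),
]
corpus_lines = [to_corpus_line(p) for p in posts]
summary = compare_interaction(posts)
```

Keeping the labels inline as tags means an ordinary concordancer or word-list tool can treat visual content as searchable tokens. A real study would also need a significance test before drawing conclusions from any difference in the means.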