Emotions through texts and images: A multimodal analysis of reactions to the Brexit vote on Flickr

Catherine Bouko

Abstract

We analyzed how citizens expressed emotion in multimodal Flickr posts in response to the outcome of the 2016 EU referendum, in which the UK voted for Brexit. We designed a model that articulates three levels of analysis in order to understand how meaning operates, namely how inscribed, signalled and/or supported emotion is expressed in narrative and/or conceptual representations, in image and in text, through logico-semantic relations of expansion, projection and/or decoration. We tested this model empirically on a corpus of 173 posts. Our results reveal that emotion is very often supported through images and that narrative representations are particularly prevalent in text.


1. Introduction

The June 2016 EU referendum result in favour of Brexit came as a surprise to politicians, media professionals, pollsters and, of course, millions of citizens, in and beyond the UK. Indeed, nearly all referendum polls failed to predict the outcome. While the relevance of surveys is now being questioned, quantitative stance and sentiment analyses have provided insights into the dynamics of the online Brexit debate and its outcome (e.g., Celli et al. 2016; Lansdall-Welfare, Dzogang, and Cristianini 2016; Del Vicario et al. 2017). Supplementing these large-scale data analyses, we seek to identify the patterns of emotional expression that followed the announcement of the referendum results, using a model that comprises three levels of analysis. Our aim is to understand how meaning operates, namely how inscribed, signalled and/or supported emotion is expressed in narrative and/or conceptual representations, in image and in text, through logico-semantic relations of expansion, projection and/or decoration.
The paper is structured as follows: in Section 2, we briefly review how emotion can be observed through affect, judgment and appreciation, taking either the word or the text as the unit of analysis. We also define the notions of visual symbols and tropes, and we outline four key challenges in text-image analysis. In Section 3, the data collection and the method are presented. Section 4 comprises the results, followed by a discussion and a conclusion.

2. Theoretical framework

The theoretical framework is presented in four sections. In Section 2.1, we present Martin and White’s approach to affect, appreciation and judgment (2005) and point out that most research on emotion addresses affect exclusively, through the analysis of denoted emotion terms. However, people do not only use explicit emotion terms to express their emotions; emotions can also be expressed through judgment and appreciation. In Section 2.2, we outline Micheli’s model of inscribed (i.e. denoted), signalled and supported emotion, which allows us to go beyond denoted emotion and to take judgment and appreciation into consideration. In Section 2.3, we show how visual content in multimodal posts can take the form of visual symbols and tropes that express emotion. Lastly, in Section 2.4, we lay out four issues in multimodal research identified by Bateman (e.g., 2014, 2017), which we address in our research design.

2.1 Emotion in affect, judgment and appreciation terms

Martin and White (2005) divide attitude into three types of discourse semantics: affect (expressing emotional reactions with explicit emotion terms, e.g., worry, angrily), judgment (assessing behaviour against norms, e.g., mean, reliable) and appreciation (assessing the value of things, e.g., relaxing, beautiful). Because affect is realized by emotion terms that only denote emotions, such terms can be identified using custom lexicons built by computational linguists or ready-made lexicons such as the Linguistic Inquiry and Word Count (LIWC) lexicon. Such lexicons are used not only to determine the valence of tweets (e.g., Garcia and Rimé 2019) but also to analyze specific emotions. For example, Alizadeh et al. (2019) observed that left-wing extremists use more language indicative of anxiety than US liberals, while US right-wing extremists express less anxiety than US conservatives.
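Because affect terms denote emotions directly, they lend themselves to simple lexicon lookup. The following is a minimal sketch of that idea, assuming a tiny hand-made lexicon: `TOY_LEXICON` and the helper functions are hypothetical, and LIWC’s actual dictionaries are not reproduced here. It counts denoted emotion categories and derives a crude valence score.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only: it is NOT LIWC or any
# published lexicon. Each term maps to (emotion category, valence).
TOY_LEXICON = {
    "worry": ("anxiety", "negative"),
    "worried": ("anxiety", "negative"),
    "angry": ("anger", "negative"),
    "angrily": ("anger", "negative"),
    "sad": ("sadness", "negative"),
    "happy": ("joy", "positive"),
    "relieved": ("relief", "positive"),
}


def tokenize(text: str) -> list:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def inscribed_affect(text: str) -> Counter:
    """Count the emotion categories denoted by lexicon terms in `text`.

    Only exact matches are counted, so emotion conveyed through
    judgment, appreciation or implicit phrasing goes undetected.
    """
    return Counter(TOY_LEXICON[t][0] for t in tokenize(text) if t in TOY_LEXICON)


def valence(text: str) -> int:
    """Crude valence score: positive hits minus negative hits."""
    hits = [TOY_LEXICON[t][1] for t in tokenize(text) if t in TOY_LEXICON]
    return hits.count("positive") - hits.count("negative")


print(inscribed_affect("So sad and worried about the result"))
print(valence("So sad and worried about the result"))  # -2
```

The first call finds one sadness term and one anxiety term; a sentence such as “What a shambles” yields no hits at all, which is precisely the limitation of denotation-only approaches discussed in the rest of this section.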

Moreover, emotion in multimodal artefacts is also often analyzed only through denoted emotion terms. For example, Bourlai and Herring (2014) manually analyzed text, image and their combination in over 2,000 Tumblr posts. The presence, polarity and type of emotions were evaluated on the basis of emotion terms in texts (from the NRC Word-Emotion Association Lexicon) and facial expressions in images. Helen Caple (2018) examined how Australians expressed dissatisfaction with the government and endorsement of other political parties in 92 multimodal #dogsatpollingstations Instagram posts, in the context of the 2016 Australian federal election. At the level of the text, she notably analyzed voters’ preferences as revealed by the mention of certain policy issues or politicians, and not others. In her analysis, emotion is observed in only two posts, in which the dog’s owner expresses the dog’s emotions with emotion terms, e.g., ‘#billygraves wasn’t too happy tied up next to #malcolm this morning and had plenty to say when I was inside #voting’ (2018, 436). Emotion is not analyzed in the visual content of this Instagram corpus.

Unlike affect, judgment and appreciation are more open to interpretation, given that they are often connoted rather than denoted. LIWC includes terms related to judgment and appreciation, but such lists remain particularly subjective. Admittedly, even lexicons of emotion terms always remain subjective to some extent, as they comprise prototypical, core occurrences but also marginal ones that can sometimes be erroneously coded (Bednarek 2008, 26). For judgment and appreciation terms, drawing up a list of core and marginal occurrences is even more challenging.

For this reason, many researchers working on emotion in discourse restrict their analysis to affect/denotation, even when coding manually, and leave out appreciation and judgment. However, people do not only use explicit emotion terms to express their emotions (e.g., Harris and Paradice 2007; Shaver et al. 1987); they can also convey emotions through judgment and appreciation. Furthermore, people often share their emotions through implicit emotional phrases that contain no identifiable affect, judgment or appreciation terms. Consequently, frameworks that cover affect, judgment and appreciation, and that are not limited to denoting terms, are preferable. For example, Gaspar et al. (2016) observed how affective expressions of coping with a food contamination incident revealed people’s two main perceptions of the outbreak, namely as a challenge to cope with or as a threat. To do so, they coded tweets according to Skinner et al.’s (2003) coping classification scheme, which differentiates between 12 categories of coping strategies and three adaptive function strategies. The authors underline the more qualitative nature of this type of analysis, which entails interpreting the tweets rather than merely coding the presence of lexicon-based terms. The downside, however, was a rather low inter-coder agreement: they agreed on 885 of the 2,099 tweets in their corpus (about 42%). Thus, while this method is more fine-grained, interpreting the meaning of the tweets retains an element of subjectivity, which makes the method harder to replicate than automated sentiment analysis.
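The agreement figure cited above (885 of 2,099 tweets) is a raw count. As a hedged illustration of how such figures are usually interpreted, the sketch below computes simple percent agreement and Cohen’s kappa, a standard chance-corrected alternative, assuming two coders who each assign one label per item. The toy coding data are invented for the example and do not come from Gaspar et al.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items to which two coders assigned the same label."""
    if not coder_a or len(coder_a) != len(coder_b):
        raise ValueError("expected two non-empty label lists of equal length")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Expected agreement if each coder labelled independently,
    # using their own observed label frequencies.
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if p_e == 1:  # both coders used one and the same label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Raw agreement in the figures cited above: 885 of 2,099 tweets.
print(round(885 / 2099, 3))  # 0.422

# Invented toy coding of four tweets as 'threat' or 'challenge'.
a = ["threat", "challenge", "threat", "threat"]
b = ["threat", "threat", "threat", "challenge"]
print(percent_agreement(a, b))  # 0.5
print(round(cohens_kappa(a, b), 3))  # -0.333
```

In the toy example, half of the items match, yet kappa is negative because the coders agree less often than their label frequencies alone would predict; raw agreement on its own can therefore overstate or understate reliability.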

Several studies of multimodality in (social) media also adopt an approach to emotion that is not limited to denoted emotion terms. For example, Korina Giaxoglou (2019) conducted a multimodal analysis of 230 #JeSuisAylan Instagram posts, published after Alan Kurdi’s drowning in the Mediterranean in 2015, and pointed out the prevalence of the subjunctive mode (‘as if’), through which people contrast this tragedy with possible alternatives for the three-year-old child if he were still alive. Such reactions are considered typical patterns of expressing sadness and compassion and were visible in the comments as well as in the images of these Instagram posts. In her framework, emotional patterns are identified not only through denoted emotion terms but also through emotion observed in opinion. Other studies focus not on the emotions expressed by the authors of (media) artefacts, but on the artefacts’ potential to elicit emotions in readers, as in Twitter images of the 2011 Egyptian revolution (Kharroub and Bas 2016) or in the ways in which text and image are used to present the sufferer in news media (Chouliaraki 2017).

Most of these studies do not rely on clear textual markers in discourse. For example, in their analysis of news storytelling on Twitter via the #egypt hashtag, Papacharissi and Oliveira (2012) point out how “emotive tweets” blend humour, news sharing, opinion and emotion, but they do not explain how these four patterns could be identified; instead, they consider that “discerning one from the other is difficult and doing so misses the point” (2012, 2). Conversely, we argue that Micheli’s typology (2014) offers markers of affect, appreciation and judgment that can be identified in relatively replicable ways. We present his model in the following section.

2.2 Inscribed, signalled and supported emotion

Micheli (2014) distinguishes between inscribed emotion, which corresponds to denoted emotion as discussed above, signalled emotion and supported emotion (émotion montrée and émotion étayée in French, respectively, for the latter two). In the first type, emotion terms are inscribed in discourse, as in Martin and White’s category of affect. Inscribed emotion can also take the form of fixed figurative expressions (e.g., my heart sank) or psycho-physiological descriptions of emotional experiences, as in ‘my voice trembled’. In Micheli’s approach to inscribed emotion, both words and sentences serve as units of analysis.

The second type comprises linguistic markers that conventionally signal the speaker’s emotions: punctuation, interjections, intensifiers, swear words, etc. In Micheli’s model, swear words fall within the concept of “affective meaning” (2014, our translation). This notion is based on the distinction between referential meaning and expressive meaning. For Lyons, expressive meaning is “the kind of meaning by virtue of which speakers express, rather than describe [like for referential meaning] their beliefs, attitudes and feelings” (Lyons 1995, 45). Löbner emphasizes the “immediate expression of personal sensations, feelings, attitudes or evaluations” in expressive meaning (Löbner 2002, 35). Affective meaning is a subcategory of expressive meaning and only concerns terms which indicate the speaker’s emotion towards the object of reality referred to. Accordingly, using words loaded with affective meaning signals the speaker’s emotions; for example, many swear words clearly indicate negative emotions towards the person referred to. However, the concept of affective meaning is particularly difficult to use, given its fuzzy nature. Indeed, it raises the question of the extent to which affective meaning is firmly embedded in the meaning of words and can thus be identified in replicable ways. Despite this obstacle, Micheli insists on the relevance of this notion, which signals emotions and reveals their effects.

Micheli’s model comprises a third type. While signalled emotion concerns the effects of emotion, supported emotion concerns its causes. In supported emotion, the speaker indicates her emotions by evoking a situation that she considers emotional. Seven types of criteria allow supported emotion to be identified (Micheli 2014, 115–118): persons involved, proximity in time and space, consequences and their degree of probability, assignment of responsibility, control of the situation, analogy with other relevant emotional situations and compatibility with norms and values. It is important to note that the situation itself does not determine emotions; these vary across individuals, even in stereotypically emotion-laden situations such as school examinations.

These three layers of emotion (inscribed, signalled and supported) imply distinct methods of analysis. Inscribed emotion explicitly indicates the presence of emotion in discourse. Signalled emotion calls for abductive inference, in a downstream process oriented towards the effects of emotion in discourse, whereas supported emotion calls for deductive inference, upstream towards the situations that caused the emotions (Plantin 2011, 155; Micheli 2014, 119). Reconstructing the writer’s emotions through their effects or potential causes involves more subjectivity than emotion labelling, which may explain why most research focuses only on inscribed emotion (Bednarek 2008, 152). Nevertheless, analyzing all three layers makes it possible to examine emotion in affect, judgment and appreciation, with the post as the unit of analysis, and to look beyond inscribed emotion. For example, Caple does not observe emotion, but only judgment, in the Instagram post captioned “The government is a greedy piglet that suckles on a taxpayer’s teat until they have sore, chapped nipples” (Caple 2018). In our view, this post contains no emotion term (i.e. no inscribed emotion) but conveys signalled emotion (through the affective meaning of the piglet metaphor) and supported emotion (through its criticism of the government and its financial policies).
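Of the three layers, only signalled emotion has conventional surface markers that a tool could flag as candidates for manual checking. The sketch below illustrates this under explicit assumptions: the marker lists and the function `signalled_markers` are hypothetical and are not part of Micheli’s model; they merely show how punctuation, interjections, intensifiers and swear words might be pre-screened.

```python
import re

# Hypothetical marker lists for illustration only; real studies need
# validated, language-specific lists and manual checking of every hit.
INTERJECTIONS = {"oh", "ah", "wow", "ugh", "alas"}
INTENSIFIERS = {"so", "very", "totally", "absolutely", "utterly"}
SWEAR_WORDS = {"damn", "bloody"}


def signalled_markers(text: str) -> dict:
    """Return candidate surface markers of signalled emotion in `text`.

    Supported emotion is not detected: it has to be inferred from the
    situation the text evokes, not read off individual tokens.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        # Runs of two or more '!' or '?' characters, e.g. "!!" or "?!"
        "punctuation": re.findall(r"[!?]{2,}", text),
        "interjection": [t for t in tokens if t in INTERJECTIONS],
        "intensifier": [t for t in tokens if t in INTENSIFIERS],
        "swear_word": [t for t in tokens if t in SWEAR_WORDS],
    }


print(signalled_markers("Oh no, so utterly bloody stupid!!"))
```

Two caveats follow from the design. “So” is flagged even when it is a conjunction, so hits are only candidates. And Caple’s piglet caption produces no hits at all: its affective meaning lies in the metaphor, not in conventional markers, which is why interpretation cannot be replaced by pattern matching here.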

2.3 Visual symbols and tropes

In this research, we seek to analyze affect, judgment and appreciation in both the textual and the visual elements of our dataset of multimodal Flickr posts. In her approach to “affective economies”, Ahmed points out that emotion is not merely a private matter but aligns individuals with communities:

Affect does not reside in an object or sign, but is an effect of the circulation between objects and signs (= the accumulation of affective value over time). Some signs, that is, increase in affective value as an effect of the movement between signs: the more they circulate, the more affective they become, and the more they appear to ‘contain’ affect. (Ahmed 2004, 120)

In our analysis of emotions in images, the circulation of emotion through visual symbols and tropes is a central hypothesis. When emotions are difficult to pin down or to locate in a body or object, they circulate more easily among signs and are easily activated in specific contexts, like the fear of otherness exploited in populist discourses or the different facets of patriotic love accumulated in the symbol of the national flag. We argue that visual symbols and tropes can express emotions through visual markers that are part of the collective cultural background. Some of them can be used to symbolize various contexts and make emotion circulate between objects and signs. Such visual content can therefore accumulate considerable affective value over time and play a significant role in expressing emotion. To test this hypothesis, we first need to specify our definition of visual symbols and tropes. Visual symbols and tropes fall within the processes of “iconographic symbolism” and “metaphorical associations”, respectively, as Machin explains:

Visual elements can also mean, not so much by transport of meaning from another domain, such as in the case of a thick line on a page being used to represent something durable [i.e. metaphorical associations], but because they have come themselves to represent particular ideas or concepts [i.e. iconographic symbolism]. (Machin 2007, 39)

Emblems such as national flags are symbols through iconographic symbolism, as are the “symbolic attributes” (Kress and van Leeuwen 2006, 105) that art historians have described for centuries, such as apples, crosses and crowns.

In Machin’s definition, metaphorical associations generally imply some transfer between two domains. While a narrower definition of metaphor might be more appropriate (Bateman 2014, 181), this definition is suitable for analyzing the more general category of tropes. In our research design, a visual trope can be any of the four master tropes, i.e. metaphor, metonymy, synecdoche and irony (Burke 1969).

Previous research (Bouko et al. 2018), in which we coded the types of images (the term image is used here as a cover term for the visual component of the post) in a corpus of Brexit-related Flickr posts, revealed that 12% of the visual content falls within the category of visual symbols and tropes. We also coded these posts according to five categories of interpersonal functions. These functions are inspired by Halliday and Matthiessen’s approach to systemic functional linguistics (2004) and consist of information sharing (journalistic or personal information); self-expression (personal points of view and/or personal analyses); eye-witnessing (i.e. sharing pictures of observed events, in private or public spaces); sharing personal moments (e.g., eating a post-Brexit consolation cake); and word play involving the term Brexit. These categories are not mutually exclusive: people can express their opinion through word play when they write jokes about politicians, for example. By contrast, a post that is only a play on the word Brexit falls within the word-play category alone, because it contains no appraisal. In self-expression, people can express their emotions through affect, judgment or appreciation. Our previous research revealed that 43% of the posts fulfilled a self-expression function (personal opinions, emotions or analyses regarding the vote), and that 23% of these posts (173 posts) contained visual symbols or tropes. This type of visual content is therefore almost twice as frequent in self-expression posts as in the corpus as a whole (23% vs 12%).

As Lakoff (1993, 210) argues, metaphor “is not a figure of speech, but a mode of thought”. It also rests on cultural conventions: to be understood, the properties transferred between the source and target domains must be integrated into the definitions of both domains, which are specific to cultural contexts. As we will show, many of the Flickr posts in our corpus comprise creative tropes that do not (yet) belong to cultural conventions and that need textual explanations to be understood by readers.

2.4 Four key challenges in text-image analysis

According to Bateman (e.g., 2014, 2017), most research on multimodality faces (at least) four major issues. Firstly, the empirical relevance of (sometimes very) abstract models is often questionable. Indeed, problems often arise when we attempt to apply a framework to a concrete corpus of artefacts. Most researchers illustrate their models with one or two examples but seldom test them empirically on a corpus.

Secondly, determining the unit of analysis in visual content remains a key issue. Unlike models that first identify all the visual message elements and only then specify their connections with texts or other images (e.g., Royce 2007), Bateman insists on the need for repeated back-and-forth between text and image in order to identify the participants relevant to meaning making. Indeed, the verbal and visual components cannot be analyzed independently; meaning often emerges from the interaction between the two components of a multimodal artefact.

Thirdly, most models take the form of classifications that allow us to identify the broad range of text-image possibilities but say little about how these combinations operate to create meaning.

In order to gain insight into how meaning operates in text-image relations, we will draw on three models that allow us to capture discourse semantics. Firstly, Micheli’s typology of layers of emotion will help us outline how text and images can be used to express emotion. Secondly, we will draw on Kress and van Leeuwen’s distinction between narrative and conceptual representations:

Where conceptual patterns represent participants in terms of their class, structure or meaning, in other words, in terms of their generalized and more or less stable and timeless essence, narrative patterns serve to present unfolding actions and events, processes of change, transitory spatial arrangements. (Kress and van Leeuwen 2006, 59)

Narrative and conceptual representations can be observed in both text and image (though without strict equivalence): narrative representations can be realized in visual narrative processes as well as in linguistic narrative clauses, while conceptual representations share similarities with conceptual structures in language. Conceptual representations are divided into classifications (i.e. taxonomic relations between participants), analytical processes (part-whole structures) and symbolic processes (e.g., symbols and tropes). The points of contact that Kress and van Leeuwen (2006, 109–110) outline between visual conceptual processes and relational attributive or identifying processes in language (Halliday 1985) are particularly significant for this research. In attributive processes, the carrier-attribute relation takes the form ‘a is an attribute of x’. Intensive attributive processes concern structures that indicate what a carrier is (e.g., she is wise). Circumstantial attributive processes refer to ‘where’, ‘when’ and ‘what with’ the carrier is (e.g., their suitcase was in the blue car). Possessive attributive processes concern what a carrier has (e.g., she has the strangest haircut in the village). In identifying clauses, two entities are related through a relation of identity, namely ‘a is the identity of b’; one entity functions as the identifier and the other as the identified. This identity process is reversible, e.g., George is the teacher or the teacher is George (Halliday and Matthiessen 2004, 210–248).
Thirdly, the relations between narrative and/or conceptual representations in the visual and/or textual parts of the Flickr posts will be further analyzed with Kong’s adapted taxonomy of image-text logico-semantic relations (2006). In this model, text-image relations can consist of expansion, projection or decoration. Expansion is further divided into elaboration (no new information), extension (new information) and enhancement (new information that specifies circumstances, such as spatio-temporal context, manner, justification, motivation, etc.). Elaboration comprises explanation (in other words), exemplification (for example), specification (to be precise) and identification (namely). Projection comprises projected speech or thought. In decoration, new but omissible information makes the text more attractive, without any further purpose.
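Taken together, the three models yield a coding record with one field per level of analysis. The sketch below is a hypothetical schema, not the coding instrument used in this study: the class names, field names and the choice of a single text-image relation per post are assumptions. Only the category labels come from Micheli, Kress and van Leeuwen, and Kong, as summarized in this section, and `is_valid_relation` simply checks that a coded relation follows Kong’s hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class EmotionLayer(Enum):
    """Micheli's (2014) layers of emotion."""
    INSCRIBED = "inscribed"
    SIGNALLED = "signalled"
    SUPPORTED = "supported"


class Representation(Enum):
    """Kress and van Leeuwen's (2006) representation types."""
    NARRATIVE = "narrative"
    CONCEPTUAL = "conceptual"


# Kong's (2006) relations as summarized above: dict keys are categories
# with subtypes; a set holds terminal labels.
KONG_TAXONOMY = {
    "expansion": {
        "elaboration": {"explanation", "exemplification",
                        "specification", "identification"},
        "extension": set(),
        "enhancement": set(),
    },
    "projection": {"speech": set(), "thought": set()},
    "decoration": {},
}


def is_valid_relation(path: tuple[str, ...]) -> bool:
    """Check that a path such as ("expansion", "elaboration", "exemplification")
    follows KONG_TAXONOMY. Paths may stop at any level (coarser coding)."""
    node = KONG_TAXONOMY
    for i, label in enumerate(path):
        if isinstance(node, dict):
            if label not in node:
                return False
            node = node[label]
        else:  # a set of terminal labels: the label must end the path
            return label in node and i == len(path) - 1
    return len(path) > 0


@dataclass
class PostCoding:
    """Hypothetical record for one post, coded on the three levels."""
    post_id: str
    emotion_layers: set[EmotionLayer]          # layers may co-occur
    image_representations: set[Representation]
    text_representations: set[Representation]
    relation: tuple[str, ...]                  # one Kong path (simplification)

    def __post_init__(self) -> None:
        if not is_valid_relation(self.relation):
            raise ValueError(f"not a Kong relation: {self.relation}")


example = PostCoding(
    post_id="hypothetical-001",
    emotion_layers={EmotionLayer.SUPPORTED},
    image_representations={Representation.CONCEPTUAL},
    text_representations={Representation.NARRATIVE},
    relation=("expansion", "extension"),
)
```

Using sets for the first three fields reflects that a post may combine, for instance, signalled and supported emotion, or narrative and conceptual representations. A real coding scheme would probably allow several relations per post, one per text zone.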

Fourthly, most models fail to consider materiality as a fundamental and inherent component of any multimodal meaning-making (Bateman 2017, 167):

Kress and van Leeuwen, for example, while fully accepting the centrality of materiality for any complete model of signification, nevertheless see semiotic modes as emerging when they ‘transcend’ material distinctions, becoming sufficiently conventionalized to develop ‘grammars’ independent of particular material support. (Bateman 2017, 167)

To grant materiality the key place it deserves, we turned our attention to the materiality of semiotic modes on Flickr. By default, a Flickr post displays the following constellation of materialities: the central zone, dedicated to the visual content, occupies the upper part of the post. The zone under the image includes the member’s name and profile picture, the tags, the number of views and faves, and the comments left by other Flickr members. The two most important writing spaces are the title and the description zones. Some Flickr affordances (faves, comments, tags) enable social relations between the platform’s users, although not all of them are used in practice. For example, Barton (2015) observed that most tags are created by photographers when they upload their pictures, even though, by default, anyone can add tags at any time. In our corpus, comments and faves were particularly rare and were not taken into account in the analysis.

This material canvas was designed for well-defined semiotic modes and specific text-image relations: texts were intended to give a title to, or describe, the visual content. However, the use of these materialities has evolved since Flickr was created; its users have appropriated them to create meaning beyond the initially defined semiotic modes. Founded in 2004, Flickr has centred its strategy on image sharing from the very beginning. Initially conceived as a chat room in which pictures could be shared instantly, Flickr first developed its role as an image repository. For some, this is still its predominant role: “if Google is an information retrieval service, Twitter is for news and links exchange, Facebook is for social communication, and Flickr is for image archiving, Instagram is for aesthetic visual communication” (Manovich 2016, 11). However, Flickr evolved into a fully-fledged social network, fostering interactions between its members. “Share your photos. Watch the world”, Flickr’s slogan for several years, showed how Flickr built on a collective experience (in reality a rather ‘connective’ one; Van Dijck 2011) constructed through sharing visual records of the world. Around 2005, Flickr was the platform of choice for sharing images of catastrophes, such as the bombing attacks in London or the floods in Australia (Liu et al. 2008; Vis et al. 2014). Twitter has since overtaken Flickr as the preferred platform for sharing images of events in real time (Burgess 2011), and Flickr has developed other image uses. Flickr’s current slogan illustrates this third shift: “Find your inspiration. Join the Flickr community, home to 13 billion photos and 2 million groups”. This slogan focuses on the passion for photography that Flickr members share. The Flickr community no longer seems to be built around photographic records of the world but rather around a passion for aesthetic photography; with the current slogan, expression through photography seems to explicitly supplant photographic content. While many Flickr members are now “photo lovers” (Seko 2013, 3) who explore form before content, others still centre their visual posts on real-time events. For example, comparing images posted with the hashtag #ebola, Seltzer et al. (2015, 1275) noticed that the images posted on Flickr were “more serious and literal” than those on Instagram. This suggests that, in such a context, Flickr members were more interested in sharing visual records than in aesthetic exploration.

3. Data and method

3.1 Data collection

In previous research (Bouko et al. 2018), we collected all English-language posts published on Flickr between June 24 (the day the referendum results were announced) and July 23 2016 that contained “Brexit” in the text or as a tag. All posts were anonymized and manually analyzed. Posts that were not related to Brexit, despite containing the word ‘Brexit’, were removed from the corpus. Although all posts were analyzed, we presented the results for a weighted corpus of 2,229 posts, in which only one post per series was taken into account. Series are sets of photographs taken at the same place by the same person within a short period of time and sharing the same title and description. A typical example is a series of fifty pictures of an anti-Brexit march, taken and shared by one Flickr member. The weighted corpus prevented us from giving disproportionate weight to images taken in series.
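The series rule can be sketched as a simple deduplication step. This is a minimal illustration, not the procedure used in the study, which was manual: it assumes each post record carries an anonymized author ID, a place, a title, a description and a capture time (all field names are hypothetical), and since the paper does not quantify “a short period of time”, the six-hour window is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Post:
    post_id: str
    author: str       # anonymized member ID (hypothetical field)
    place: str
    title: str
    description: str
    taken: datetime


def weighted_corpus(posts, window=timedelta(hours=6)):
    """Keep one post per series; `window` is a hypothetical 'short period'."""
    kept = []
    series_start = {}  # series key -> capture time of the series' first post
    for post in sorted(posts, key=lambda p: p.taken):
        key = (post.author, post.place, post.title, post.description)
        start = series_start.get(key)
        if start is None or post.taken - start > window:
            # First post of a new series: keep it and open the series.
            series_start[key] = post.taken
            kept.append(post)
    return kept


# Invented toy data: a fifty-picture march series plus one unrelated post.
t0 = datetime(2016, 7, 2, 12, 0)
march = [Post(f"m{i}", "u1", "London", "March", "Anti-Brexit march",
              t0 + timedelta(minutes=i)) for i in range(50)]
other = Post("x1", "u2", "Leeds", "Sad day", "#brexit", t0)
print([p.post_id for p in weighted_corpus(march + [other])])  # ['m0', 'x1']
```

Only the first picture of the march series survives, so the fifty near-identical posts weigh as much as the single unrelated one.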

As mentioned above, self-expression was the most prevalent category of interpersonal functions in that study (43%). Within the posts coded as self-expression, visual symbols and tropes were the most frequent type of visual content (23%). This is why the present paper focuses on posts that combine visual symbols and tropes with self-expression.

We decided to analyze emotion in a corpus of multimodal posts shared on Flickr because this social network might be more socio-demographically representative than Twitter, whose users tend to be younger than average and predominantly male when engaged in political debate (Llewellyn and Cram 2016). Moreover, Flickr is little used by communication professionals (politicians and journalists) and is not fuelled by bots, both of which are particularly active on Twitter (Smith 2017; Varol et al. 2017). Lastly, Flickr is a smaller social network, which enabled us to work on an exhaustive corpus.

3.2 Back and forth between image and text and between models

As mentioned in the literature review, this research builds on previous work. In Bouko et al. (2018), we determined whether the visual content of each post contained visual symbols and/or tropes. We noticed that most of these posts are composed of one central element, a “nucleus”, and sometimes of subordinate units, namely “satellites” (Mann and Thompson 1988). In such cases, it seems easy to identify the “visual message elements” (Royce 2007) that can be considered visual symbols or tropes. However, even in apparently simple cases like these, decomposing visual elements can be challenging, given that images are not delivered in delimited parts. Sometimes, satellites might seem insignificant or go unnoticed, and it is the text that outlines their conceptual dimension. In such cases, it is the interaction between the visual content and the text that brings the relevant visual message elements into existence (Bateman 2014, 173). For example, the picture of faded roses is symbolic thanks to the text:

(1)

“The end of something beautiful…or out with the old? #Brexit”,

but would be merely indexical with a text such as “I bought these flowers last Tuesday and they have already wilted.” For this reason, we systematically went back and forth between the text and the image of each post to determine whether the image could be considered a conceptual representation, and whether the image, the text, or both were loaded with emotion in relation to Brexit. This back-and-forth between text and image was coupled with a back-and-forth between the three models. We first analyzed to what extent and how these posts contained inscribed, signalled and/or supported emotion, using Micheli’s model. Then, we analyzed the narrative and/or conceptual representations in these posts based on Kress and van Leeuwen’s approach. In a feedback loop, we then revisited the emotion analysis, refining it where the analysis of the narrative and conceptual representations warranted. Finally, we identified the logico-semantic relations between text and image with a simplified version of Kong’s taxonomy, owing to boundary issues between the subcategories of elaboration. In Kong’s case study, some elements are loosely coded as both specification and identification, while others are coded as explanation without the differences between these categories being specified. These ambiguities require further investigation beyond the scope of this research. For these reasons, expansion consists of three categories in our research: elaboration (explanation of the image by the text or vice versa, i.e. ‘in other words’), enhancement (additional information between text and image) and extension (new information between text and image).

Micheli’s model has been slightly adapted to visual content. Inscribed emotion in an image can consist of symbols of emotion (e.g., tears) and facial expressions (as in most of the multimodal emotion analyses presented above), as well as of emotion terms included in the visual content, such as in pictures of signs or words integrated into drawings. In such cases, the visual content is itself multimodal. In our view, signalled emotion has no equivalent in visual content. Visual modality is a notion close to visually signalled emotion, but the two should not be confused. In Bourlai and Herring’s (2014) analysis of Tumblr posts, some of the markers fall within the concept of signalled emotion: capitalization of words, repetition of letters, punctuation, as well as bold or italic fonts in text. However, in their research design, these markers are considered modality markers, which allow researchers to code the intensity of inscribed emotion, but not (signalled) emotion as such. In our view, they can also reveal emotion in their own right by highlighting the effects of emotion on discourse (i.e. signalled emotion), even when emotion terms are absent. Alongside these linguistic markers, Bourlai and Herring coded two visual markers, namely “intense movements and facial expressions” (2014, 3). These two markers are also modality markers. Research on multimodality offers many insights into modality in multimodal content (e.g., Kress and van Leeuwen 2006; Machin 2007), but, again, modality does not correspond to signalled emotion and does not provide markers for coding signalled emotion in visual content.
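The textual markers listed above lend themselves to a pre-screening step before manual coding. The sketch below is a hypothetical heuristic, not the coding procedure of this study (which was manual): it flags fully capitalized words, letters repeated three or more times and expressive punctuation so that an annotator can judge whether they signal emotion. The regular expressions and the acronym list are assumptions; bold and italic fonts are invisible in plain text and are not covered.

```python
import re

# Hypothetical cues; the study itself coded signalled emotion manually.
ACRONYMS = {"EU", "UK", "GB", "US"}  # capitalized words that are not emphasis
MARKERS = {
    "capitalization": re.compile(r"\b[A-Z]{2,}\b"),              # e.g. "WHY"
    "letter_repetition": re.compile(r"([A-Za-z])\1{2,}"),        # e.g. "sooo"
    "expressive_punctuation": re.compile(r"[!?]{2,}|\.{3,}|…"),  # e.g. "?!", "..."
}


def signalled_markers(text):
    """Return candidate signalled-emotion cues in text, grouped by marker type."""
    found = {}
    for name, pattern in MARKERS.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if name == "capitalization":
            # Drop acronyms, which are capitalized by convention, not emotion.
            matches = [w for w in matches if w not in ACRONYMS]
        if matches:
            found[name] = matches
    return found


print(signalled_markers("WHY did we do this?! Sooo sad..."))
# {'capitalization': ['WHY'], 'letter_repetition': ['ooo'], 'expressive_punctuation': ['?!', '...']}
```

Such cues are only candidates: whether a given capitalized word or ellipsis actually reveals emotion remains an interpretive decision for the annotator.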

By contrast, supported emotion can be conveyed in visual content. In Example 2, the torn Union Jack highlights the consequences of the referendum outcome for the unity of the United Kingdom (one of Micheli’s criteria for supported emotion, cf. supra).

(2)

Image 1. Supported emotion through image: Focus on the disunity of the United Kingdom as the cause of emotion (Example 2)

To summarize, our model of analysis is constructed as follows:

  • Layers of emotion

    • Inscribed emotion in visual content and/or in text

    • Signalled emotion in text

    • Supported emotion in visual content and/or in text

  • Representations

    • Narrative representations in visual content and/or in text

      • Material processes

      • Behavioural processes

      • Mental processes

      • Verbal processes

    • Conceptual representations in visual content and/or in text

      • Relational processes

        • Attributive (intensive, circumstantial or possessive)

        • Identifying

      • Existential processes

  • Text-image relations

    • Expansion

      • Elaboration: explanation of existing image content by text and vice versa (no new information)

      • Enhancement: additional information

      • Extension: new information

    • Projection

    • Decoration
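The coding scheme above can be expressed as a small data model. This is an illustrative sketch, not the tool used in the study, and all class and field names are hypothetical. It encodes two constraints stated in the text: signalled emotion is coded in text only, and images fall into one of the three sets of patterns of Table 1 (no emotion, inscribed, supported), which we assume means at most one emotion layer per image. Narrative and conceptual representations are omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    INSCRIBED = "inscribed"
    SIGNALLED = "signalled"
    SUPPORTED = "supported"


class Relation(Enum):
    ELABORATION = "elaboration"  # expansion: no new information
    ENHANCEMENT = "enhancement"  # expansion: additional information
    EXTENSION = "extension"      # expansion: new information
    PROJECTION = "projection"
    DECORATION = "decoration"


EXPANSION = frozenset({Relation.ELABORATION, Relation.ENHANCEMENT, Relation.EXTENSION})


@dataclass(frozen=True)
class CodedPost:
    image: frozenset = frozenset()      # emotion layers coded in the visual content
    text: frozenset = frozenset()       # emotion layers coded in the text
    relations: frozenset = frozenset()  # text-image logico-semantic relations

    def __post_init__(self):
        # The model has no visual equivalent of signalled emotion.
        if Layer.SIGNALLED in self.image:
            raise ValueError("signalled emotion is coded in text only")
        # Assumption: at most one layer per image, mirroring Table 1's three sets.
        if len(self.image) > 1:
            raise ValueError("at most one emotion layer per image")

    def pattern_set(self):
        """Return 1 (no emotion in image), 2 (inscribed) or 3 (supported)."""
        if not self.image:
            return 1
        return 2 if Layer.INSCRIBED in self.image else 3


# Crying girl + "Britain, what have you done? #brexit #disaster"
post = CodedPost(
    image=frozenset({Layer.INSCRIBED}),
    text=frozenset({Layer.SIGNALLED, Layer.SUPPORTED}),
    relations=frozenset({Relation.EXTENSION}),
)
print(post.pattern_set(), Relation.EXTENSION in EXPANSION)  # 2 True
```

Coded this way, the crying-girl post discussed in the results lands in the second set of patterns, with an extension relation that belongs to expansion.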

One annotator manually coded the layers of emotion in text and in image. To address the reliability issue raised by single coding, the corpus was coded twice, with an interval of one month between the two coding sessions (Kouper 2010). Cohen’s kappa values indicate substantial agreement between the two codings (κ between .784 and .798).
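For readers unfamiliar with the statistic, the sketch below computes Cohen’s kappa for one coded variable across two coding sessions: κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each session’s label frequencies. The labels are invented for illustration, not data from this study, and we assume that each emotion layer is coded as a present/absent decision per post. scikit-learn’s `sklearn.metrics.cohen_kappa_score` computes the same statistic.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Cohen's kappa between two codings of the same items."""
    if len(first) != len(second) or not first:
        raise ValueError("codings must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: share of items coded identically in both sessions.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of each label's marginal frequencies.
    freq1, freq2 = Counter(first), Counter(second)
    p_e = sum(freq1[label] * freq2[label] for label in freq1) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented toy example: inscribed emotion in text, present (1) or absent (0).
session_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
session_2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(session_1, session_2), 3))  # 0.6
```

Here 80% raw agreement shrinks to κ = 0.6 once the 50% chance agreement is discounted. On the conventional Landis and Koch scale, values between 0.61 and 0.80, such as the .784 to .798 reported above, count as substantial agreement.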

The analysis of the narrative and conceptual representations, as well as that of the text-image logico-semantic relations, was exclusively qualitative, with a view to gaining fine-grained insights.

4. Results

The analysis of emotion in multimodal posts enabled us to identify the three sets of patterns below, structured according to whether the visual representations carry no emotion, inscribe emotion or support emotion. The first results are quantitative findings on the prevalence of the three layers of emotion. The results that follow are qualitative insights into how emotion is expressed in text-image relations through expansion, projection and/or decoration, with narrative and/or conceptual representations.

Layers of emotion in image and text

In 25% of the corpus, the image contains no emotion (emotion is expressed only in the text); 12% of the corpus comprises inscribed emotion in the image and 63% comprises supported emotion in the image (see Table 1).

Table 1. Occurrence of patterns of emotion in image and text (% of corpus)
Set of patterns 1: no emotion in image (total = 25%)
Emotion in text | %
Inscribed emotion | 6
Signalled emotion | 4
Supported emotion | 5
Inscribed and signalled emotion | 0
Inscribed and supported emotion | 7
Signalled and supported emotion | 1
Inscribed, signalled and supported emotion | 2

Set of patterns 2: inscribed emotion in image (total = 12%)
Emotion in text | %
No emotion | 2
Inscribed emotion | 4
Signalled emotion | 1
Supported emotion | 2
Inscribed and signalled emotion | 1
Inscribed and supported emotion | 1
Signalled and supported emotion | 1
Inscribed, signalled and supported emotion | 0

Set of patterns 3: supported emotion in image (total = 63%)
Emotion in text | %
No emotion | 3
Inscribed emotion | 5
Signalled emotion | 2
Supported emotion | 28
Inscribed and signalled emotion | 0
Inscribed and supported emotion | 10
Signalled and supported emotion | 10
Inscribed, signalled and supported emotion | 5

The three patterns of emotion in image (no emotion, inscribed, supported) are combined with seven combinations of inscribed, signalled and supported emotion in text, plus, when the image carries emotion, the absence of emotion in text. The most prevalent pattern is the combination of supported emotion in image and supported emotion in text (28%). The next most prevalent patterns also combine supported emotion in image with supported emotion in text: supported emotion in image co-occurs with inscribed and supported emotion in text in 10% of the corpus, and with signalled and supported emotion in text in another 10%.

In total, inscribed emotion is present in the text of 41% of the corpus. Within these 41%, the text expresses emotion exclusively through inscribed emotion in 15% of the corpus (6% with no emotion in image, 4% with inscribed emotion in image and 5% with supported emotion in image). Table 1 also reveals that the combination of all three layers in text is rather limited: it occurs in only 7% of the corpus (2%, 0% and 5%, respectively). In total, supported emotion is present in the text of 71% of the corpus.
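The totals in this paragraph can be recomputed from the cells of Table 1. The sketch below encodes each cell as (emotion in image, emotion layers in text, % of corpus) and sums the matching cells. It reproduces the set totals (25, 12 and 63%, which add up to 100%), the 41% of posts with inscribed emotion in text, the 15% with inscribed emotion only and the 7% combining all three layers. Summing the rounded cells for supported emotion in text gives 72%, one point above the 71% reported, presumably because the cell values are rounded.

```python
NONE, INS, SIG, SUP = "none", "inscribed", "signalled", "supported"

# Cells of Table 1: (emotion in image, emotion layers in text, % of corpus)
TABLE_1 = [
    (NONE, {INS}, 6), (NONE, {SIG}, 4), (NONE, {SUP}, 5), (NONE, {INS, SIG}, 0),
    (NONE, {INS, SUP}, 7), (NONE, {SIG, SUP}, 1), (NONE, {INS, SIG, SUP}, 2),
    (INS, set(), 2), (INS, {INS}, 4), (INS, {SIG}, 1), (INS, {SUP}, 2),
    (INS, {INS, SIG}, 1), (INS, {INS, SUP}, 1), (INS, {SIG, SUP}, 1),
    (INS, {INS, SIG, SUP}, 0),
    (SUP, set(), 3), (SUP, {INS}, 5), (SUP, {SIG}, 2), (SUP, {SUP}, 28),
    (SUP, {INS, SIG}, 0), (SUP, {INS, SUP}, 10), (SUP, {SIG, SUP}, 10),
    (SUP, {INS, SIG, SUP}, 5),
]


def share(predicate):
    """Sum the percentages of all cells whose (image, text) satisfy predicate."""
    return sum(pct for image, text, pct in TABLE_1 if predicate(image, text))


set_totals = {img: share(lambda i, t, img=img: i == img) for img in (NONE, INS, SUP)}
inscribed_in_text = share(lambda i, t: INS in t)
only_inscribed_in_text = share(lambda i, t: t == {INS})
all_three_in_text = share(lambda i, t: t == {INS, SIG, SUP})
supported_in_text = share(lambda i, t: SUP in t)

print(set_totals)  # {'none': 25, 'inscribed': 12, 'supported': 63}
print(inscribed_in_text, only_inscribed_in_text, all_three_in_text, supported_in_text)  # 41 15 7 72
```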

Set of patterns 1: Visual representations that do not contain any marker of emotion

In these posts, the image consists of neutral symbols, including neutral UK or EU flags (as opposed to torn flags, for instance) and does not inscribe or support emotion. Therefore, emotion is only inscribed, signalled and/or supported through text. The text-image relation falls within decoration: the image does not provide any information and only fulfils a function of visual decoration for a text that is independent of the image, as in the example below:

(3)

Picture of a cushion with the British flag + “Not even God could save England from another #Brexit… now the humiliation is complete… #sorry #europeanchampionship #ek2016 #ek #2016 #england #theylost #aurevoir #byebye #totziens #godsavethequeen #englishflag #flag #flags #nationalflag”

In this post, the author inscribes her emotion with the emotion term “sorry” and supports it through an analogy with England’s defeat, and the resulting humiliation, at the Euro 2016 championship.

Set of patterns 2: Visual representations that inscribe emotion

In such posts, the visual content can consist of visual representations of symbols of emotion (e.g., tears, a smile), as well as of images of emotional behaviour (e.g., a person crying or jumping for joy). Pictures of emotion words also count as visually and conceptually inscribed emotion (e.g., a picture of a tea cup with the word ‘happy’).

Such posts articulate layers of emotion in various text-image relations:

  1. Elaboration of visually inscribed emotion through no emotion in text: the text explains the image but does not carry any emotion, e.g., a post with a picture of someone with her head in her hands and the text “Brexit effects”.

  2. Elaboration of visually inscribed emotion through inscribed emotion in text: the locutor inscribes the visual emotion in text, e.g. a post with the image of tears and the text “so sad”. By doing so, the text echoes the image (or vice versa).

  3. Extension of visually inscribed emotion through inscribed emotion in text: inscribed emotion in image is coupled with new information, i.e. another emotion, inscribed in text. For example, a post shows the stars of the European Union flag turned into yellow tears, which symbolize sadness, with the text “Sorry to see you go after Brexit”. In this case, two emotions (sadness and regret) are inscribed, in image and text respectively.

  4. Extension of visually inscribed emotion through supported emotion in text: inscribed emotion in image is coupled with supported emotion in text that provides new information, as opposed to elaboration, which adds no new information. For example, in a post containing the picture of a girl crying, the text “Britain, what have you done? #brexit #disaster” extends this visually inscribed sadness with another issue, namely blaming the other camp. Emotion is supported through arguments about the responsibility of the Leave camp, and the word “disaster” can be considered as carrying affective meaning (signalled emotion). Indeed, whether Brexit is a disaster cannot be established factually; this emotionally loaded term rather reveals the locutor’s emotion.

  5. Extension of visually inscribed emotion through inscribed and supported emotion in text: same as pattern 4 with the addition of emotion terms, e.g., “Britain, what have you done? #brexit #disaster. So sad.”

These patterns are constructed with various combinations of conceptual and narrative representations, in image and text. In the example of pattern 1, the image can be seen as narrative: it contains the unfolding action of holding one’s head in one’s hands. The text could be narrative as well and focus on the narrative aspect of the image with indications such as ‘This woman has her head in her hands’. Instead, the text consists of a conceptual representation that identifies the image: the negative emotion in the image corresponds to Brexit effects. With identifying clauses, the order of the participants can be reversed: the image corresponds to Brexit effects, or Brexit effects correspond to the image. In the example of pattern 3, visual conceptual representations and narrative linguistic representations are combined: the mental process of sadness is conceptualized in the image and extended with a narrative representation/mental process (“Sorry to see you go after Brexit”). In the example of pattern 4, the text is composed of a narrative representation/material process (“Britain, what have you done?”) as well as a conceptual representation/intensive process (#brexit #disaster).

Set of patterns 3: Visual representations that support emotion

When we look at the most prevalent types of image-text combination used to express emotion, it appears that supported emotion predominates in both the visual and the linguistic parts of the posts. Five types of common metaphorical images are prevalent in the corpus: fences and storms (10% of the corpus each), sinking or falling objects (7.5%), dawns (6%) and torn British flags (4.5%). Together, they account for over one third of the corpus (38%). Beyond these culturally shared metaphors, a striking finding was the high number of metaphorical images that do not belong to our collective background and acquire a metaphorical meaning through their authors’ creativity. For example, one post consisted of a picture of a building façade whose gentle curves metaphorically represent what the author hopes Brexit will be like (Example 4).

(4)

Image 2. Example of Brexit-related metaphorical meaning of the image through text (Example 4)