Automatic tool to annotate smile intensities in conversational face-to-face interactions
This study presents an automatic tool that traces smile intensities along a video recording of
conversational face-to-face interactions. Its output is a sequence of adjusted time intervals labeled according to the
Smiling Intensity Scale (Gironzetti, Attardo, and Pickering,
2016), a five-level scale ranging from neutral facial expression to laughing smile. The statistical model underlying the
tool, detailed in the present study, is trained on a manually annotated corpus of conversations featuring spontaneous
facial expressions. The tool can thus be used to good advantage for annotating smiles in interaction. The results are twofold. First,
the evaluation reveals an observed agreement of 68% between manual and automatic annotations. Second, manually correcting the
labels and interval boundaries of the automatic output reduces annotation time by a factor of 10 compared with
manually annotating smile intensities from scratch. Our annotation engine relies on the state-of-the-art toolbox
OpenFace to track the face and to measure the intensities of the facial Action Units of interest throughout the video. The
documentation and the scripts of our tool, the SMAD software, are available for download from the HMAD open-source project page at
https://github.com/srauzy/HMAD (last access 31 July 2023).
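To make the pipeline concrete, here is a minimal sketch in R (the ecosystem the HMAD/SMAD scripts are distributed in) of the kind of processing involved: reading OpenFace's per-frame Action Unit intensities and turning them into SIS-like levels. The AU06_r and AU12_r columns are standard OpenFace output; everything else, including the composite weights, the level cut points, and the running-median smoothing that stands in for the model's Viterbi decoding, is an illustrative assumption, not the authors' implementation.

```r
## Minimal sketch, NOT the SMAD implementation: threshold a composite of
## OpenFace AU intensities into five SIS-like levels, then smooth.

## Read OpenFace's per-frame CSV output; AU06_r (cheek raiser) and AU12_r
## (lip corner puller) are standard OpenFace columns. strip.white guards
## against the space-padded headers of older OpenFace versions.
read_openface <- function(csv_path) {
  au <- read.csv(csv_path, strip.white = TRUE)
  ## keep frames where OpenFace reports a successful, confident track
  au[au$success == 1 & au$confidence >= 0.8,
     c("frame", "timestamp", "AU06_r", "AU12_r")]
}

## Illustrative composite score; the paper estimates its own AU composite
## function from the manually annotated training corpus.
smile_score <- function(au) 0.7 * au$AU12_r + 0.3 * au$AU06_r

## Hypothetical cut points for the five SIS levels
## (0 = neutral face, ..., 4 = laughing smile); the real boundaries are
## learned from the gold standard, not fixed a priori.
sis_level <- function(score) {
  as.integer(cut(score, breaks = c(-Inf, 0.5, 1.5, 2.5, 3.5, Inf))) - 1L
}

au  <- read_openface("video_openface.csv")  # hypothetical file name
raw <- sis_level(smile_score(au))
## A running median over ~1 s of 25 fps video stands in for the model's
## Viterbi smoothing of the frame-wise labels.
smoothed <- stats::runmed(raw, k = 25)
```

On a 25 fps recording, k = 25 corresponds to a one-second window; SMAD itself adjusts labels and interval boundaries through its trained statistical model rather than a fixed filter.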
Article outline
- Introduction
- The OpenFace software
- Action Unit detection
- In-the-wild videos
- Manual annotations of smiles
- The Smiling Intensity Scale (SIS) of Gironzetti et al. (2016)
- Adaptation of the scale
- The CHEESE! corpus
- The gold standard
- Automatic detection of smile activity
- Link between the OpenFace AUs intensities and the SIS manual annotations
- Specification of the model
- The training stage
- Definition of the AUs composite function
- Estimation of the conditional probability
- Estimation of the transition probability
- Estimation of the α parameter
- The probability of transition between states
- The best solution
- The reliability of the automatic annotation
- Evaluation of the tool
- Precision, recall, f-measure and κ coefficient
- The gain in annotation time
- The HMAD and SMAD software
- Conclusions and perspectives
- Notes
- References
References
Alibali, M. W., Kita, S., & Young, A. J. (2000). Gesture and the process of speech production: We think, therefore we gesture. Language and Cognitive Processes, 15(6), 593–613.
Amoyal, M., & Priego-Valverde, B. (2019). Smiling for negotiating topic transitions in French conversation. Gesture and Speech in Interaction: Proceedings of Gespin 2019. Paderborn, Germany, September 2019.
An, L., Yang, S., & Bhanu, B. (2015). Efficient smile detection by extreme learning machine. Neurocomputing, 149(Part A), 354–363.
Argyle, M. (1975). Bodily communication. London: Methuen.
Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596.
Baltrušaitis, T., Mahmoud, M., & Robinson, P. (2015). Cross-dataset learning and person specific normalisation for automatic action unit detection. Proceedings of the 11th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 1–6). Ljubljana, Slovenia, May 2015.
Baltrušaitis, T., Robinson, P., & Morency, L.-P. (2012). 3D constrained local model for rigid and non-rigid facial tracking. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2012) (pp. 2610–2617). Providence, RI, USA, June 2012.
Baltrušaitis, T., Robinson, P., & Morency, L.-P. (2013). Constrained local neural fields for robust facial landmark detection in the wild. Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops (pp. 354–361). Portland, OR, USA, June 2013.
Baltrušaitis, T., Robinson, P., & Morency, L.-P. (2016). OpenFace: An open source facial behavior analysis toolkit. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (pp. 1–10). Lake Placid, NY, USA, 2016.
Baltrušaitis, T., Zadeh, A., Lim, Y. C., & Morency, L.-P. (2018). OpenFace 2.0: Facial behavior analysis toolkit. Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018) (pp. 59–66). Xi’an, China, 2018.
Barrier, G. (2013). La communication non verbale: Comprendre les gestes: Perception et signification. Issy-les-Moulineaux.
Bartlett, M. S., Littlewort, G. C., Braathen, B., Sejnowski, T. J., & Movellan, J. R. (2003). A prototype for automatic recognition of spontaneous facial actions. Advances in Neural Information Processing Systems, 15, 1271–1278.
Bartlett, M. S., Littlewort, G. C., Franck, M. G., Lainscsek, C., Fasel, I. R., & Movellan, J. R. (2006). Automatic
recognition of facial actions in spontaneous expressions. Journal of
Multimedia, 1(6), 22–35.
Bateson, G., Winkin, Y., Bansard, D., Cardoen, A., & Birdwhistell, R. (1981). La
nouvelle communication. Paris: Ed. du Seuil.
Bavelas, J. B., & Gerwing, J. (2007). Conversational hand gestures and facial displays in face-to-face dialogue. In K. Fiedler (Ed.), Social communication (pp. 283–308). Psychology Press.
Brugman, H., & Russel, A. (2004). Annotating multi-media/multi-modal resources with ELAN. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004) (pp. 2065–2068). Paris: European Language Resources Association.
Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., & Sheikh, Y. A. (2019). OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 172–186.
Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254. Retrieved from [URL] (last access 1 August 2023).
Chen, J., Ou, Q., Chi, Z., & Fu, H. (2017). Smile
detection in the wild with deep convolutional neural networks. Machine Vision and
Applications, 28(1), 173–183.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cohn, J. F., & De la Torre, F. (2014). Automated
face analysis for affective computing. In R. Calvo, S. D’Mello, J. Gratch, & A. Kappas (Eds.), The
Oxford handbook of affective
computing (pp. 131–150). Oxford: Oxford University Press.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Dhall, A., Goecke, R., Gedeon, T., & Sebe, N. (2016). Emotion
recognition in the wild. Journal on Multimodal User
Interfaces, 10(2), 95–97.
Ekman, P., Friesen, W., & Hager, J. (2002). Facial
action coding system: Research nexus. Salt Lake City, UT: Network Research Information.
Ekman, P. (1984). Expression
and the nature of emotion. In K. Scherer & P. Ekman (Eds.), Approaches
to
Emotion (pp. 319–344). Hillsdale, NJ: Lawrence Erlbaum.
Ekman, P., Davidson, R. J., & Friesen, W. V. (1990). The Duchenne smile: Emotional expression and brain physiology: II. Journal of Personality and Social Psychology, 58(2), 342–353.
Ekman, P., & Friesen, W. (1975). Unmasking the face: A guide to recognizing emotions from facial clues. Englewood Cliffs: Prentice-Hall.
Ekman, P., & Friesen, W. V. (1978). Facial
action coding system: Manual. Palo Alto, CA, USA: Consulting Psychologists Press.
El Haddad, K., Chakravarthula, S. N., & Kennedy, J. (2019). Smile and laugh dynamics in naturalistic dyadic interactions: Intensity levels, sequences and roles. Proceedings of the ACM International Conference on Multimodal Interaction (pp. 259–263). Suzhou, Jiangsu, China, October 2019.
Fleiss, J. L. (1971). Measuring
nominal scale agreement among many raters. Psychological
Bulletin, 76(5), 378–382.
Forney, G. D. (1973). The Viterbi algorithm. Proceedings of the IEEE, 61(3), 268–278.
Freire-Obregón, D., & Castrillón-Santana, M. (2015). An evolutive approach for smile recognition in video sequences. International Journal of Pattern Recognition and Artificial Intelligence, 29(1), 17 pages.
Girard, J. M., Cohn, J. F., & De la Torre, F. (2015). Estimating smile intensity: A better way. Pattern Recognition Letters, 66, 13–21.
Gironzetti, E., Attardo, S., & Pickering, L. (2016). Smiling, gaze, and humor in conversation: A pilot study. In L. Ruiz-Gurillo (Ed.), Metapragmatics of humor: Current research trends (pp. 235–254). Amsterdam: John Benjamins Publishing Company.
Gorisch, J., & Prévot, L. (2014). Aix-DVD, LPL. Retrieved from [URL] (last access 1 August 2023).
Goujon, A., Bertrand, R., & Tellier, M. (2015). Eyebrows
in French talk-in-interaction. In Proceedings of the Gesture and
speech in interaction – 4th edition (Gespin
4) (pp. 125–130). Nantes, France.
Guo, X., Polania, L., & Barner, K. (2018). Smile detection in the wild based on transfer learning. Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018) (pp. 679–686). Xi’an, China, 2018.
Hanna, J. E., & Brennan, S. E. (2007). Speakers’
eye gaze disambiguates referring expressions early during face-to-face conversation. Journal of
Memory and
Language, 57(4), 596–615.
Harker, L., & Keltner, D. (2001). Expressions
of positive emotion in women’s college yearbook pictures and their relationship to personality and life outcomes across
adulthood. Journal of Personality and Social
Psychology, 80(1), 112–124.
Heerey, E. A., & Crossley, H. M. (2013). Predictive
and reactive mechanisms in smile reciprocity. Psychological
Science, 24(8), 1446–1455.
Holler, J., Schubotz, L., Kelly, S., Hagoort, P., Schuetze, M., & Özyürek, A. (2014). Social
eye gaze modulates processing of speech and co-speech
gesture. Cognition, 133(3), 692–697.
Jensen, M. H. (2015). Smile
as feedback expressions in interpersonal interaction. International Journal of Psychological
Studies, 7(4), 95–105.
Jiang, H., Coskun, M., Badokhon, A., Liu, M., & Huang, M.-C. (2019). Hidden
smile correlation discovery across subjects using random walk with restart. IEEE Transactions
on Affective
Computing, 10(1), 76–84.
Kendon, A. (1967). Some functions of gaze-direction in social interaction. Acta Psychologica, 26, 22–63.
Kendon, A. (2004). Gesture:
Visible action as utterance. Cambridge: Cambridge University Press.
Kent, A., Berry, M. M., Luehrs Jr., F. U., & Perry, J. W. (1955). Machine
literature searching VIII. Operational criteria for designing information retrieval
systems. American
Documentation, 6(2), 93–101.
Kerbrat-Orecchioni, C., & Cosnier, J. (1987). Décrire
la conversation. Lyon: Presses universitaires de Lyon.
Kowdiki, M., & Khaparde, A. (2021). Automatic
hand gesture recognition using hybrid meta-heuristic-based feature selection and classification with dynamic time
warping. Computer Science
Review, 39(2), 2–16.
Krippendorff, K. (2008). Systematic
and random disagreement and the reliability of nominal data. Communication Methods and
Measures, 2(4), 323–338.
Krumhuber, E. G., Likowski, K. U., & Weyers, P. (2014). Facial mimicry of spontaneous and deliberate Duchenne and non-Duchenne smiles. Journal of Nonverbal Behavior, 38, 1–11.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Martinez, B., Valstar, M., Jiang, B., & Pantic, M. (2019). Automatic
analysis of facial actions: A survey. IEEE Transactions on Affective
Computing, 10(3), 325–347.
McNeill, D. (1992). Hand
and mind: What gestures reveal about
thought. Chicago: University of Chicago Press.
McNeill, D. (2012). How
language began: Gesture and speech in human
evolution. Cambridge: Cambridge University Press.
Powers, D. M. W. (2011). Evaluation:
From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation. Journal
of Machine Learning
Technologies, 2(1), 37–63.
Powers, D. M. W. (2012). The problem with kappa. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012) (pp. 345–355). Association for Computational Linguistics. Avignon, France, April 2012. Retrieved from [URL] (last access 1 August 2023).
Priego-Valverde, B., Bigi, B., Attardo, S., Pickering, L., & Gironzetti, E. (2018). Is smiling during humor so obvious? A cross-cultural comparison of smiling behavior in humorous sequences in American English and French interactions. Intercultural Pragmatics. Published by De Gruyter Mouton, October 31, 2018. Retrieved from [URL] (last access 1 August 2023).
R Core Team. (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from [URL] (last access 1 August 2023).
Rabiner, L. R. (1989). A
tutorial on hidden Markov models and selected applications in speech recognition. Proceedings
of the
IEEE, 77(2), 257–286.
Rauzy, S., & Goujon, A. (2018). Automatic
annotation of facial actions from a video record: The case of eyebrows raising and
frowning. Proceedings of the workshop on “Affects, Compagnons Artificiels et Interactions”,
WACAI 2018 (7 pages). Ed. Magalie Ochs. Porquerolles, France, June 2018. Retrieved
from [URL] (last access 1 August 2023).
RStudio Team. (2015). RStudio: Integrated development environment for R. RStudio, Inc., Boston, MA. Retrieved from [URL] (last access 1 August 2023).
Sacks, H., Schegloff, E., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking in conversation. Language, 50, 696–735.
Sanders, A. F. (2013). Elements of human performance: Reaction processes and attention in human skill. London: Psychology Press.
Schneider, P., Memmesheimer, R., Kramer, I., & Paulus, D. (2019). Gesture recognition in RGB videos using human body keypoints and dynamic time warping. In S. Chalup, T. Niemueller, J. Suthakorn, & M.-A. Williams (Eds.), RoboCup 2019: Robot World Cup XXIII (pp. 281–293). Cham: Springer International Publishing.
Seder, J. P., & Oishi, S. (2012). Intensity
of smiling in Facebook photos predicts future life satisfaction. Social Psychological and
Personality
Science, 3(4), 407–413.
Seger, R. A., Wanderley, M. M., & Koerich, A. L. (2014). Automatic
detection of musicians’ ancillary gestures based on video analysis. Expert Systems with
Applications, 41(4, Part
2), 2098–2106.
Shan, C. (2012). Smile detection by boosting pixel differences. IEEE Transactions on Image Processing, 21(1), 431–436.
Shimada, K., Matsukawa, T., Noguchi, Y., & Kurita, T. (2010). Appearance-based smile intensity estimation by cascaded support vector machines. Proceedings of the Asian Conference on Computer Vision Workshops (pp. 277–286). Queenstown, New Zealand, November 2010.
Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85(3), 257–268.
Sloetjes, H., & Wittenburg, P. (2008). Annotation by category – ELAN and ISO DCR. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Marrakech, Morocco, May 2008. Retrieved from [URL] (last access 1 August 2023).
Vettin, J., & Todt, D. (2004). Laughter
in conversation: Features of occurrence and acoustic structure. Journal of Nonverbal
Behavior, 28(2), 93–115.
Vinola, C., & Vimala Devi, K. (2019). Smile
intensity recognition in real time videos: Fuzzy system approach. Multimedia Tools and
Applications, 78(11), 15033–15052.
Viterbi, A. J. (1967). Error
bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE
Transactions on Information
Theory, 13(2), 260–269.
Walecki, R., Rudovic, O., Pavlovic, V., & Pantic, M. (2019). Copula
ordinal regression framework for joint estimation of facial action unit intensity. IEEE
Transactions on Affective
Computing, 10(3), 297–312.
Whitehill, J., Littlewort, G., Fasel, I. R., Bartlett, M. S., & Movellan, J. R. (2009). Toward practical smile detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 2106–2111.
Zhang, K., Huang, Y., Wu, H., & Wang, L. (2015). Facial smile detection based on deep learning features. Proceedings of the 3rd Asian Conference on Pattern Recognition (ACPR 2015) (pp. 534–538). Kuala Lumpur, Malaysia, November 2015.