Automatic tool to annotate smile intensities in conversational face-to-face interactions
This study presents an automatic tool that traces smile intensities along a video recording of
conversational face-to-face interactions. Its output is a sequence of adjusted time intervals labeled according to the
Smiling Intensity Scale (Gironzetti, Attardo, and Pickering,
2016), a five-level scale ranging from neutral facial expression to laughing smile. The statistical model underlying the
tool, detailed in the present study, is trained on a manually annotated corpus of conversations featuring spontaneous
facial expressions. The tool can thus be used to good advantage for annotating smiles in interaction. The results are twofold. First,
the evaluation reveals an observed agreement of 68% between manual and automatic annotations. Second, manually correcting the
labels and interval boundaries of the automatic output reduces annotation time by a factor of 10 compared with
manually annotating smile intensities from scratch. Our annotation engine relies on the state-of-the-art toolbox
OpenFace to track the face and to measure the intensities of the facial Action Units of interest throughout the video. The
documentation and the scripts of our tool, the SMAD software, are available for download from the HMAD open-source project page at
https://github.com/srauzy/HMAD (last access 31 July 2023).
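To make the pipeline concrete, here is a minimal sketch in R (the ecosystem the HMAD/SMAD scripts are distributed in) of the kind of processing involved: reading OpenFace's per-frame Action Unit intensities and turning them into SIS-like levels. The AU06_r and AU12_r columns are standard OpenFace output; everything else, including the composite weights, the level cut points, and the running-median smoothing that stands in for the model's Viterbi decoding, is an illustrative assumption, not the authors' implementation.

```r
## Minimal sketch, NOT the SMAD implementation: threshold a composite of
## OpenFace AU intensities into five SIS-like levels, then smooth.

## Read OpenFace's per-frame CSV output; AU06_r (cheek raiser) and AU12_r
## (lip corner puller) are standard OpenFace columns. strip.white guards
## against the space-padded headers of older OpenFace versions.
read_openface <- function(csv_path) {
  au <- read.csv(csv_path, strip.white = TRUE)
  ## keep frames where OpenFace reports a successful, confident track
  au[au$success == 1 & au$confidence >= 0.8,
     c("frame", "timestamp", "AU06_r", "AU12_r")]
}

## Illustrative composite score; the paper estimates its own AU composite
## function from the manually annotated training corpus.
smile_score <- function(au) 0.7 * au$AU12_r + 0.3 * au$AU06_r

## Hypothetical cut points for the five SIS levels
## (0 = neutral face, ..., 4 = laughing smile); the real boundaries are
## learned from the gold standard, not fixed a priori.
sis_level <- function(score) {
  as.integer(cut(score, breaks = c(-Inf, 0.5, 1.5, 2.5, 3.5, Inf))) - 1L
}

au  <- read_openface("video_openface.csv")  # hypothetical file name
raw <- sis_level(smile_score(au))
## A running median over ~1 s of 25 fps video stands in for the model's
## Viterbi smoothing of the frame-wise labels.
smoothed <- stats::runmed(raw, k = 25)
```

On a 25 fps recording, k = 25 corresponds to a one-second window; SMAD itself adjusts labels and interval boundaries through its trained statistical model rather than a fixed filter.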
Article outline
- Introduction
- The OpenFace software
- Action Unit detection
- In-the-wild videos
- Manual annotations of smiles
- The Smiling Intensity Scale (SIS) of Gironzetti et al. (2016)
- Adaptation of the scale
- The CHEESE! corpus
- The gold standard
- Automatic detection of smile activity
- Link between the OpenFace AUs intensities and the SIS manual annotations
- Specification of the model
- The training stage
- Definition of the AUs composite function
- Estimation of the conditional probability
- Estimation of the transition probability
- Estimation of the α parameter
- The probability of transition between states
- The best solution
- The reliability of the automatic annotation
- Evaluation of the tool
- Precision, recall, f-measure and κ coefficient
- The gain in annotation time
- The HMAD and SMAD software
- Conclusions and perspectives
- Notes
- References
References
Alibali, M. W., Kita, S., & Young, A. J. (2000). Gesture and the process of speech production: We think, therefore we gesture. Language and Cognitive Processes, 15(6), 593–613.
Amoyal, M., & Priego-Valverde, B. (2019). Smiling for negotiating topic transitions in French conversation. Gesture and Speech in Interaction: Proceedings of Gespin 2019. Paderborn, Germany, September 2019.
An, L., Yang, S., & Bhanu, B. (2015). Efficient smile detection by extreme learning machine. Neurocomputing, 149(Part A), 354–363.
Argyle, M. (1975). Bodily communication. London: Methuen.
Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596.
Baltrušaitis, T., Mahmoud, M., & Robinson, P. (2015). Cross-dataset learning and person specific normalisation for automatic action unit detection. Proceedings of the 11th IEEE International Conference on Automatic Face and Gesture Recognition (pp. 1–6). Ljubljana, Slovenia, May 2015.
Baltrušaitis, T., Robinson, P., & Morency, L.-P. (2012). 3D constrained local model for rigid and non-rigid facial tracking. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2012) (pp. 2610–2617). Providence, RI, USA, June 2012.
Baltrušaitis, T., Robinson, P., & Morency, L.-P. (2013). Constrained local neural fields for robust facial landmark detection in the wild. Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops (pp. 354–361). Portland, OR, USA, June 2013.
Baltrušaitis, T., Robinson, P., & Morency, L.-P. (2016). OpenFace: An open source facial behavior analysis toolkit. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (pp. 1–10). Lake Placid, NY, USA, 2016.
Baltrušaitis, T., Zadeh, A., Lim, Y. C., & Morency, L.-P. (2018). OpenFace 2.0: Facial behavior analysis toolkit. Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018) (pp. 59–66). Xi’an, China, 2018.
Barrier, G. (2013). La communication non verbale: Comprendre les gestes: Perception et signification. Issy-les-Moulineaux.
Bartlett, M. S., Littlewort, G. C., Braathen, B., Sejnowski, T. J., & Movellan, J. R. (2003). A prototype for automatic recognition of spontaneous facial actions. Advances in Neural Information Processing Systems, 15, 1271–1278.
Bartlett, M. S., Littlewort, G. C., Franck, M. G., Lainscsek, C., Fasel, I. R., & Movellan, J. R. (2006). Automatic
recognition of facial actions in spontaneous expressions. Journal of
Multimedia, 1(6), 22–35.
Bateson, G., Winkin, Y., Bansard, D., Cardoen, A., & Birdwhistell, R. (1981). La
nouvelle communication. Paris: Ed. du Seuil.
Bavelas, J. B., & Gerwing, J. (2007). Conversational hand gestures and facial displays in face-to-face dialogue. In K. Fiedler (Ed.), Social communication (pp. 283–308). Psychology Press.
Brugman, H., & Russel, A. (2004). Annotating multi-media/multi-modal resources with ELAN. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004) (pp. 2065–2068). Paris: European Language Resources Association.
Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., & Sheikh, Y. A. (2019). OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 172–186.
Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2), 249–254. Retrieved from [URL] (last access 1 August 2023).
Chen, J., Ou, Q., Chi, Z., & Fu, H. (2017). Smile
detection in the wild with deep convolutional neural networks. Machine Vision and
Applications, 28(1), 173–183.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cohn, J. F., & De la Torre, F. (2014). Automated
face analysis for affective computing. In R. Calvo, S. D’Mello, J. Gratch, & A. Kappas (Eds.), The
Oxford handbook of affective
computing (pp. 131–150). Oxford: Oxford University Press.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Dhall, A., Goecke, R., Gedeon, T., & Sebe, N. (2016). Emotion
recognition in the wild. Journal on Multimodal User
Interfaces, 10(2), 95–97.
Ekman, P., Friesen, W., & Hager, J. (2002). Facial
action coding system: Research nexus. Salt Lake City, UT: Network Research Information.
Ekman, P. (1984). Expression
and the nature of emotion. In K. Scherer & P. Ekman (Eds.), Approaches
to
Emotion (pp. 319–344). Hillsdale, NJ: Lawrence Erlbaum.
Ekman, P., Davidson, R. J., & Friesen, W. V. (1990). The Duchenne smile: Emotional expression and brain physiology: II. Journal of Personality and Social Psychology, 58(2), 342–353.
Ekman, P., & Friesen, W. (1975). Unmasking the face: A guide to recognizing emotions from facial clues. Englewood Cliffs: Prentice-Hall.
Ekman, P., & Friesen, W. V. (1978). Facial
action coding system: Manual. Palo Alto, CA, USA: Consulting Psychologists Press.
El Haddad, K., Chakravarthula, S. N., & Kennedy, J. (2019). Smile and laugh dynamics in naturalistic dyadic interactions: Intensity levels, sequences and roles. Proceedings of the ACM International Conference on Multimodal Interaction (pp. 259–263). Suzhou, Jiangsu, China, October 2019.
Fleiss, J. L. (1971). Measuring
nominal scale agreement among many raters. Psychological
Bulletin, 76(5), 378–382.
Forney, G. D. (1973). The Viterbi algorithm. Proceedings of the IEEE, 61(3), 268–278.
Freire-Obregón, D., & Castrillón-Santana, M. (2015). An evolutive approach for smile recognition in video sequences. International Journal of Pattern Recognition and Artificial Intelligence, 29(1), 17 pages.
Girard, J. M., Cohn, J. F., & De la Torre, F. (2015). Estimating smile intensity: A better way. Pattern Recognition Letters, 66, 13–21.
Gironzetti, E., Attardo, S., & Pickering, L. (2016). Smiling, gaze, and humor in conversation: A pilot study. In L. Ruiz-Gurillo (Ed.), Metapragmatics of humor: Current research trends (pp. 235–254). Amsterdam: John Benjamins Publishing Company.
Gorisch, J., & Prévot, L. (2014). Aix-DVD, LPL. Retrieved from [URL] (last access 1 August 2023).
Goujon, A., Bertrand, R., & Tellier, M. (2015). Eyebrows
in French talk-in-interaction. In Proceedings of the Gesture and
speech in interaction – 4th edition (Gespin
4) (pp. 125–130). Nantes, France.
Guo, X., Polania, L., & Barner, K. (2018). Smile detection in the wild based on transfer learning. Proceedings of the 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018) (pp. 679–686). Xi’an, China, 2018.
Hanna, J. E., & Brennan, S. E. (2007). Speakers’
eye gaze disambiguates referring expressions early during face-to-face conversation. Journal of
Memory and
Language, 57(4), 596–615.
Harker, L., & Keltner, D. (2001). Expressions
of positive emotion in women’s college yearbook pictures and their relationship to personality and life outcomes across
adulthood. Journal of Personality and Social
Psychology, 80(1), 112–124.
Heerey, E. A., & Crossley, H. M. (2013). Predictive
and reactive mechanisms in smile reciprocity. Psychological
Science, 24(8), 1446–1455.
Holler, J., Schubotz, L., Kelly, S., Hagoort, P., Schuetze, M., & Özyürek, A. (2014). Social
eye gaze modulates processing of speech and co-speech
gesture. Cognition, 133(3), 692–697.
Jensen, M. H. (2015). Smile
as feedback expressions in interpersonal interaction. International Journal of Psychological
Studies, 7(4), 95–105.
Jiang, H., Coskun, M., Badokhon, A., Liu, M., & Huang, M.-C. (2019). Hidden
smile correlation discovery across subjects using random walk with restart. IEEE Transactions
on Affective
Computing, 10(1), 76–84.
Kendon, A. (1967). Some functions of gaze-direction in social interaction. Acta Psychologica, 26, 22–63.
Kendon, A. (2004). Gesture:
Visible action as utterance. Cambridge: Cambridge University Press.
Kent, A., Berry, M. M., Luehrs Jr., F. U., & Perry, J. W. (1955). Machine
literature searching VIII. Operational criteria for designing information retrieval
systems. American
Documentation, 6(2), 93–101.
Kerbrat-Orecchioni, C., & Cosnier, J. (1987). Décrire
la conversation. Lyon: Presses universitaires de Lyon.
Kowdiki, M., & Khaparde, A. (2021). Automatic
hand gesture recognition using hybrid meta-heuristic-based feature selection and classification with dynamic time
warping. Computer Science
Review, 39(2), 2–16.
Krippendorff, K. (2008). Systematic
and random disagreement and the reliability of nominal data. Communication Methods and
Measures, 2(4), 323–338.
Krumhuber, E. G., Likowski, K. U., & Weyers, P. (2014). Facial mimicry of spontaneous and deliberate Duchenne and non-Duchenne smiles. Journal of Nonverbal Behavior, 38, 1–11.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Martinez, B., Valstar, M., Jiang, B., & Pantic, M. (2019). Automatic
analysis of facial actions: A survey. IEEE Transactions on Affective
Computing, 10(3), 325–347.
McNeill, D. (1992). Hand
and mind: What gestures reveal about
thought. Chicago: University of Chicago Press.
McNeill, D. (2012). How
language began: Gesture and speech in human
evolution. Cambridge: Cambridge University Press.
Powers, D. M. W. (2011). Evaluation:
From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation. Journal
of Machine Learning
Technologies, 2(1), 37–63.
Powers, D. M. W. (2012). The problem with kappa. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012) (pp. 345–355). Association for Computational Linguistics. Avignon, France, April 2012. Retrieved from [URL] (last access 1 August 2023).
Priego-Valverde, B., Bigi, B., Attardo, S., Pickering, L., & Gironzetti, E. (2018). Is smiling during humor so obvious? A cross-cultural comparison of smiling behavior in humorous sequences in American English and French interactions. Intercultural Pragmatics. Published by De Gruyter Mouton, October 31, 2018. Retrieved from [URL] (last access 1 August 2023).
R Core Team. (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from [URL] (last access 1 August 2023).
Rabiner, L. R. (1989). A
tutorial on hidden Markov models and selected applications in speech recognition. Proceedings
of the
IEEE, 77(2), 257–286.
Rauzy, S., & Goujon, A. (2018). Automatic
annotation of facial actions from a video record: The case of eyebrows raising and
frowning. Proceedings of the workshop on “Affects, Compagnons Artificiels et Interactions”,
WACAI 2018 (7 pages). Ed. Magalie Ochs. Porquerolles, France, June 2018. Retrieved
from [URL] (last access 1 August 2023).
RStudio Team. (2015). RStudio: Integrated development environment for R. RStudio, Inc., Boston, MA. Retrieved from [URL] (last access 1 August 2023).
Sacks, H., Schegloff, E., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking in conversation. Language, 50, 696–735.
Sanders, A. F. (2013). Elements of human performance: Reaction processes and attention in human skill. London: Psychology Press.
Schneider, P., Memmesheimer, R., Kramer, I., & Paulus, D. (2019). Gesture recognition in RGB videos using human body keypoints and dynamic time warping. In S. Chalup, T. Niemueller, J. Suthakorn, & M.-A. Williams (Eds.), RoboCup 2019: Robot World Cup XXIII (pp. 281–293). Cham: Springer International Publishing.
Seder, J. P., & Oishi, S. (2012). Intensity
of smiling in Facebook photos predicts future life satisfaction. Social Psychological and
Personality
Science, 3(4), 407–413.
Seger, R. A., Wanderley, M. M., & Koerich, A. L. (2014). Automatic
detection of musicians’ ancillary gestures based on video analysis. Expert Systems with
Applications, 41(4, Part
2), 2098–2106.
Shan, C. (2012). Smile detection by boosting pixel differences. IEEE Transactions on Image Processing, 21(1), 431–436.
Shimada, K., Matsukawa, T., Noguchi, Y., & Kurita, T. (2010). Appearance-based smile intensity estimation by cascaded support vector machines. Proceedings of the Asian Conference on Computer Vision Workshops (pp. 277–286). Queenstown, New Zealand, November 2010.
Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85(3), 257–268.
Sloetjes, H., & Wittenburg, P. (2008). Annotation by category – ELAN and ISO DCR. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Marrakech, Morocco, May 2008. Retrieved from [URL] (last access 1 August 2023).
Vettin, J., & Todt, D. (2004). Laughter
in conversation: Features of occurrence and acoustic structure. Journal of Nonverbal
Behavior, 28(2), 93–115.
Vinola, C., & Vimala Devi, K. (2019). Smile
intensity recognition in real time videos: Fuzzy system approach. Multimedia Tools and
Applications, 78(11), 15033–15052.
Viterbi, A. J. (1967). Error
bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE
Transactions on Information
Theory, 13(2), 260–269.
Walecki, R., Rudovic, O., Pavlovic, V., & Pantic, M. (2019). Copula
ordinal regression framework for joint estimation of facial action unit intensity. IEEE
Transactions on Affective
Computing, 10(3), 297–312.
Whitehill, J., Littlewort, G., Fasel, I. R., Bartlett, M. S., & Movellan, J. R. (2009). Toward practical smile detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 2106–2111.
Zhang, K., Huang, Y., Wu, H., & Wang, L. (2015). Facial smile detection based on deep learning features. Proceedings of the 3rd Asian Conference on Pattern Recognition (ACPR 2015) (pp. 534–538). Kuala Lumpur, Malaysia, November 2015.