Article published in:
Vocal Interactivity in-and-between Humans, Animals and Robots
Edited by Mohamed Chetouani, Elodie F. Briefer, Angela Dassow, Ricard Marxer, Roger K. Moore, Nicolas Obin and Dan Stowell
[Interaction Studies 24:1] 2023
pp. 168–192
References
Ando, A., Masumura, R., Kamiyama, H., Kobashikawa, S., and Aono, Y.
(2017) Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls. In Proc. Interspeech 2017, pages 1716–1720.
Ando, A., Masumura, R., Kamiyama, H., Kobashikawa, S., Aono, Y., and Toda, T.
(2020) Customer satisfaction estimation in contact center calls based on a hierarchical multi-task model. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:715–728.
Auguste, J., Charlet, D., Damnati, G., Bechet, F., and Favre, B.
(2019) Can we predict self-reported customer satisfaction from interactions? In ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7385–7389.
Bockhorst, J., Yu, S., Polania, L., and Fung, G.
(2017) Predicting self-reported customer satisfaction of interactions with a corporate call center. In Altun, Y., Das, K., Mielikäinen, T., Malerba, D., Stefanowski, J., Read, J., Zitnik, M., Ceci, M., and Dzeroski, S., editors, Machine Learning and Knowledge Discovery in Databases, pages 179–190, Cham. Springer International Publishing. ISBN 978-3-319-71273-4.
Chowdhury, S. A., Stepanov, E. A., and Riccardi, G.
(2016) Predicting user satisfaction from turn-taking in spoken conversations. In INTERSPEECH.
Deschamps-Berger, T., Lamel, L., and Devillers, L.
(2021) End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings. In 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), Nara, Japan.
Erden, M. and Arslan, L. M.
(2011) Automatic detection of anger in human-human call center dialogs. In Proc. Interspeech 2011, pages 81–84.
Eyben, F., Weninger, F., Gross, F., and Schuller, B.
(2013) Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In Proceedings of the 21st ACM International Conference on Multimedia, MM ’13, pages 835–838, New York, NY, USA. Association for Computing Machinery. ISBN 9781450324045.
Galanis, D., Karabetsos, S., Koutsombogera, M., Papageorgiou, H., Esposito, A., and Riviello, M.-T.
(2013) Classification of emotional speech units in call centre interactions. In 2013 IEEE 4th International Conference on Cognitive Infocommunications (CogInfoCom), pages 403–406.
Graves, A., Fernandez, S., Gomez, F., and Schmidhuber, J.
(2006) Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pages 369–376, New York, NY, USA. Association for Computing Machinery. ISBN 1595933832.
Kim, Y., Levy, J., and Liu, Y.
(2020) Speech sentiment and customer satisfaction estimation in socialbot conversations. In INTERSPEECH.
Luque, J., Segura, C., Sánchez, A., Umbert, M., and Galindo, L. A.
(2017) The Role of Linguistic and Prosodic Cues on the Prediction of Self-Reported Satisfaction in Contact Centre Phone Calls. In Proc. Interspeech 2017, pages 2346–2350.
Miao, Y., Gowayyed, M., and Metze, F.
(2015) EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 167–174.
Morrison, D., Wang, R., and De Silva, L. C.
(2007) Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49(2):98–112. ISSN 0167-6393.
Mower, E., Matarić, M. J., and Narayanan, S.
(2011) A framework for automatic human emotion classification using emotion profiles. IEEE Transactions on Audio, Speech, and Language Processing, 19(5):1057–1070.
Petrushin, V. A.
(1999) Emotion in speech: Recognition and application to call centers.
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., and Vesely, K.
(2011) The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society. IEEE Catalog No.: CFP11SRW-USB.
Russell, J. A.
(1980) A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161.
Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C. A., and Narayanan, S. S.
(2010a) The INTERSPEECH 2010 paralinguistic challenge. In INTERSPEECH.
Schuller, B., Weninger, F., Zhang, Y., Ringeval, F., Batliner, A., Steidl, S., Eyben, F., Marchi, E., Vinciarelli, A., Scherer, K., Chetouani, M., and Mortillaro, M.
(2019) Affective and behavioural computing: Lessons learnt from the first computational paralinguistics challenge. Computer Speech & Language, 53:156–180. ISSN 0885-2308.
Schuller, B. W., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C. A., and Narayanan, S. S.
(2010b) The INTERSPEECH 2010 paralinguistic challenge. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26–30, 2010, pages 2794–2797.
Schuller, B. W., Steidl, S., Batliner, A., Vinciarelli, A., Scherer, K. R., Ringeval, F., Chetouani, M., Weninger, F., Eyben, F., Marchi, E., Mortillaro, M., Salamin, H., Polychroniou, A., Valente, F., and Kim, S.
(2013) The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25–29, 2013, pages 148–152.
Segura, C., Balcells, D., Umbert, M., Arias, J., and Luque, J.
(2016) Automatic speech feature learning for continuous prediction of customer satisfaction in contact center phone calls. In Abad, A., Ortega, A., Teixeira, A., Mateo, C. Garcia, Hinarejos, C. D. Martínez, Perdigão, F., Batista, F., and Mamede, N., editors, Advances in Speech and Language Technologies for Iberian Languages, pages 255–265, Cham. Springer International Publishing. ISBN 978-3-319-49169-1.
Vaudable, C. and Devillers, L.
(2012) Negative emotions detection as an indicator of dialogs quality in call centers. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5109–5112.
Viikki, O. and Laurila, K.
(1998) Cepstral domain segmental feature vector normalization for noise robust speech recognition. Speech Communication, 25(1):133–147. ISSN 0167-6393.
Zweig, G., Siohan, O., Saon, G., Ramabhadran, B., Povey, D., Mangu, L., and Kingsbury, B.
(2006) Automated quality monitoring for call centers using speech and nlp technologies. In Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume: Demonstrations, NAACL-Demonstrations ’06, pages 292–295, USA. Association for Computational Linguistics.