Tracing semantic change in Portuguese
A distributional approach to adversative connectives
Patrícia Amaral | Indiana University
Zuoyu Tian | Indiana University
Dylan Jarrett | Indiana University
Juan Escalona Torres | Cornell University
This study uses word embeddings to investigate the semantic changes underlying the creation of two adversative connectives in Portuguese, porém and mas ‘but, however’. For porém, we trace its development from a PP formed by a preposition with a causal meaning (por) and a demonstrative pronoun that referred anaphorically to a previous proposition (en(de)). For mas, we trace its change from an adverb meaning ‘more’. Adopting a distributional semantics approach, we use word embedding models trained on two corpora: the CIPM (Corpus Informatizado do Português Medieval, with texts from the 12th–16th centuries) and COLONIA (with texts from the 16th–20th centuries). We derive a measure of change from the similarity scores of porém and mas to words in semantic categories that represent their source and target meanings, computed separately in each corpus. This paper, the first computational study of semantic change in Portuguese, also discusses the challenges of choosing embedding algorithms for small historical corpora and outlines steps to consider when making that choice.
Keywords: semantic change, Portuguese, adversative connectives, distributional semantics, word embeddings
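The similarity-based measure of change summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 2-D vectors, the category word lists `CAUSAL` and `ADVERSATIVE`, the averaging over category words, and the difference-based `change_score` are all assumptions made for the example, since the abstract does not say how the similarity scores are aggregated. The same functions would apply to mas with a category standing in for its source meaning ‘more’.

```python
import math

# Hypothetical toy embeddings: invented 2-D vectors, NOT data from the study.
# In the study, each period has its own model (trained on CIPM or COLONIA).
Embeddings = dict[str, list[float]]

medieval: Embeddings = {  # stands in for a CIPM-trained model
    "porém": [1.0, 0.2],
    "portanto": [1.0, 0.0],
    "logo": [0.9, 0.1],
    "contudo": [0.0, 1.0],
    "todavia": [0.1, 0.9],
}
modern: Embeddings = {  # stands in for a COLONIA-trained model
    "porém": [0.2, 1.0],
    "portanto": [1.0, 0.0],
    "logo": [0.9, 0.1],
    "contudo": [0.0, 1.0],
    "todavia": [0.1, 0.9],
}

# Hypothetical category word lists for the source and target meanings of porém.
CAUSAL = ["portanto", "logo"]          # assumed stand-in for the causal source meaning
ADVERSATIVE = ["contudo", "todavia"]   # assumed stand-in for the adversative target meaning


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def category_similarity(model: Embeddings, word: str, category: list[str]) -> float:
    """Mean cosine similarity of `word` to the category words present in `model`."""
    found = [w for w in category if w in model]
    if not found:
        raise ValueError("no category words in the model vocabulary")
    return sum(cosine(model[word], model[w]) for w in found) / len(found)


def change_score(model: Embeddings, word: str,
                 source_words: list[str], target_words: list[str]) -> float:
    """Assumed measure: similarity to target-meaning words minus similarity to source-meaning words."""
    return (category_similarity(model, word, target_words)
            - category_similarity(model, word, source_words))


for label, model in [("CIPM-like", medieval), ("COLONIA-like", modern)]:
    print(label, round(change_score(model, "porém", CAUSAL, ADVERSATIVE), 3))
```

With these invented vectors the score is negative for the earlier model and positive for the later one. Because each score is computed within a single model, the two periods' vector spaces never have to be aligned with each other, which is one practical property of comparing similarities to reference words rather than comparing raw vectors across models.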
Article outline
- 1. Introduction
- 2. Adversative connectives in Portuguese
- 2.1 Porém
- 2.2 Mas
- 2.3 Motivating the current study
- 3. Distributional semantics and word embeddings
- 3.1 Choice of word embedding algorithm
- 3.2 Similarity scores
- 3.3 Experimental setting for this study
- 3.4 Objectives of the current study
- 4. Corpora used
- 5. Results
- 5.1 Semantic domains
- 5.2 Similarity to semantic domains over time: Porém
- 5.3 Similarity to semantic domains over time: Mas
- 5.4 Comparison with a “true negative”
- 6. Discussion
- 7. Conclusion
- Notes
- Abbreviations
- References
Published online: 25 April 2022
https://doi.org/10.1075/jhl.21028.ama