Tracing semantic change in Portuguese
A distributional approach to adversative connectives
This study uses word embeddings to investigate the semantic changes underlying the creation of two adversative connectives in Portuguese, porém and mas ‘but, however’. For porém, we chart its development from an original PP formed by a preposition with a causal meaning (por) and a demonstrative pronoun that referred anaphorically to a previous proposition (en(de)). For mas, we trace its change from an adverb meaning ‘more’. Adopting a distributional semantics approach, we use word embedding models trained on two corpora, the CIPM (Corpus Informatizado do Português Medieval, containing texts from the 12th–16th centuries) and COLONIA (containing texts from the 16th–20th centuries). We produce a measure of change based on the similarity scores of porém and mas with respect to words in relevant semantic categories in each corpus, representing the source and the target meanings. This paper, which constitutes the first computational study of semantic change in Portuguese, also discusses challenges and outlines steps to be taken into consideration when choosing embedding algorithms for small historical corpora.
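The similarity-based measure of change described in the abstract can be illustrated with a minimal sketch. It assumes that the measure is the mean cosine similarity between a target word and a set of seed words for each semantic domain, computed separately per corpus; the abstract does not spell out the exact formula, so this is one plausible reading rather than the authors' implementation. The toy three-dimensional vectors, the seed lists (`CAUSAL`, `ADVERSATIVE`) and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def domain_similarity(embeddings, target, domain_words):
    """Mean cosine similarity between `target` and the domain words
    that are present in `embeddings` (a dict: word -> vector).
    Returns NaN if the target or all domain words are missing."""
    present = [w for w in domain_words if w in embeddings]
    if target not in embeddings or not present:
        return float("nan")
    sims = [cosine(embeddings[target], embeddings[w]) for w in present]
    return float(np.mean(sims))


# Hypothetical seed words for the source (causal) and target (adversative)
# meanings; these are NOT the domains used in the study.
CAUSAL = ["porque", "pois"]
ADVERSATIVE = ["todavia", "contudo"]

# Invented toy vectors standing in for models trained on an earlier
# and a later corpus. Only the vector for "porém" differs.
shared = {
    "porque": np.array([1.0, 0.0, 0.0]),
    "pois": np.array([0.8, 0.2, 0.0]),
    "todavia": np.array([0.0, 1.0, 0.0]),
    "contudo": np.array([0.1, 0.9, 0.0]),
}
early = {**shared, "porém": np.array([0.9, 0.1, 0.0])}
late = {**shared, "porém": np.array([0.1, 0.9, 0.0])}

# Change per domain: similarity in the later corpus minus the earlier one.
shift = {
    name: domain_similarity(late, "porém", words)
    - domain_similarity(early, "porém", words)
    for name, words in {"causal": CAUSAL, "adversative": ADVERSATIVE}.items()
}

for name, delta in shift.items():
    print(f"{name}: {delta:+.3f}")
```

With these invented vectors, the similarity of porém to the causal seeds falls and its similarity to the adversative seeds rises, which is the direction of change the study sets out to detect. In practice the dictionaries would be replaced by embedding models trained on each corpus, and because similarity scores from small corpora can be unstable, averaging over several training runs may be advisable.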
Article outline
- 1. Introduction
- 2. Adversative connectives in Portuguese
- 2.1 Porém
- 2.2 Mas
- 2.3 Motivating the current study
- 3. Distributional semantics and word embeddings
- 3.1 Choice of word embedding algorithm
- 3.2 Similarity scores
- 3.3 Experimental setting for this study
- 3.4 Objectives of the current study
- 4. Corpora used
- 5. Results
- 5.1 Semantic domains
- 5.2 Similarity to semantic domains over time: Porém
- 5.3 Similarity to semantic domains over time: Mas
- 5.4 Comparison with a “true negative”
- 6. Discussion
- 7. Conclusion
- Notes
- Abbreviations
- References