Tracing semantic change in Portuguese
A distributional approach to adversative connectives
This study uses word embeddings to investigate the semantic changes underlying the creation of two adversative connectives in Portuguese, porém and mas ‘but, however’. For porém, we chart its development from an original PP formed by a preposition with a causal meaning (por) and a demonstrative pronoun that referred anaphorically to a previous proposition (en(de)). For mas, we trace its change from an adverb meaning ‘more’. Adopting a distributional semantics approach, we use word embedding models trained on two corpora: CIPM (Corpus Informatizado do Português Medieval), containing texts from the 12th–16th centuries, and COLONIA, containing texts from the 16th–20th centuries. We measure change by comparing, in each corpus, the similarity scores of porém and mas to words in semantic categories that represent the source and the target meanings. This paper, which constitutes the first computational study of semantic change in Portuguese, also discusses challenges and outlines steps to consider when choosing embedding algorithms for small historical corpora.
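The similarity-based measure of change described above can be illustrated with a minimal sketch. It is not the study's implementation: the seed words chosen for the causal and adversative domains, the two-dimensional vectors, and the function names are all assumptions made for illustration, and taking the mean cosine similarity is only one plausible way to aggregate scores over a domain.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def domain_similarity(embeddings, target, domain_words):
    """Mean cosine similarity between `target` and the domain words in the model.

    Returns None if the target or every domain word is out of vocabulary.
    """
    present = [w for w in domain_words if w in embeddings]
    if target not in embeddings or not present:
        return None
    target_vec = embeddings[target]
    return sum(cosine(target_vec, embeddings[w]) for w in present) / len(present)


# Hypothetical seed words for a source (causal) and a target (adversative) domain.
CAUSAL = ["porque", "pois"]
ADVERSATIVE = ["contudo", "todavia"]

# Invented 2-d vectors standing in for models trained on an earlier and a later corpus.
context = {
    "porque": [1.0, 0.0],
    "pois": [0.8, 0.6],
    "contudo": [0.0, 1.0],
    "todavia": [0.6, 0.8],
}
earlier = {**context, "porém": [1.0, 0.0]}
later = {**context, "porém": [0.0, 1.0]}

# Change per domain: similarity in the later model minus similarity in the earlier one.
shift = {}
for name, words in (("causal", CAUSAL), ("adversative", ADVERSATIVE)):
    before = domain_similarity(earlier, "porém", words)
    after = domain_similarity(later, "porém", words)
    shift[name] = after - before

print(shift)
```

With these toy vectors the causal score falls and the adversative score rises by the same amount, which is the pattern one would expect for a word moving from a causal source meaning to an adversative target meaning. Because each similarity is computed inside a single model, the sketch does not need to align the vector spaces of the two corpora.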
Article outline
- 1. Introduction
- 2. Adversative connectives in Portuguese
- 2.1 Porém
- 2.2 Mas
- 2.3 Motivating the current study
- 3. Distributional semantics and word embeddings
- 3.1 Choice of word embedding algorithm
- 3.2 Similarity scores
- 3.3 Experimental setting for this study
- 3.4 Objectives of the current study
- 4. Corpora used
- 5. Results
- 5.1 Semantic domains
- 5.2 Similarity to semantic domains over time: Porém
- 5.3 Similarity to semantic domains over time: Mas
- 5.4 Comparison with a “true negative”
- 6. Discussion
- 7. Conclusion
- Notes
- Abbreviations
- References