A new model based on knowledge and collocational
association
Shiva Taslimipoor | Research Group in Computational Linguistics, University of
Wolverhampton
Gloria Corpas Pastor | Research Group in Computational Linguistics, University of
Wolverhampton | University of Malaga
Omid Rohanian | Research Group in Computational Linguistics, University of
Wolverhampton
Semantic discrimination among concepts is a daily exercise for
humans when using natural languages. For example, given the words
airplane and car, the word
flying readily comes to mind as an attribute
that differentiates them. In this study, we propose a novel automatic approach
to detect whether an attribute word represents the difference between two
given words. We exploit a combination of knowledge-based and co-occurrence
features (collocations) to capture the semantic difference between two words
in relation to an attribute. The features are scores defined for
each pair of words and an attribute, based on association measures, n-gram
counts, word similarity, and ConceptNet relations. Based on these features,
we designed a system and ran several experiments on a SemEval-2018 dataset.
The experimental results indicate that the proposed model performs better than,
or at least comparably to, other systems evaluated on the same data for
this task.
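
To make the idea of an association-based feature concrete, the following is a minimal sketch and not the system described above. It assumes pointwise mutual information (PMI) as the association measure, which the abstract does not specify. The function names, the toy counts, and the zero threshold are all hypothetical and chosen only for illustration. The feature is the difference between the attribute's association with the first word and its association with the second.

```python
import math


def pmi(pair_count, count_x, count_y, total):
    """Pointwise mutual information, log2(P(x, y) / (P(x) * P(y))).

    Returns 0.0 when any count is zero (a simplifying assumption).
    """
    if pair_count == 0 or count_x == 0 or count_y == 0:
        return 0.0
    return math.log2((pair_count * total) / (count_x * count_y))


def association_difference(word1, word2, attribute, unigrams, cooccurrences, total):
    """Hypothetical feature: PMI(word1, attribute) - PMI(word2, attribute)."""
    def score(word):
        return pmi(
            cooccurrences.get((word, attribute), 0),
            unigrams.get(word, 0),
            unigrams.get(attribute, 0),
            total,
        )
    return score(word1) - score(word2)


# Toy counts invented for illustration; they come from no real corpus.
unigrams = {"airplane": 100, "car": 400, "flying": 50}
cooccurrences = {("airplane", "flying"): 20, ("car", "flying"): 1}
total = 10000

feature = association_difference(
    "airplane", "car", "flying", unigrams, cooccurrences, total
)
# A threshold of zero is an arbitrary choice for this sketch.
is_discriminative = feature > 0
```

With these toy counts the attribute is associated far more strongly with the first word than with the second, so the feature is positive. In a full system, a score like this would be one input among several (n-gram counts, word similarity, ConceptNet relations) to a classifier, not a decision rule on its own.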