Cross-lingual Word Feature Extraction and Supervised Learning Training Methods

Wang Junwei

UNSW Sydney, Australia 2032

Abstract: This paper explores the development and application of cross-lingual word embeddings in natural language processing (NLP). It reviews various methods for extracting cross-lingual word features, including supervised, semi-supervised, and unsupervised learning approaches. The paper discusses key techniques such as matrix factorization, neural network-based models, and pseudo-bilingual document alignment, highlighting challenges like data sparsity, word sense disambiguation, and the need for robust models to handle large text datasets effectively.

Key words: NLP; AI; machine learning
