{"id":340,"date":"2013-01-21T13:24:10","date_gmt":"2013-01-21T12:24:10","guid":{"rendered":"http:\/\/xlike.ijs.si\/?p=340"},"modified":"2014-01-12T14:12:51","modified_gmt":"2014-01-12T13:12:51","slug":"cross-lingual-document-linking","status":"publish","type":"post","link":"http:\/\/xlike.ijs.si\/cross-lingual-document-linking\/","title":{"rendered":"Cross-lingual Document Linking"},"content":{"rendered":"

Measuring similarity between documents written in different languages is useful for several tasks, for example when building a cross-lingual, content-based recommendation system. Another example is tracking how news spreads, which may involve crossing language boundaries.<\/p>\n

A cross-lingual similarity function, together with a common language-independent representation, lets us reduce cross-lingual text mining problems (CL-classification, CL-information retrieval, CL-clustering) to problems that standard machine learning techniques can solve.<\/p>\n

Below we illustrate how to construct the language-independent document representations and the cross-lingual similarity function from a multilingual document collection (the training data).<\/p>\n

\"cca\"<\/a><\/p>\n

The current technology is based on the LSI (latent semantic indexing) and CCA (canonical correlation analysis) approaches described in:<\/p>\n