We had a demo at NIPS 2013, titled "Cross-Lingual Technologies: Text to Logic Mapping, Search and Classification over 100 Languages", where we demonstrated two lines of our work on XLike: mapping text to formal logic, and finding text representations that are useful for cross-lingual search and classification. The demo was well received and inspired many interesting discussions. Here we briefly describe the main functionalities of the cross-lingual search/classification part of the demo, since the text-to-logic part has already been described in a previous post.
The core functionality of the demo is computing similarities between pairs of documents written in any of the 100 largest Wikipedia languages (roughly, all languages with more than 10,000 Wikipedia articles). The similarity kernels are computed from a comparable corpus extracted from Wikipedia by following the cross-lingual links between articles. Since most language pairs share very few linked articles, we first computed a pairwise common representation between each language and English (the hub language), using the singular value decomposition of the cross-covariance matrix between the two languages' vector space model representations of the documents. We then use the English vector space to compare documents written in other languages. The similarity score can also be interpreted by examining the keywords that contributed most to it.
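To make the hub idea concrete, here is a minimal NumPy sketch on toy data. It assumes that row i of the English and of the other-language matrix describe the same linked Wikipedia concept, that the pairwise basis is the top-k singular vectors of the centered cross-covariance, and (our assumption, since the post does not spell it out) that a non-English document is projected onto the latent basis and then mapped back onto English terms, so documents in any two languages can be compared by cosine in the English space. The keyword explanation uses the per-term products that sum to the cosine. The function names `hub_basis`, `to_english_space` and `top_keywords` and all data are hypothetical.

```python
import numpy as np

def hub_basis(X_en, X_xx, k):
    """Top-k singular vectors of the English/xx cross-covariance matrix.

    Row i of X_en and row i of X_xx describe the same (linked) concept;
    columns are vector-space-model term weights.
    """
    A = X_en - X_en.mean(axis=0)               # center English term columns
    B = X_xx - X_xx.mean(axis=0)               # center xx term columns
    C = A.T @ B / (X_en.shape[0] - 1)          # (V_en, V_xx) cross-covariance
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k], Vt[:k].T                  # English basis, xx basis

def to_english_space(x_xx, U_k, V_k):
    """Assumed scheme: project an xx document to the latent space, then onto English terms."""
    return U_k @ (V_k.T @ x_xx)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_keywords(a, b, vocab_en, n=3):
    """English terms whose per-term products add the most to cosine(a, b)."""
    contrib = (a / np.linalg.norm(a)) * (b / np.linalg.norm(b))
    order = np.argsort(contrib)[::-1][:n]
    return [(vocab_en[i], float(contrib[i])) for i in order]

# Toy data (hypothetical): aligned English, German and French articles
# generated from 5 shared hidden topics.
rng = np.random.default_rng(0)
n_docs, v_en, v_de, v_fr, k = 200, 50, 40, 45, 5
topics = rng.random((n_docs, 5))
P_en, P_de, P_fr = (rng.random((5, v)) for v in (v_en, v_de, v_fr))
X_en, X_de, X_fr = topics @ P_en, topics @ P_de, topics @ P_fr
vocab_en = [f"term{i}" for i in range(v_en)]

# One pairwise basis per non-English language; both map into the English space.
U_de, V_de = hub_basis(X_en, X_de, k)
U_fr, V_fr = hub_basis(X_en, X_fr, k)

# Compare a new German document with a new French one through the English hub.
t = rng.random(5)
a = to_english_space(t @ P_de, U_de, V_de)
b = to_english_space(t @ P_fr, U_fr, V_fr)
sim = cosine(a, b)
kw = top_keywords(a, b, vocab_en)
print(sim, kw)
```

A practical benefit of routing everything through English is that 100 languages need only 99 pairwise bases instead of one per language pair (100 × 99 / 2 = 4,950).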
Building on the similarity computations, the demo also features cross-lingual document categorization into the Open Directory Project taxonomy (also known as Dmoz). Categorization across the 100 languages relies on pairwise latent spaces trained between English and each of the other languages, with a centroid classifier in each latent space. This means that we operate with 99 models, corresponding to 99 different latent bases in the English space.
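One of the 99 models can be sketched as a nearest-centroid classifier in a single pairwise latent space. This is a minimal illustration under stated assumptions, not the demo's code: we assume the category centroids are built from labelled English documents projected with the English half of the basis, and that a document in language xx is projected with the other half (the post does not say which side the training data comes from). The function names, the categories "Arts"/"Science" and the toy data are hypothetical stand-ins for Dmoz.

```python
import numpy as np

def cross_cov_basis(X_en, X_xx, k):
    """Top-k singular vector pairs of the centered English/xx cross-covariance."""
    A = X_en - X_en.mean(axis=0)
    B = X_xx - X_xx.mean(axis=0)
    U, _, Vt = np.linalg.svd(A.T @ B, full_matrices=False)
    return U[:, :k], Vt[:k].T

def fit_centroids(Z, labels):
    """Unit-length mean latent vector for each category."""
    labels = np.asarray(labels)
    cats = sorted(set(labels.tolist()))
    cents = np.stack([Z[labels == c].mean(axis=0) for c in cats])
    cents /= np.linalg.norm(cents, axis=1, keepdims=True)
    return cats, cents

def classify(z, cats, cents):
    """Pick the category whose centroid has the highest cosine with z."""
    scores = cents @ (z / np.linalg.norm(z))
    return cats[int(np.argmax(scores))], scores

# Toy data (hypothetical): 300 linked English/xx article pairs, 4 hidden topics.
rng = np.random.default_rng(1)
P_en, P_xx = rng.random((4, 30)), rng.random((4, 25))
T = rng.random((300, 4))
U_k, V_k = cross_cov_basis(T @ P_en, T @ P_xx, k=4)

# Hypothetical labelled English pages standing in for Dmoz training data.
T_lab = rng.random((60, 4))
labels = np.where(T_lab[:, 0] > T_lab[:, 1], "Arts", "Science")
cats, cents = fit_centroids((T_lab @ P_en) @ U_k, labels)

# Classify an unlabelled xx-language document through the same pairwise model.
x_xx = np.array([0.9, 0.1, 0.5, 0.5]) @ P_xx
label, scores = classify(x_xx @ V_k, cats, cents)
print(label, scores)
```

In the full setting, the same procedure would be repeated once per non-English language, giving one centroid classifier per pairwise latent basis.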
 Jan Rupnik, Andrej Muhic, Blaz Fortuna, Janez Starc, Marko Grobelnik, Michael J Witbrock: Cross-Lingual Technologies: Text to Logic Mapping, Search and Classification over 100 Languages, NIPS 2013 Demonstrations