Cross-lingual Semantic Annotation

Jan 12 2014 Published by blaz under Project News, XLike Technology

Cross-lingual semantic annotation links linguistic resources in one language to resources in knowledge bases in any other language, or to language-independent representations. XLike later uses this semantic representation for document mining, for example to enable cross-lingual services for publishers, to support media monitoring, or to develop new business intelligence applications.

The goal is to map word phrases in different languages into the same semantic interlingua, which consists of resources specified in knowledge bases such as Wikipedia and Linked Open Data (LOD) sources. Cross-lingual semantic annotation is performed in two stages: (1) candidate concepts in the knowledge base are linked to the linguistic resources using newly developed cross-lingual linked data lexica, called xLiD-Lexica; (2) the candidate concepts are then disambiguated with the personalized PageRank algorithm, which uses the structure of the information contained in the knowledge base.
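The two stages can be illustrated with a minimal Python sketch. It is not the XLike implementation: the lexicon entries, the concept graph, the function names and the damping factor are all made-up assumptions, and personalized PageRank is computed here by plain power iteration on a tiny undirected graph.

```python
from collections import defaultdict

# Hypothetical toy lexicon: surface form -> candidate concepts (stage 1).
# Real xLiD-Lexica groundings are stored as RDF; these entries are invented.
LEXICON = {
    "jaguar": ["Jaguar_Cars", "Jaguar_(animal)"],
    "engine": ["Engine"],
}

# Hypothetical links between knowledge base concepts, treated as undirected.
EDGES = [
    ("Jaguar_Cars", "Engine"),
    ("Jaguar_Cars", "Automobile"),
    ("Engine", "Automobile"),
    ("Jaguar_(animal)", "Felidae"),
]


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration where the random walk restarts only at seed nodes."""
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(graph[n])
            for neighbor in graph[n]:
                new_rank[neighbor] += share
        rank = new_rank
    return rank


def annotate(mentions):
    """Stage 1: look up candidates. Stage 2: keep the best-ranked candidate."""
    graph = build_graph(EDGES)
    candidates = {m: LEXICON.get(m, []) for m in mentions}
    seeds = [c for cs in candidates.values() for c in cs]
    rank = personalized_pagerank(graph, seeds)
    return {
        m: max(cs, key=lambda c: rank.get(c, 0.0))
        for m, cs in candidates.items()
        if cs
    }


if __name__ == "__main__":
    print(annotate(["jaguar", "engine"]))
```

In this toy graph the candidate for "engine" is connected to the car sense of "jaguar", so the random walk concentrates more mass on that sense than on the animal sense, which has no link to any other candidate.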

The xLiD-Lexica is stored in RDF format and contains about 300 million triples of cross-lingual groundings. It was extracted from the July 2013 Wikipedia dumps in English, German, Spanish, Catalan, Slovenian and Chinese, and is based on the canonicalized datasets of DBpedia 3.8. More details can be found in [2].

The xLiD-Lexica SPARQL Endpoint and cross-lingual semantic annotation services are described as follows:

- xLiD-Lexica: The cross-lingual groundings in xLiD-Lexica are translated into RDF data and are accessible through a SPARQL endpoint [1], which uses OpenLink Virtuoso as the back-end database engine.
- Semantic Annotation: The cross-lingual semantic annotation service uses the xLiD-Lexica for entity mention recognition and the Java Universal Network/Graph Framework (JUNG) for graph-based disambiguation. An example of the service, annotating the XLike website with DBpedia in German, is available at [3].
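As a rough illustration of how these two services might be called, the Python sketch below only builds request URLs and sends nothing. The parameter names `source`, `kb` and `lang2` are taken from the example URL [3]; their exact meaning, the helper function names, and the `example.org` endpoint address are assumptions. The `query` parameter follows the standard SPARQL protocol for GET requests; the concrete query path behind [1] is not given here.

```python
from urllib.parse import urlencode

# Base address taken from the example URL [3].
ANNOTATION_SERVICE = "http://km.aifb.kit.edu/services/text-annotation/"


def annotation_url(source="", kb="dbpedia", lang2="de"):
    """Build an annotation request URL with the parameters seen in [3].

    Assumption: `source` identifies the input to annotate, `kb` the
    knowledge base, and `lang2` the language of the knowledge base.
    """
    params = {"source": source, "kb": kb, "lang2": lang2}
    return ANNOTATION_SERVICE + "?" + urlencode(params)


def sparql_get_url(endpoint, query):
    """Build a SPARQL protocol GET URL for a given endpoint address."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # With the defaults this reproduces the example URL [3].
    print(annotation_url())

    # Hypothetical endpoint address; replace it with the real query URL.
    generic_query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
    print(sparql_get_url("http://example.org/sparql", generic_query))
```

The generic triple pattern is used because the xLiD-Lexica vocabulary is documented in [2] and is not reproduced in this post.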

[1] http://km.aifb.kit.edu/services/xlike-lexicon/
[2] http://people.aifb.kit.edu/lzh/xlike/xLiD-Lexica.pdf
[3] http://km.aifb.kit.edu/services/text-annotation/?source=&kb=dbpedia&lang2=de

FP7

XLike is funded by the European Community's Seventh Framework Programme FP7/2007-2013
Links
- Language Processing Pipeline Open source prototypes from XLike
- GitHub code repository Open source prototypes from XLike
- Newsfeed Clean stream of semantically enriched news articles
- Multilingual Language Processing Web services for multilingual language processing
- Cross-lingual Document Linking Demo of cross-lingual similarity search
- News Data Visualization Interactive interface to Newsfeed data enriched with XLike technologies
- Event Registry
- QMiner Analytic platform for real-time large-scale streams containing structured and unstructured data
Project related videos
- xLiTe: Cross-Lingual Technologies
- Kickoff Meeting 2012, Bled