The increasing availability of knowledge bases (KBs), generated by academia and industry, has attracted the attention of researchers working on several natural language processing (NLP) tasks, who aim to advance state-of-the-art performance by making use of the vast amount of background knowledge available on the web. However, most of the information found in KBs, such as knowledge graphs, ontologies or terminological resources, is in most cases represented in one language only (e.g. English, German or Italian). Consequently, NLP applications that use these KBs are limited to the language in which the information is stored. To make the information accessible beyond language borders, these KBs have to be translated into different languages. Since manual enhancement of KBs is a very time-consuming and expensive process, machine translation (MT) can be applied for this purpose. Nevertheless, translating KBs with MT is rather challenging, due to the sophisticated domain-specific knowledge documented in knowledge graphs, the specific vocabulary in terminological dictionaries and the particular sentence structure of ontology labels.
In addition to the multilingual enhancement of monolingual KBs, growing attention has also been paid to the integration of existing multilingual terminological knowledge into MT systems or computer-assisted translation (CAT) tools. An important open issue for this task is how to support translators with relevant information when dealing with specialised texts from different domains (IT, medical, law, etc.). Commercial or open source MT systems trained on generic data are the most common solutions, but they often struggle with the specific vocabulary found in such texts. To reduce the post-editing effort involved in the translation process, a valuable alternative is to enhance the systems with existing multilingual knowledge, e.g. IATE or in-house terminological resources, which are agreed and curated resources that professionals use in their expert-to-expert communication. In this sense, the provision of multilingual KBs that can be better integrated into MT systems is a crucial step towards increasing translation quality, since terminological expressions are among the most common sources of translation errors.
- Mihael Arcan, Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
- Marco Turchi, Fondazione Bruno Kessler (FBK), Italy
- Jinhua Du, Investment AI, AIG, UK
- Dimitar Shterionov, ADAPT Centre, Dublin City University, Ireland
- Daniel Torregrosa, Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
The workshop is co-located with MT-Summit in Dublin.