Europe faces a growing economic and societal challenge due to its vast diversity of languages, and machine translation technology holds promise as a means to address this challenge. The goals of EuroMatrixPlus are:
- To continue the rapid advance of machine translation technology, creating example systems for every official EU language, and providing other machine translation developers with our infrastructure for building statistical translation models.
- To continue and broaden the controlled systematic investigation of different approaches and techniques to accelerate the scientific evolution of novel methods, including both selection and cross-fertilization. The aim is to arrive at scientifically well understood novel combinations of methods that are proven superior to the state of the art.
- To focus on bringing machine translation to the users, in addition to focusing on scientific advances. Because our statistical models are derived from example translations, we believe that there is potential for a synergistic relationship in which users suggest improvements to the system by post-editing its output, and the system improves itself by learning from user feedback.
- To contribute to the growth and competitiveness of the European MT research scene and infrastructure through its open, international, competitive shared tasks and its living, community-supported surveys of resources, tools, systems, and their respective capabilities.
In bringing MT to the users, EuroMatrixPlus focuses on two different types of users: (a) professional translators and translation agencies working for private corporations, administrations, and other organisations, and (b) lay users who create content on a volunteer basis by translating foreign materials into their own languages. The project will investigate how these users can benefit from state-of-the-art machine translation, and conversely, how machine translation can benefit from user corrections.
EuroMatrixPlus will create an openly accessible sample application that enables users to automatically translate news stories and web pages from any European language into any other, and that uses their corrections as data for improving translation technology.
Description of target users and groups
Professional translators and translation agencies working for private corporations, administrations, and other organisations.
Lay users who create content on a volunteer basis by translating foreign materials into their own languages.
Description of the way to implement the initiative
The EuroMatrixPlus consortium integrates efforts from academic research and companies to advance machine translation performance and bring it to the end user. The complex problem of translation requires an interdisciplinary research strategy. Neither linguists, computer scientists, translation experts, nor mathematicians will be able to solve the problem without cooperation across traditional boundaries between disciplines. The partners of this consortium are selected on the basis of their complementary strengths, combining core competencies in machine translation and machine learning with experience in practical deployment in the marketplace.
The EuroMatrixPlus workplan consists of the following 10 work packages:
WP1: Rich Tree-Based Statistical Translation
Translating between European languages poses challenges - such as morphology and reordering - that are not adequately reflected in traditional phrase-based translation models. We therefore explore statistical translation models that exploit richer linguistic representations.
WP2: Hybrid Machine Translation
Recent detailed comparisons of rule-based and statistical translation systems carried out by members of the consortium have revealed different strengths of the two approaches that currently dominate the commercial and academic research field of machine translation. In this work package, we explore ways to tightly integrate the two approaches in a hybrid machine translation system.
WP3: Advanced Learning Methods for Machine Translation
Because statistical machine translation models are built in a data-driven fashion, the more training data that is used, the better the performance will be. Adding hundreds of millions of words leads to increasingly good translation quality. However, for many Central and Eastern European languages, limited training data constrains the quality of statistical machine translation systems. We will explore methods of using alternative training data and of better exploiting the available parallel corpora.
WP4: Open Source Tools and Data
We are committed to the idea of open source software as an essential means to collaborate within the EuroMatrixPlus project and to engage the greater research and development community. The consortium members have made significant contributions to the open source toolset in machine translation as part of the EuroMatrix project.
WP5: "WikiTrans" Community-Based Translation Environments
The ultimate test for machine translation is its utility for end-users. MT technology could be useful if it allows users to more quickly create content in their language from text in a source language that they have limited or no understanding of. This is especially important for many European languages that are currently under-served, both in terms of available content and in terms of existing language technology. In this work package, we bring the "Wiki" idea of collaborative content development to translation.
WP6: Integrated Localisation Workflow
The localization industry has not widely used machine translation, but has successfully used translation memories to reduce the translation workload, especially in repetitive tasks such as the translation of content that only partially changes over time (product manuals, company websites).
In partnering with the Research Centre for Next Generation Localisation (CNGL), we will integrate EuroMatrixPlus resources with CNGL research on standards and interoperability in localisation workflows. We will combine the technological advances in machine translation which are developed by other work packages with the industrial workflow processes used by the localisation industry. The close collaboration with industrial partners outside of EuroMatrixPlus will widen the reach of the results of this project and directly benefit the localisation industry in Europe.
WP7: Evaluation Campaign
Much of the progress in machine translation in this decade has been driven by open evaluation campaigns, where developers of machine translation systems are tasked to translate a previously unseen test corpus with their system and have their translation performance evaluated against other participants. The competitive aspect of these campaigns has driven researchers to focus on the most important problems for translation performance. The collaborative aspect of the follow-up meetings, where methods are discussed in detail, has contributed to the quick adoption of best known methods and the validation of novel approaches.
Almost all members of the EuroMatrixPlus consortium have participated in and helped to organize evaluation campaigns, most notably the series of workshops organised alongside ACL, the premier conference in computational linguistics. We will continue our efforts to provide a forum dedicated to the translation of European languages.
WP8: Project Management and Dissemination
WP9: Integrating Slovak Language Resources
The main goal of this work package is to integrate Slovak language resources into the project.
WP10: HPSG-based Statistical Translation
The focus of this work package is the development of a statistical model for translation between Bulgarian and English. This will be done on the basis of a parallel HPSG-based treebank.
Technology solution
Technology choice: Open source software
Main results, benefits and impacts
We expect the project to deliver significant improvements in translation technology for European languages, both in terms of the achieved quality and in the ease of deployment and adaptation to specific needs. The progress so far clearly indicates that our results will be practically relevant for many users of translation technology in industry and large organizations; hence we have started a number of collaborative activities with potential users. These activities include consulting work for translation departments of big institutions such as the DGT of the European Commission, as well as the organization of events that bring developers and users of our technologies together, such as the Machine Translation Marathons and the series of Translingual Europe conferences. Already today, these activities have led to a widespread dissemination and adoption of our results, which will be further extended during the remaining year of the project.
Scope: International