
Latvian automated translation service

Paving the road toward a virtual assistant

Published on: 20/04/2018 Last update: 24/04/2018 Document Archived

Abstract

Via the Hugo.lv portal, the Latvian authorities have released the public sector's machine translation service, which is freely available to all visitors. It provides automatic translation from Latvian to English and vice versa, as well as from Latvian to Russian. Users can translate not only text on the fly but also entire documents and websites.

Hugo.lv is customized for the Latvian language and provides much higher translation quality than generic online translation systems.

As the final step of this ongoing project, the Latvian public authorities, together with a local private company that leads the machine translation (MT) services market, will develop a Virtual Assistant. It will guide users through Latvian e-services, replying to their queries in their preferred language: Latvian, English or Russian.

EIF and Interoperability Matching

The MT service addresses several recommendations of the European Interoperability Framework (EIF).

Besides the Openness principle (see the "Technological solution" section below), the most important ones are the assessment of Efficiency and Effectiveness, and User centricity.

Given the nature of the solution, Multilingualism is at the core of the approach. Besides Latvian, Russian is widely spoken in the country, and English serves as a vehicle language for foreign users, mainly students, tourists and entrepreneurs.

Once deployed, the virtual assistant will be available 24/7, and only the most complex queries will be redirected to human civil servants.


Policy context

The multi-fund Operational Programme "Growth and Employment" is the European framework programme that allows Latvia to invest funds in innovation projects. It aims to achieve key national development priorities in line with the "Europe 2020" objectives. This project falls in particular under Specific Objective 2.2.1, which aims to increase the re-use of public data and to make the interaction between the public administration and the private sector more efficient.

By combining support from the European Regional Development Fund (ERDF), the Cohesion Fund (CF), the European Social Fund (ESF) and the specific allocation for the Youth Employment Initiative (YEI), the Growth and Employment Operational Programme (OP) provides significant support to economic growth and employment, with a particular focus on the competitiveness of Latvia's economy.

At the national level, the project is developed in accordance with the National Development Plan. The plan contains a strategic objective of Advanced Research and Innovation and Higher Education, under which one of the nine measures adopted is the development of language technologies.


Description of target users and groups

The main users targeted by Hugo.lv are government institutions and citizens. The machine translation (MT) service is available to everyone, but integration of the service into larger IT systems is available only to government institutions.

To meet the need to translate legal acts, a dedicated version of the translation engine is accessible from the standard interface. It is built on a larger, domain-specific knowledge base and produces translations tailored to legislative and administrative language.

The websites of some public agencies and services already integrate MT seamlessly into their interfaces. Through a "language selection" widget or an application programming interface (API), the displayed content is machine-translated automatically.

The next planned developments of the solution (expected by the end of 2018) are:

  • Tailor-made translations for justice- and culture-related texts;
  • Text-to-speech and speech-to-text functionality for the Latvian language;
  • Easier transfer of translated data to public administration systems, as well as access to a computer-assisted translation (CAT) tool;
  • A possible plugin for the Trados translation software.

All these improvements will be reinforced by the Virtual Assistant, which will provide innovative information services through natural-language dialogue in user-friendly online communication environments. After the necessary testing and acceptance phase, the assistant is planned for release in 2019.


Description of the way to implement the initiative

Through its integration into Latvia’s e-service platforms, Hugo.lv also aims to provide communities throughout the world with open access to Latvia’s e-services, regardless of their native language. This helps to spread e-democracy and fosters greater civic engagement.

The first phase of the project began in October 2013 and was completed in December 2014, when the online web interface was published at https://hugo.lv/.

In its current phase, the project is also being actively considered for piloting in other Baltic countries, adapted to other language pairs and domains.

As part of the project, three public service platforms / information systems will develop virtual assistants as a new service delivery channel:

  1. Latvija.lv portal:
    Through this portal, citizens can obtain centralised information on Latvian state institutions and local governments, and receive electronic services from these institutions. The portal holds a significant amount of information, user searches are complex, and finding the necessary information takes a relatively long time. The State Regional Development Agency supports portal users by phone and email; however, many of the problems users face concern finding information that the portal already covers. The virtual assistant will meet this need through a comfortable, natural-language dialogue and will help users find the information they need on the portal.

  2. A shared platform is being developed to unify national and local authority websites.
    The information will be organised and made available to all public administrations taking part in the platform. With the help of the virtual assistant, visitors will have easier access to content, reducing the time they spend searching for the information they need.

  3. Books and other periodicals at Latvian public libraries can be ordered on site or via the various library information systems in Latvia.
    Today, however, the number of online orders and reservations is relatively low. This is due to the lack of proactive communication with the user, which an on-site library visit provides. The virtual assistant will enable proactive communication with users, leading to more frequent use of the systems and promoting users' interest in electronic ordering in general.

Later this year, more features will be added to Hugo.lv, which will become a platform within the state ICT architecture. New corpora (law, culture) will become available, and state terminology will be integrated to further improve translation quality.


Governance

Machine translation tools are crucial enablers of multilingual communication. Because Latvian was underserved by the tools of global providers, Latvia was among the first countries to build its own machine translation service. The service was produced in collaboration by the Culture Information Systems Centre (representing the public sector and coordinating the whole project) and the Riga-based language technology company Tilde (which developed the system).

The Culture Information Systems Centre operates under the Ministry of Culture, but budget is also allocated by other relevant ministries, each of which expresses its needs for the development of the project.

The Cabinet of Ministers reviews the progress of the project and approves its financing. Since the budget allocated by the ministries comes mostly from European Union funds, certain criteria must be met for the funds to be granted. These concern the monitoring of results and the assurance that the system will remain up and running for at least five years. To ensure this sustainability, a budget of EUR 990,000 has been provided for the latter stage of the MT phase of the project (as specified in Cabinet Order 422).


Main results, benefits and impacts

Latvia’s machine translation service Hugo.lv contributes to enriching the global Information Society with content from Latvia while preserving linguistic diversity.

Once deployed, the virtual assistant technology will provide substantial benefits for public administration by improving access to services through new, modern service delivery channels, reducing the time needed to search for information and thereby ensuring quicker, easier and more frequent provision of services.

High-quality instant translation guarantees that communities can access information and knowledge in multiple languages, both inside and outside of Latvia. The machine translation service is integrated into Latvia’s e-services, so that all communities can receive services in a language they understand – a fundamental right in the digital age. 

The service upholds the linguistic diversity of Latvia, where the official language is Latvian but a sizeable minority speaks Russian at home, while at the same time bridging the language barrier between communities. This fosters dialogue between these distinct linguistic communities and promotes social cohesion.

It also allows Latvians to continue producing content in their native language while ensuring that the information is accessible in English as well.

With the service, communities need not hesitate to produce content in their native language and post it online, as this information can be translated and read by non-Latvian speakers, reaching people beyond the country's borders. This empowers the local population to produce content in Latvian and enriches the linguistic diversity of the global Information Society.

Hugo.lv also enables libraries, archives, museums and other memory institutions to make their content understandable to a wider audience, so that Latvian cultural information can be shared more widely. It also allows researchers to include Latvian sources in their studies.

Lessons learnt

The first main challenge faced during the development of Hugo.lv was the lack of data resources needed to build machine translation engines of the required quality. Because Latvian has relatively few speakers, far less data is available for it than for larger languages. This is the main reason why machine translation systems developed by large global providers favour larger languages and provide such poor quality for smaller ones.

This challenge was overcome by collecting additional data from a wide variety of sources using a number of novel methods. New techniques were applied to process multilingual corpora, and the developers used tools to extract data from multilingual websites such as news portals and Wikipedia, as well as from other comparable text collections. In this way, the developers were able to build the largest parallel data corpus for Latvian available in the world. As a result, Hugo.lv's translation quality substantially outperforms that of generic online translation services.

The systems can be further improved by uploading new collections of data from various public sector organisations, specifically parallel corpora or human-translated documents. The solution includes facilities that automatically extract and align sentences from user-uploaded parallel documents in the most popular file formats.
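The case study does not describe how Hugo.lv aligns sentences. Purely as an illustration of the general idea, the sketch below shows a simplified length-based dynamic-programming aligner, loosely in the spirit of Gale and Church's method. It assumes the uploaded documents have already been converted to plain text and split into sentences; the function names, the cost function, the penalty values and the allowed groupings (1-1, 1-0, 0-1, 2-1, 1-2) are all hypothetical choices, not the system's actual design.

```python
from typing import List, Tuple

# Hypothetical alignment "beads": (source sentences consumed, target sentences consumed).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_PENALTY = 10.0   # hypothetical cost of leaving a sentence unaligned
MERGE_PENALTY = 1.0   # hypothetical extra cost of a 2-1 or 1-2 grouping


def bead_cost(src: List[str], tgt: List[str]) -> float:
    """Cost of pairing a group of source sentences with a group of target sentences."""
    if not src or not tgt:
        return SKIP_PENALTY
    src_len = sum(len(s) for s in src)
    tgt_len = sum(len(t) for t in tgt)
    # Assumption: translations have roughly proportional character lengths,
    # so a large relative length difference suggests a bad pairing.
    ratio_cost = abs(src_len - tgt_len) / max(src_len, tgt_len)
    merge_cost = MERGE_PENALTY if len(src) + len(tgt) > 2 else 0.0
    return ratio_cost + merge_cost


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return the lowest-cost sequence of (source group, target group) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Every bead moves forward, so visiting cells in row-major order
    # guarantees each cell's cost is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Trace the best path back from the end of both documents.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        pairs.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    for lv, en in align(["Labdien.", "Kā jums iet?"], ["Good day.", "How are you?"]):
        print(lv, "<->", en)
```

Length alone is a weak signal; practical aligners usually add lexical evidence such as bilingual dictionaries or translation-model scores, which this sketch deliberately omits.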

The second main challenge faced by the project was the complexity of the source and target languages. Both Russian and Latvian are highly inflected languages with relatively free word order. This linguistic complexity is another reason why many generic translation services provide such poor quality for Latvian and are thus insufficient for the needs of global communication.

To overcome this challenge, the project's developers applied various sophisticated linguistic components that supply linguistic knowledge to the machine translation service.

These included tools such as sentence breakers, morphological analysers and synthesisers, and part-of-speech taggers.

These linguistic components enable the machine translation service to deliver high-quality, accurate translations for Latvian. Quality and accuracy are the cornerstone of language technology; by overcoming these challenges, Latvia's machine translation service has achieved its goals.

Categorisation

Type of document
General case study
