Language technologies

In recent years, the European Commission has funded research projects in language technologies such as machine translation, speech recognition and data analytics, and several promising initiatives are growing out of this investment. However, the technologies currently available do not support all European languages, and more work and investment are required, both for challenging languages and for building appropriate resources. Big data, cloud computing and supercomputing have the potential to raise the quality of automated translation, so that language is no longer a barrier in the market or in social and public services across the EU. In the European Commission's words: “By using digital solutions we can bridge language barriers if we consider our diversity as an opportunity rather than an obstacle.” Below are the most significant ongoing or completed language technology projects.

ELG (European Language Grid)

Multilingualism is at the heart of the European idea and is one of the greatest assets of Europe's cultural diversity. The principle that all 24 official Member State languages have the same status is enshrined in the EU Charter as well as in the Treaty on the EU. As also emphasised by the STOA Report and the European Parliament Resolution, there is a great need for a shared platform that bundles repositories and applications to benefit European society, industry and politics. The European Language Grid (ELG) project is developing a platform meant to address this need, and the fragmentation of the European language technology (LT) landscape, by providing access to LT services and data sets. It is a scalable cloud platform which will ultimately provide access to hundreds of commercial and non-commercial LTs for all European languages in an easy-to-integrate way, including running services, tools, data sets and resources. It will enable the European LT community to deposit and upload their technologies and data, to deploy them through the grid, and to connect them with other resources. The ELG is intended to strengthen the Multilingual Digital Single Market and to foster a European LT sector. In 2020, ELG began to close at least some of the open gaps in missing data sets and technologies through its two open calls with 15-20 pilot projects.

META-NET (Multilingual Europe Technology Alliance)

This Network of Excellence, consisting of 60 research centres from 34 countries, was dedicated to building the technological foundations of a multilingual European information society. The benefits offered by language technology differ from language to language, and so do the actions that need to be taken within META-NET, depending on factors such as the complexity of the respective language, the size of its community, and the existence of active research centres in this area. META-NET’s main objectives have been:

assessment: to collect, organise and disseminate information that gives an up-to-date insight into the current status and potential of language-related activities for each of the national and/or language communities represented in the project.
collection: to assemble and prepare language resources for distribution. This includes collecting language resources, documenting them, upgrading them to agreed standards and guidelines, and linking and cross-lingually aligning them where appropriate.
distribution: to distribute the assembled language resources through exchange facilities that can be used by language researchers, developers and professionals.
dissemination: to mobilise national and regional actors, public bodies and funding agencies by raising awareness of the activities and results of the project in particular, and of the whole area of language resources and technology in general.

Starting with META-NET in 2010, a substantial number of initiatives and projects have attempted to foster research, innovation and development towards a truly multilingual Europe, enabled and supported by LT “made in Europe”.

CLARIN (Common Language Resources and Technology Infrastructure)

CLARIN is a research infrastructure that grew out of the vision that all digital language resources and tools from all over Europe and beyond should be accessible through a single sign-on online environment to support researchers in the humanities and social sciences. Currently CLARIN provides easy and sustainable access to digital language data (in written, spoken, or multimodal form) for scholars, and offers advanced tools to discover, explore, exploit, annotate, analyse or combine such data sets, wherever they are located. This is enabled through a networked federation of centres: language data repositories, service centres and knowledge centres, with single sign-on access for all members of the academic community in all participating countries. Tools and data from different centres are interoperable, so that data collections can be combined and tools from different sources can be chained to perform complex operations to support researchers in their work.

The CLARIN infrastructure is fully operational in many countries, and a large number of participating centres are offering access services to data, tools and expertise. At the same time, CLARIN continues to be constructed in some countries that joined more recently, and CLARIN’s datasets and services are constantly updated and improved. For a bibliographic overview of CLARIN-related publications, please consult the CLARIN Zotero Library.

CRACKING THE LANGUAGE BARRIER: Federation of European projects and organisations

This Federation assembles all European research and innovation projects, as well as all related community organisations, working on or with cross-lingual or multilingual technologies, in neighbouring areas or on closely related topics. In this umbrella initiative, partners collaborate on a joint objective: to overcome any kind of language and communication barrier with the help of sophisticated language technologies. Among the areas of collaboration are shared scientific tasks and evaluation campaigns, strategy papers, data management, and resource and technology repositories. The Federation's Strategic Research and Innovation Agenda presents the vision of the Human Language Project. It also presents ideas, approaches and solutions for making the Digital Single Market, a flagship initiative of the European Union, multilingual. The final document, Language Technologies for Multilingual Europe: Towards a Human Language Project, was unveiled in December 2017.