Third International Workshop on Example-Based Machine Translation


Following two successful previous meetings at the MT Summits in 2001 and 2005, the 3rd International Workshop on Example-Based Machine Translation (EBMT) took place at the Centre for Next Generation Localisation at Dublin City University, on November 12 and 13, 2009.
The main theme of the Workshop, "Going open-source to revive example-based machine translation", grew out of a reflection by the workshop chairs, Andy Way and Mikel Forcada, on the current success of statistical machine translation (SMT): is it because SMT is the best way to do MT, or is it because SMT software is free and open-source, and therefore easily obtained and open to collaboration?

In the four years since the 2nd Workshop, held in Phuket, Thailand, as part of Machine Translation Summit X, EBMT research seemed to be languishing in the doldrums, so the response to the call for papers issued in July 2009 might well have been disappointing. On the contrary, 15 papers were received, of which 11 were accepted after being reviewed by a Programme Committee involving top EBMT researchers from around the world. A good part of the papers submitted addressed the theme and involved free/open-source MT (FOSMT) or related software, either describing new FOSMT software or announcing the release of FOSMT software involved in the research presented.

A complete day-and-a-half programme was assembled, starting with an invited talk by Prof. Sadao Kurohashi (Kyoto University) on "Fully Syntactic Example-Based Machine Translation", and including an open discussion on the main theme of the Workshop: "Going open-source to revive EBMT".

Forty-seven people signed up to attend the workshop from ten different countries: Belgium, France, UK, Japan, the Netherlands, Poland, Spain, Switzerland, the United States, and, of course, the host country, Ireland. The session was opened by Dr. Stephen Flinter, Scientific Programme Manager with Science Foundation Ireland, and Prof. Josef van Genabith, Director of the Centre for Next Generation Localisation.

After Prof. Kurohashi's keynote address, two sessions took place, one on hybrid approaches to EBMT and the other on open-source EBMT packages and tools, with three papers each.

The scientific programme of the first day ended with the open discussion, which started with two short seeding talks by co-chairs Andy Way ("Open Research Questions in Example-Based Machine Translation") and Mikel L. Forcada ("Why free/open-source EBMT?").

Andy Way started by posing 11 open research questions for EBMT: just to name a few, questions about possible advantages of EBMT over SMT for online, real-time applications, or of tree-to-tree EBMT over SMT, questions about redundancy in the example base, about the lack of EBMT papers devoted to EBMT recombination, etc. Mikel L. Forcada gave a quick summary of free/open-source licences, and explained the advantages of doing research in a free/open-source setting, in particular as a way to foster collaboration and guarantee reproducibility of research.

There then followed short addresses by the three panellists, Sadao Kurohashi, Yves Lepage (Université de Caen) and Ralf Brown (Carnegie Mellon University). Sadao Kurohashi, who described himself as "positively disposed towards OS", advocated that EBMT should move away from the simplicity of SMT and embrace the power of linguistically-motivated technologies such as parsing. Ralf Brown explored some of the reasons that prevented researchers from going open-source (inertia, reluctance to show ugly code, university policies), insisted on some of the advantages already mentioned, and encouraged people to "just do it". Yves Lepage advocated collaboration toward the creation of a set of tools that would clearly identify EBMT and give it visibility, but with the clear aim of building one baseline EBMT system. He then went on to list a number of desiderata: availability of subsententially-aligned corpora and of open tools to align, evaluation tools and metrics that measure what the EBMT community would like to be measured, and increased external visibility and closeness to translation professionals.

The ensuing discussion revealed a wide consensus on the benefits of freeing/open-sourcing not only EBMT tools and engines, but also corpora and associated EBMT sub-sentential 'memories'. Some of the problems involved (such as the difficulties of obtaining permission from universities) were put forward. Some participants supported Lepage's proposal of one strong open-source EBMT system which could be used as a reference for all EBMT practitioners, but even in the absence of such, all who spoke were in favour of setting up an EBMT Internet portal where researchers would meet and share software and corpora. Other issues were also raised, such as real-life post-editing using EBMT (likely to be more favourably received by translators/post-editors than SMT), the (in)adequacy of BLEU-like automatic metrics for EBMT, and focussing on problems where EBMT clearly wins out over SMT.

Two sessions were held on Friday, one on "Pure" EBMT, and the other one on Applications of EBMT, comprising five papers in total. The conference closed at 12.30 and the attendees either headed for home or chose to stay in Dublin for the weekend.

The complete proceedings of the conference, including slides for many of the talks, are available online.

As to the next moves following the workshop, in order to bring together all of the open-source EBMT initiatives, the organisers of the workshop will launch a web portal by the end of 2009 (the URL of which will be widely announced, also through the workshop page) and urge everyone to turn their good intentions into real collaboration before the enthusiasm and the consensus die away. In addition to a toolkit featuring the code of the open-sourced systems, the portal will also feature corpora, demos and links to documentation and papers. In the meantime, a list of open-source MT software (not only EBMT) is being compiled online. There will also be an EBMT tutorial and coding session at the MT Marathon. EBMT groups are urged to send their students so that collaboration starts as soon as possible.

If things start to happen and the field begins to pick up momentum again, perhaps we won't have to let another four years pass before the 4th EBMT Workshop. It could be held in two years and co-located with a major MT event for convenience (for instance, with EAMT 2011 or with MT Summit XIII).

The 3rd Example-Based Machine Translation Workshop was made possible by the generous sponsorship of four institutions: the EAMT, through a generous award; Dublin City University, through their conference support programme; the Centre for Next Generation Localisation, as the host institution; and Science Foundation Ireland, which is supporting Mikel L. Forcada during his sabbatical stay at DCU.