

 

British Computer Society
Natural Language Translation Specialist Group

Web-site: http://www.bcs.org.uk/siggroup/sg37.htm

Machine Translation Review
Issue No. 11, December 2000 - pages 7-10
ISSN 1358-8346

This page URL: http://www.bcs.org.uk/siggroup/nalatran/mtreview/mtr-11/mtr-11-7.htm
Size: 4 A4 pages when printed

 
Keeping Translation Technology under Control

by Dawn Murphy

 

Introduction

The following is a summary of a talk I gave to the BCS Natural Language Translation Specialist Group on 8 December 1999. The talk covered controlled authoring and how restricting the language of the source text can yield better results from machine translation (MT) and translation memory (TM) tools.

 
A new era for translation technology

Translation technology is becoming more widely accepted in the global marketplace. Companies that produce technical documentation are under increasing pressure to reduce translation costs as they enter new markets, and to reduce time-to-market as global markets require virtually simultaneous release. Where in the past they may have been sceptical about using translation technology, many companies are now sceptical of translation providers who do not use such tools.

Many companies nowadays expect their translation providers to use translation memory tools to cut translation costs. However, often these same clients are disappointed when the level of text re-use (100% matches) reported by the translation provider is much lower than they had expected. On analysis, it is often found that the lower hit rate is down to the way the source text is written. In effect, the source text is rewritten every time, so not surprisingly, it has to be retranslated every time.

As well as translation memory tools, fully automated machine translation is starting to come of age, with successful implementations being reported at various sites. The success of these implementations is due to various factors, including the suitability of the subject domain and text types and the customisability of the system. At present, the quality of the source text is one of the most important factors in the successful use of MT. The more restricted the source language, the less room for inaccuracy in translation. This is where controlled authoring comes in. Controlled authoring is not a new concept: it has been discussed, and used, for a number of years, both as a means of improving English texts and as a means of gaining more control over the output of MT. It has gone through some rough times too, with many people questioning whether it is worth the investment.

 
The internet raises the stakes

What is happening now, though, is that the translation industry is reaching a point where translation technology is becoming an essential part of the translation process rather than a curious experiment. The emergence of internet-based translation services aimed at corporate users is raising expectations too, with the promise of fast turnarounds for translation jobs. People are starting to expect everything at 'internet speed', i.e. translation at the touch of a button.

In addition, companies with a global presence are starting to realise the importance of localising their websites. This is creating a huge amount of translation work which needs to be turned around virtually immediately, as website content may be updated on a daily basis. The volumes involved are often so high, and the turnaround times so short, that traditional translation methods just cannot keep up with the demand, or the cost becomes unpalatable.

The application of MT and TM in both these contexts has obvious potential, ensuring rapid turnarounds and lower costs, though it should be said that MT is not always suitable for the more marketing-oriented content. Controlling the source text is suddenly more important than ever if we are to make the most of the translation technologies available to us and if quality translations are to be delivered on time. The investment is now balanced against a much higher return.

 
Controlling the source

Controlled authoring involves both restricting the language used by authors and using appropriate tools to ensure maximum re-use of existing text. Maximum re-use could be achieved through the use of author memory technology, which would prompt the author to phrase a single idea in the same words every time. With multilingual content management databases, this re-used text can be directly linked to its translation and thereby bypass both MT and TM for 100% matched text, bringing in TM for similar text (fuzzy matches) and MT for new text where appropriate.
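The routing described above (direct re-use for 100% matches, TM for fuzzy matches, MT for new text) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `memory` dictionary stands in for a multilingual content database, `difflib.SequenceMatcher` stands in for whatever similarity measure a real TM tool uses, and the function name `route_segment` and the 0.75 threshold are hypothetical choices, not taken from any product.

```python
from difflib import SequenceMatcher


def route_segment(segment, memory, fuzzy_threshold=0.75):
    """Decide how one source segment could be handled.

    memory maps previously translated source segments to their
    translations (an assumed stand-in for a content database or TM).
    Returns a tuple (route, suggested_translation, best_score).
    """
    # 100% match: re-use the stored translation, bypassing TM and MT.
    if segment in memory:
        return ("reuse", memory[segment], 1.0)

    # Otherwise look for the most similar stored segment (fuzzy match).
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score

    if best_source is not None and best_score >= fuzzy_threshold:
        # Similar text: offer the stored translation for editing.
        return ("tm_fuzzy", memory[best_source], best_score)

    # New text: a candidate for MT where appropriate.
    return ("mt", None, best_score)


# Placeholder translations; real target text would be stored here.
memory = {
    "Center the steering wheel.": "[target text A]",
    "Lock the steering wheel in position.": "[target text B]",
}

print(route_segment("Center the steering wheel.", memory)[0])  # reuse
print(route_segment("Centre the steering wheel.", memory)[0])  # tm_fuzzy
print(route_segment("Drain the oil.", memory)[0])              # mt
```

The point of the sketch is the order of the checks: the cheapest and most reliable route (exact re-use) is tried first, and MT is only the fallback for genuinely new text.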

Restricting the language which is used by authors essentially boils down to avoiding ambiguous constructions and ensuring that the correct terminology is used. This process can then be supported by look-up and checking tools which authors can use to ensure they do not use any disallowed constructions. Typical rules might be: 'Avoid using pronouns', or 'Keep to one instruction per sentence'. A criticism often levelled at controlled language is that it produces unnatural, stilted text, but this is not necessarily the case. Most translators will be able to point you to a text they have been given to translate which was virtually incomprehensible in the source language. Carefully trained authors and carefully designed controlled languages can often improve the source text, making it easier to read and its instructions easier to follow. In fact, the first controlled languages were designed for this very purpose, and it was only afterwards that the possibilities for translation were considered.
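To give a feel for what a checking tool does, here is a deliberately crude Python sketch of the two example rules. It is an assumption-laden illustration, not a description of any real checker: the pronoun list is a small hypothetical sample, and treating every 'and' as a sign of two instructions will produce false alarms (for example 'nuts and bolts'), which a real tool would need grammatical analysis to avoid.

```python
import re

# Hypothetical, incomplete list of disallowed pronouns.
PRONOUNS = {"it", "they", "them"}


def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    problems = []
    # Whole words only, so 'position' is not mistaken for 'it'.
    words = re.findall(r"[a-z']+", sentence.lower())

    used = sorted(PRONOUNS.intersection(words))
    if used:
        problems.append("Avoid using pronouns: " + ", ".join(used))

    # Crude heuristic: 'and' may be joining two instructions.
    if "and" in words:
        problems.append("Keep to one instruction per sentence: found 'and'")

    return problems


for text in (
    "Center the steering wheel and lock in position.",
    "Lock it in position.",
    "Lock the steering wheel in position.",
):
    print(text, "->", check_sentence(text))
```

Even a checker this simple shows the working pattern: the author receives a named rule and the offending word, rather than a silent rewrite.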

 
Terminology

Consider the following examples of how a term can be written in different ways, all taken from the same document:

front right hand wheel arch
front right-hand wheel arch
front rh wheel arch
front right wheel arch
right hand front wheel arch
right-hand front wheel arch
rh front wheel arch

Apart from confusing a translator, who might think these terms refer to different concepts, this uncontrolled use of terminology can lead to sentences not being matched in translation memory tools. MT usually relies on user-customisable dictionaries to translate terms correctly, so the only way to make sure that all variants of a term are translated is to add each one to the dictionary. But who can say that the next author will not write 'right front wheel arch', yet another variant? And how many separate entries will be needed to code this many variants for each term?
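One way to see the scale of the problem is to try to collapse the variants above into a single form automatically. The Python sketch below does this for the wheel arch example only. The choice of 'front right-hand wheel arch' as the approved term is an assumption made for illustration (the source document does not say which form is preferred), and the small word tables are hypothetical.

```python
# Hypothetical tables for this one example; a real term base would be larger.
SIDES = {
    "rh": "right-hand",
    "right": "right-hand",
    "lh": "left-hand",
    "left": "left-hand",
}
POSITIONS = {"front", "rear"}


def normalise_term(term):
    """Rewrite a term variant as: position, side, then the remaining words."""
    tokens = term.lower().replace("-", " ").split()
    side, position, head = None, None, []
    previous = None
    for token in tokens:
        if token in SIDES:
            side = SIDES[token]
        elif token == "hand" and previous in ("right", "left"):
            pass  # already covered by the side modifier
        elif token in POSITIONS:
            position = token
        else:
            head.append(token)
        previous = token
    parts = [position, side] + head
    return " ".join(part for part in parts if part)


variants = [
    "front right hand wheel arch",
    "front right-hand wheel arch",
    "front rh wheel arch",
    "front right wheel arch",
    "right hand front wheel arch",
    "right-hand front wheel arch",
    "rh front wheel arch",
    "right front wheel arch",
]
print({normalise_term(v) for v in variants})
```

A clean-up script like this can only repair variation after the fact and only for patterns someone has anticipated, which is why the case for controlling terminology at the authoring stage is so strong.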

 
Controlled language

In the following example, the same idea is phrased in four different ways. Ideally an author memory tool would prevent this; failing that, controlled language can be applied to standardise the way an idea is phrased.

Center the steering wheel and lock in position.
Center the steering wheel. Lock in position.
Center the steering wheel. Lock it in position.
Center the steering wheel. Lock the steering wheel in position.

In the second line, the sentence has been split into two so that there is one instruction per sentence. This is particularly important for translation memory tools, as re-use is much more likely with short, simple sentences expressing one idea than with longer, more complex sentences. In a software manual, the instruction 'Click on OK' might appear several times. It is much less likely that 'Enter your license key number and click on OK' will appear more than once in the same manual. In addition, the 'and' which links the two instructions is ambiguous in itself. It could mean 'and then', implying consecutive actions, or it could mean 'while', implying simultaneous actions. In the third line, the object has been made explicit so there is no ellipsis in the sentence. Where the object is not made explicit, the instruction could be incorrectly interpreted by an MT engine, which might see 'lock in' as a phrasal verb or even a noun followed by a preposition. In the fourth line, the object has been changed from a pronoun to the complete noun, eliminating referential ambiguity and thereby reducing the potential for incorrect translation by an MT system.

Obtaining good results from translation memory relies on a restricted use of language. If the same idea is expressed the same way every time because the controlled language rules guide the author to expressing an idea in a certain way, the hit rate from TM should be higher.
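The effect of the one-instruction-per-sentence rule on the hit rate can be illustrated with a toy calculation in Python. The sketch counts how many segments in a document exactly repeat an earlier segment, which is a simplified stand-in for 100% matches in a TM. The sample sentences, apart from the license key instruction quoted above, are invented for illustration, and `reuse_rate` is a hypothetical helper.

```python
def reuse_rate(segments):
    """Fraction of segments that exactly repeat an earlier segment."""
    seen = set()
    hits = 0
    for segment in segments:
        if segment in seen:
            hits += 1
        else:
            seen.add(segment)
    return hits / len(segments) if segments else 0.0


# Two instructions joined in each sentence: nothing repeats.
combined = [
    "Enter your license key number and click on OK.",
    "Select the installation folder and click on OK.",
    "Choose a language and click on OK.",
]

# One instruction per sentence: 'Click on OK.' repeats.
split = [
    "Enter your license key number.",
    "Click on OK.",
    "Select the installation folder.",
    "Click on OK.",
    "Choose a language.",
    "Click on OK.",
]

print(reuse_rate(combined))  # 0.0
print(reuse_rate(split))     # 2 of 6 segments repeat an earlier one
```

The split version contains more segments, but two of its six are exact repeats that need no new translation, whereas the combined version offers nothing for exact re-use.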

 
Summary

Using controlled language creates unambiguous text, which presents MT with far fewer opportunities for mis-translation. Standardised terminology means fewer unrecognised words for an MT system, and more hits from TM. More consistent text makes MT easier to tune, by building up the dictionaries and tuning the grammar rules to suit the relatively small set of constructions used by the authors, and again, means more hits from TM because there are fewer ways to say the same thing. Finally, even before you reach the controlled language stage, using content management and author memory tools to maximise re-use of existing text will help to solve many of the cost, quality and time-to-market problems faced by companies today in the production of multilingual documentation.

(At the time of Dawn's talk, she was a language engineering consultant with Multilingual Technology Ltd (MTL). MTL has since been acquired by Berlitz GlobalNET to become the Berlitz GlobalNET Solutions Group, offering consulting and solutions for web globalisation. See the press release at http://www.berlitzglobalnet.com/english/press/the_press_room.asp . Dawn continues to work as a technical consultant with Berlitz GlobalNET, focusing on translation technology and multilingual content management.)

 

 
