BCS MACHINE
TRANSLATION


The British Computer Society
Natural Language Translation Specialist Group

Web-site: http://www.bcs.org.uk/siggroup/sg37.htm

Machine Translation Review
Issue No. 12, December 2001 - pages 10-20
ISSN 1358-8346

This page URL: http://www.bcs.org.uk/siggroup/nalatran/mtreview/mtr-12/8.htm
Size: 10 A4 pages when printed

 
Towards a Fully-Automatic High-Quality Machine Translation System for Unrestricted Text

by Dr Miroslaw Gajer

Department of Automatic Control
Technical University in Cracow
al. Mickiewicza 30
30-059 Kraków
Poland.
E-mail:
mgajer@ia.agh.edu.pl

 

Introduction

Machine translation is the discipline concerned with programming computers to translate between human languages, for example between Danish and Bulgarian. Surprisingly, the field of machine translation is almost as old as the computer itself (Blekhman and Pevzner 2000). In 1949 Warren Weaver, a mathematician at The Rockefeller Foundation (an American institution supporting scientific research), circulated a memorandum in which he urged that research be started on automating translation between natural languages (Arnold et al 1994). Weaver was inspired by the cryptographic techniques that had advanced rapidly during the Second World War, and he believed that there were fundamental similarities between these techniques and the process of translation between human languages (Waibel et al 2000).

The present author does not know whether Warren Weaver had a good command of any foreign language, but believes that Weaver's general linguistic knowledge was rather limited. Indeed, it soon became clear that machine translation is far more complicated and far harder than Weaver had imagined.

Now, after more than fifty years of scientific research in the field of machine translation, a fully-automatic high-quality machine translation system operating on unrestricted text still remains (with some exceptions) an unattainable goal.

So, why is machine translation so difficult, far more difficult than speech recognition or optical character recognition? Is fully-automatic high-quality machine translation for unrestricted text ever possible?

 

Translation as a Highly Creative Process

To answer the first of these two questions, let us consider the differences that emerge when we compare human languages.

First, when we study the grammatical systems of natural languages that are not closely related, we can easily see that there are many more differences than similarities between them (Zue and Glass 2000). For example, let us compare the personal pronoun systems of Arabic and Hungarian.

Personal pronoun system of Hungarian:

Person    Singular    Plural
1.        én          mi
2.        te          ti
3.        ő           ők

Personal pronoun system of Arabic:

Person     Singular    Dual              Plural
1.         ana         nahnu             nahnu
2. (m.)    anta        antuma (m./f.)    antum
2. (f.)    anti                          antunna
3. (m.)    huwa        huma (m./f.)      hum
3. (f.)    hiya                          hunna

Clearly, the Arabic personal pronoun system is much more complex than the Hungarian one. This is because Hungarian has no grammatical gender, and because grammatical number in Hungarian can only be singular or plural, whereas in Arabic it can be singular, dual, or plural.

Translating Hungarian personal pronouns into their Arabic equivalents is therefore a hard task. For example, to translate the Hungarian pronoun ők (English they) into Arabic, we must also know how many persons the pronoun refers to. If exactly two persons are meant, we use the Arabic dual huma. If there are more than two, we must also know whether they are men or women: for men (or a mixed group) we use hum, and for women hunna.

How do we know how many persons are involved, and whether they are men or women, when the Hungarian word ők says nothing about it? We know it from the context of the utterance. A human translator can in most cases extract such contextual information very easily, but fully automating this process remains pure science fiction.
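The choice described above can be written as a small decision procedure. The sketch below is illustrative only: it assumes that the referents of ők, and whether each of them is female, have already been recovered from context, which is exactly the step identified above as the hard one. The `Referent` type and the function name are hypothetical. Following standard Arabic usage, it returns hum for mixed groups as well as for all-male ones.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Referent:
    """Hypothetical record of one person that Hungarian 'ők' refers to,
    as recovered from context (assumed already done)."""
    name: str
    female: bool


def arabic_pronoun_for_ok(referents: Sequence[Referent]) -> str:
    """Pick the Arabic third-person pronoun that renders Hungarian 'ők' (they).

    'ők' marks only plurality; Arabic also needs the exact number
    (dual vs plural) and, in the plural, the gender of the group.
    """
    if len(referents) < 2:
        raise ValueError("'ők' refers to at least two persons")
    if len(referents) == 2:
        return "huma"   # dual: one form for both genders
    if all(r.female for r in referents):
        return "hunna"  # feminine plural: all referents are women
    return "hum"        # masculine plural: all-male or mixed groups


sisters = [Referent("Anna", True), Referent("Éva", True), Referent("Kata", True)]
print(arabic_pronoun_for_ok(sisters))      # hunna
print(arabic_pronoun_for_ok(sisters[:2]))  # huma
```

All the information the function branches on is absent from the Hungarian word itself; the code merely makes visible how much must be supplied from outside the sentence.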

Considerable differences between human languages also appear in their vocabularies. The vocabulary of each language is an independent and highly complex system. When translating, for example, from Chinese into Croatian, it is hard work to find Croatian equivalents of Chinese words that preserve their original meanings. A human translator must cope with an enormous number of lexical holes, that is, words that have no equivalent in the target language and can therefore be translated only by a (sometimes long) description of their meaning.

This situation is illustrated in Figure 1, where each rectangle stands for a physically existing object or an abstract entity. The rectangles are numbered from 1 to 6. We also have two different natural languages, language A and language B.

In language A, objects 1 and 2 are denoted by a single common lexical entity, whereas language B has two different lexical entities, one for object 1 and one for object 2.

Object 3 has no lexical entity in language A and is therefore a lexical hole, whereas in language B it has its own lexical item.

In language A, objects 4 and 5 are grouped together in one lexical entity and object 6 has a separate one, while language B groups them the other way round: objects 5 and 6 form one lexical entity.

Figure 1. An illustration of the way in which different languages divide reality into lexical items.
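The situation in Figure 1 can be modelled by mapping each object to the word that names it in each language. This is a minimal sketch: the word labels (`a_12`, `b_56` and so on) are invented placeholders, `None` marks a lexical hole, and, as the description of the figure implies, object 4 is assumed to have a word of its own in language B.

```python
# Each language maps objects 1-6 (the rectangles of Figure 1) to a word.
# Word labels are invented placeholders; None marks a lexical hole.
LANGUAGE_A = {1: "a_12", 2: "a_12", 3: None, 4: "a_45", 5: "a_45", 6: "a_6"}
LANGUAGE_B = {1: "b_1", 2: "b_2", 3: "b_3", 4: "b_4", 5: "b_56", 6: "b_56"}


def candidate_translations(source, target):
    """For each source word, collect every target word naming one of its objects."""
    candidates = {}
    for obj, word in source.items():
        if word is None:
            continue  # a lexical hole has no source word to translate
        targets = candidates.setdefault(word, set())
        if target[obj] is not None:
            targets.add(target[obj])
    return candidates


def lexical_holes(language):
    """Objects for which the language has no word of their own."""
    return sorted(obj for obj, word in language.items() if word is None)


print(candidate_translations(LANGUAGE_A, LANGUAGE_B)["a_12"])  # {'b_1', 'b_2'} (set order may vary)
print(lexical_holes(LANGUAGE_A))                                # [3]
```

Translating from A to B, the word for objects 1 and 2 has two candidates and the computer must decide which object is meant; translating from B to A, `b_3` has no candidate at all and can only be rendered by a description.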

A very good example of these rather abstract semantic divisions comes from Swedish. To translate the English word grandfather into Swedish, we must also know whether the grandfather is the father's father or the mother's father. In the first case the Swedish word is farfar, in the second morfar, as illustrated in Figure 2.

Figure 2. The English word grandfather versus Swedish farfar and morfar.
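Swedish forms these kinship terms by chaining far (father) and mor (mother), so the same pattern also yields farmor and mormor for the grandmothers. The sketch below, with hypothetical function names, shows why English grandfather cannot be translated without one extra piece of information: the parent through whom the relation runs.

```python
# Swedish names a grandparent by chaining two parent words: first the
# linking parent, then that parent's own parent.
PARENT = {"father": "far", "mother": "mor"}


def swedish_grandparent(linking_parent: str, grandparent: str) -> str:
    """E.g. ('mother', 'father') -> 'morfar', the mother's father."""
    return PARENT[linking_parent] + PARENT[grandparent]


def translate_grandfather(linking_parent: str) -> str:
    """English 'grandfather' is underspecified: the linking parent must come from context."""
    return swedish_grandparent(linking_parent, "father")


for side in ("father", "mother"):
    print(side, "->", translate_grandfather(side))
# father -> farfar
# mother -> morfar
```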

The most serious problem the computer has to cope with in machine translation is the ambiguity of human language (Baker et al, undated). Syntactic ambiguity arises when a sentence admits at least two alternative syntactic analyses, and semantic ambiguity when a sentence can be understood in at least two different ways; the most common kind, however, is lexical ambiguity. Lexical ambiguity is such a serious problem for machine translation systems because it exists in every natural language and is truly ubiquitous. If we open any bilingual dictionary, for example The Great English-Polish Dictionary, it is very hard to find a word with exactly one meaning; in fact, most English words have at least two completely different Polish equivalents. The question is which of them the computer should choose while translating, and where it can obtain the information needed to establish which one is correct.
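One well-known heuristic for finding that information in the context is a simplified form of the Lesk overlap method from word-sense disambiguation: each candidate equivalent is associated with cue words, and the candidate whose cues overlap the surrounding words most is chosen. The sketch below is not the author's method and does not solve the problem; the cue sets for the English word butt (whose Polish equivalents appear in the list below) are invented, and real cue lists would have to be built by hand on a very large scale.

```python
import string
from typing import Optional

# Invented cue words for four of the Polish equivalents of English "butt".
CUES = {
    "kolba karabinu": {"rifle", "gun", "shoulder", "musket"},
    "pośmiewisko": {"joke", "jokes", "laughing", "ridicule"},
    "beczka": {"wine", "ale", "cask", "gallons"},
    "uderzenie głową": {"head", "goat", "ram", "charged"},
}


def choose_equivalent(context: str) -> Optional[str]:
    """Return the equivalent whose cue words overlap the context most.

    Returns None when no cue word occurs or when two equivalents tie,
    i.e. when the context alone does not decide.
    """
    tokens = {w.strip(string.punctuation).lower() for w in context.split()}
    scores = {eq: len(cues & tokens) for eq, cues in CUES.items()}
    best = max(scores.values())
    winners = [eq for eq, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(choose_equivalent("He was the butt of all their jokes."))         # pośmiewisko
print(choose_equivalent("She rested the rifle butt on her shoulder."))  # kolba karabinu
print(choose_equivalent("Pass me the butt."))                           # None
```

The last call shows the typical failure: when the sentence contains no cue, the heuristic has nothing to go on.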

Suppose that a sentence consists of ten different words, each of which has exactly two different meanings. If the computer chooses the equivalents of these words at random, the sentence can be translated in 2^10 = 1024 different ways; if only one of these is correct, a random choice succeeds with probability 1/1024. The probability of obtaining in this way a correct translation of a whole document made up of many such sentences is, in practice, zero. Moreover, no efficient algorithm for solving this problem is known, and lexical ambiguity is abundant in every human language. Below are some examples of the Polish equivalents of lexically ambiguous words taken from several languages of the world.

Polish equivalents of the French word perle are: 1. perła (pearl), 2. paciorek (bead), 3. kapsułka (capsule)

Polish equivalents of the Spanish word fondo are: 1. dno (bottom), 2. głębia (depth), 3. tło (background)

Polish equivalents of the Italian word stufa are: 1. piec (stove), 2. cieplarnia (hothouse)

Polish equivalents of the German word Absatz are: 1. ustęp (paragraph), 2. obcas (heel), 3. osad (sediment), 4. złoże (deposit), 5. osadzenie (deposition), 6. zbyt (sales)

Polish equivalents of the English word butt are: 1. beczka (cask), 2. pień (tree trunk), 3. pniak (stump), 4. grubszy koniec (thick end), 5. kolba karabinu (rifle butt), 6. płastuga (flatfish), 7. nasyp za strzelnicą (earth bank behind a firing range), 8. pośmiewisko (laughing stock), 9. uderzenie głową (blow with the head)

Polish equivalents of the Dutch word boodschap are: 1. poselstwo (mission), 2. polecenie (instruction), 3. wiadomość (message), 4. zakupy (shopping)

Polish equivalents of the Swedish word tomten are: 1. parcela (plot of land), 2. plac (building site), 3. krasnoludek (gnome)

Polish equivalents of the Norwegian word hytte are: 1. chata (cottage), 2. szałas (shelter), 3. buda (shed), 4. huta (ironworks), 5. kabina (cabin)

Polish equivalents of the Danish word løber are: 1. biegacz (runner), 2. dywanik (rug)

Polish equivalents of the Finnish word kanta are: 1. podstawa (basis), 2. obcas (heel), 3. stanowisko (standpoint), 4. baza (base)

Polish equivalents of the Greek word σκοπός (skopos) are: 1. zamiar (aim), 2. melodia (tune), 3. wartownik (sentry)

Polish equivalents of the Arabic word wusal are: 1. połączenie (connection), 2. łącze (link), 3. kontakt (contact), 4. związek (relationship), 5. zawias (hinge), 6. dodatek (addition)
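The arithmetic of the ten-word example earlier in this section generalizes directly: if every word's equivalent is chosen independently, the number of possible translations is the product of the numbers of equivalents. The sketch below makes this explicit. Its two assumptions, exactly one correct combination per sentence and independent sentences, are simplifications introduced for the illustration.

```python
from fractions import Fraction
from math import prod


def translation_count(senses_per_word):
    """Number of distinct word-by-word translations when each word's sense is picked independently."""
    return prod(senses_per_word)


def p_random_correct(senses_per_word, sentences=1):
    """Chance that picking senses uniformly at random yields the correct translation.

    Assumes exactly one correct combination per sentence and independent sentences.
    """
    return Fraction(1, translation_count(senses_per_word)) ** sentences


ten_words = [2] * 10
print(translation_count(ten_words))                     # 1024
print(p_random_correct(ten_words))                      # 1/1024
print(float(p_random_correct(ten_words, sentences=5)))  # about 8.9e-16
```

If each of the ten words had as many equivalents as the German Absatz (six), `translation_count([6] * 10)` would already give 60,466,176 candidate translations for a single sentence.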

In view of all these factors, translation between natural languages can be seen as a highly creative process. A human translator must be inventive and must know how to deal with situations never encountered before. The question, then, is whether a human being can be replaced by a computer.

The prominent physicist Roger Penrose, in his well-known books The Emperor's New Mind and Shadows of the Mind, argued forcefully that the brain operates in a non-algorithmic manner and that, consequently, the human mind cannot be fully simulated by a computer.

If a human being cannot be replaced by a computer, does it follow that fully-automatic high-quality machine translation for unrestricted text is impossible?

Alan Melby (1999) states that machine translation is headed in the right direction as far as domain-specific approaches using controlled languages are concerned, but that further work on fully-automatic high-quality machine translation of unrestricted text is a waste of time and money. Building such systems would require a real breakthrough in natural language processing (and perhaps in the whole field of information processing). Moreover, such a breakthrough will not come from extending currently known techniques: the electric light bulb was not invented by doing research on the candle.

 

Example-Based Machine Translation

The arguments put forward by Roger Penrose are very strong and can no longer be ignored. Alan Melby is therefore probably right that a human translator cannot be replaced by relying only on currently known techniques. With these techniques, however, we can still try to come as close as possible to the unattainable goal of fully-automatic high-quality machine translation for unrestricted text. Suppose that this research produced a machine translation system achieving 99% accuracy on unrestricted text, so that only 1% of the text had to be edited by a human translator. Could we really say, as Alan Melby does, that the time and money spent on this research had been wasted?

Many quite different approaches to machine translation have been developed so far, such as syntactic transfer, interlingua-based machine translation, knowledge-based machine translation, and systems based on statistics or neural networks (Ney et al 2000, Canals et al 2000, Loukachevitch and Dobrov 2000). Among these, example-based machine translation is emerging as a serious alternative paradigm, although in most cases it is still an unproven technique in an early research phase (Carbonell et al, undated).

This is not entirely true, however. One prominent example comes from Spain. The case of the newspaper Periódico de Catalunya is interesting because it involves what is probably the first fully operational machine translation system for unrestricted text ever built, producing nearly 100% satisfactory results when translating from Spanish into Catalan. Remarkably, this system is not based on any of the currently known theories of computational linguistics. It does not analyze the sentence at all: it simply replaces source words (or groups of words) with their target equivalents, much as a spelling checker replaces misspelled words. A huge dictionary effectively takes the place of any linguistic analysis of the source text. Developing the system requires a great deal of work: a large team of trained linguists constantly updates the dictionary with new terms, verbs in their different forms, and word sequences of up to six elements. To date it is probably the only practical implementation of a deliberately unsophisticated machine translation system based purely on pattern matching (Rico, undated).
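The source describes this system only as replacing words and word sequences of up to six elements from a large dictionary, with no analysis of the sentence. A minimal sketch of that idea follows. It assumes greedy left-to-right matching that always tries the longest dictionary entry first (the source does not say how overlapping entries are resolved), and its five-entry dictionary is invented for illustration; it is not the actual system's code or data.

```python
# Toy Spanish -> Catalan dictionary (invented for illustration). Keys are
# tuples of lower-case Spanish words; values are the Catalan replacements.
DICTIONARY = {
    ("hoy",): "avui",
    ("hace",): "fa",
    ("mucho",): "molt",
    ("frío",): "fred",
    ("hace", "mucho", "frío"): "fa molt de fred",
}
MAX_PATTERN = 6  # the source mentions word sequences of up to six elements


def translate(sentence, dictionary=DICTIONARY, max_len=MAX_PATTERN):
    """Replace words left to right, always trying the longest dictionary entry first."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in dictionary:
                output.append(dictionary[key])
                i += n
                break
        else:
            output.append(words[i])  # no entry: copy the word unchanged
            i += 1
    return " ".join(output)


print(translate("Hoy hace mucho frío"))             # avui fa molt de fred
print(translate("Hoy hace mucho frío", max_len=1))  # avui fa molt fred
```

With single-word lookup only (`max_len=1`), the output misses the de that Catalan requires; storing whole sequences in the dictionary is what lets such a system handle these differences without any grammar.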

Can this Spanish-Catalan system, then, show the way to solving the puzzle of building a fully-automatic high-quality machine translation system? The answer is less obvious than one might think. We cannot ignore the fact that the system benefits greatly from the similarity of the two languages involved. The differences between Spanish and Catalan are rather minor: in most cases they are phonological, and only rarely morphological or grammatical.

The experience gained in developing the Spanish-Catalan system can be applied to any system that translates between closely related languages. Similarly effective machine translation systems for unrestricted text could probably be built for language pairs such as Swedish and Norwegian, Norwegian and Danish, Swedish and Danish, Spanish and Portuguese, German and Dutch, or Finnish and Estonian. It is doubtful, however, whether this approach could yield a high-quality system for unrestricted text that translates between two typologically different and unrelated languages such as Chinese and French.

To show how difficult translation between unrelated languages is, consider the following experiment (Majewicz 1989). A sample text in Polish was taken and its elements (words or phrases) were numbered in sequence, as follows.

 

The original Polish text:

Analiza tych dwóch elementów zwyczaju międzynarodowego posiada wielkie znaczenie z uwagi na wyrok Międzynarodowego Trybunału Sprawiedliwości z 29 listopada 1950 r. (w sporze między Boliwią a Peru o prawo azylu), który stwierdza, że państwo, które powołuje się na zwyczaj międzynarodowy według art. 38 b musi przeprowadzić dowód, iż powstał on w sposób wiążący drugie państwo.

 

The order of words in the Polish text:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47

The text was then translated into English, giving the following order of the English equivalents of the elements of the Polish source text.

 

The English translation of the text:

The analysis of these two elements of the international custom is of great importance in view of the judgment of the International Court of Justice of the 29th of November 1950 (concerning the dispute between Bolivia and Peru about the right of asylum) which states that a state that is referring to the international custom quoting Article 38b must present the evidence that the custom emerged in a way binding the other state.

 

The order of words in the English translation:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47

In the English translation the word order is almost the same as in the Polish original: only two elements (32 and 33) are swapped. This suggests that example-based machine translation may be applicable to English and Polish. The same text was also translated into Japanese, giving the following word order.

 

The Japanese translation of the text:

Kokusai kanshu-no kono futatsu-no yoso-no bunseki-wa daisanjuhachi-jo-bi-ni shitagatte kokusai kanshu-o in'yo suru kokka-wa kokusai kan-shu-ga ta-no kokka-o kosoku suru hoho-de sonzai suru to iu shoko-o teishutsu shinakereba naranai to iu koto-o kakunin suru (higoken-ni kansuru boribia peru kan-no funso-ni tsuite-no) senkyuhyakugojunen juichigatsu nijukunichi-no kokusai shiho saibansho-no hanketsu-kara mite hijo-na juyosei-motte iru.

 

The order of words in the Japanese translation:

6, 5, 2, 3, 4, 1, 36, 35, 37, 34, 33, 32, 30, 28, 43, 46, 47, 45, 44, 42, 40, 39, 38, 26, 24, 23, 22, 19, 21, 21, 18, 17, 16, 15, 12, 14, 13, 11, 10, 8, 9, 7
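The contrast between the two orderings can be put into a single number by counting discordant pairs, that is, pairs of elements that appear in the opposite order to the Polish source (the Kendall tau distance), and dividing by the number of all pairs. This measure is not used in the original experiment; it is offered here as one plausible way to quantify it. The Japanese list is reproduced as printed, including its repeated 21 and missing elements; pairs of equal numbers are not counted as discordant.

```python
from itertools import combinations

# Element numbers in the order they occur in each translation, copied from the
# lists above (the Japanese list is reproduced as printed).
ENGLISH_ORDER = list(range(1, 32)) + [33, 32] + list(range(34, 44)) + [45, 46, 47]
JAPANESE_ORDER = [6, 5, 2, 3, 4, 1, 36, 35, 37, 34, 33, 32, 30, 28, 43, 46, 47, 45,
                  44, 42, 40, 39, 38, 26, 24, 23, 22, 19, 21, 21, 18, 17, 16, 15,
                  12, 14, 13, 11, 10, 8, 9, 7]


def discordant_pairs(order):
    """Pairs of elements whose relative order is the reverse of the Polish source."""
    return sum(1 for earlier, later in combinations(order, 2) if earlier > later)


def disorder(order):
    """Discordant pairs as a share of all pairs: 0.0 means the source order is kept."""
    n = len(order)
    return discordant_pairs(order) / (n * (n - 1) / 2)


print(discordant_pairs(ENGLISH_ORDER), round(disorder(ENGLISH_ORDER), 3))    # 1 0.001
print(discordant_pairs(JAPANESE_ORDER), round(disorder(JAPANESE_ORDER), 3))  # 557 0.647
```

A completely reversed order scores 1.0; the Japanese translation reverses roughly two thirds of all element pairs, while the English one reverses a single pair.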

The word order in the Japanese translation, by contrast, is completely different from that of the Polish source text, which gives little encouragement for using an example-based machine translation technique between such unrelated and typologically different languages. To establish whether example-based machine translation is feasible between different Indo-European languages, the present author conducted some experiments: samples of texts in several European languages were translated manually into Polish using an example-based technique. The translation examples were chosen to be as short as possible; any shorter segmentation would have made the Polish translation incorrect.

In the samples below, each source sentence is followed by its Polish translation, and the figures in parentheses give the length, in words, of each successive translation example in the source sentence. The average length of the translation examples (in words) was also calculated for each sample.

  1. Swedish into Polish example-based translation sample:

    Jag är på semester på Gotland. (2, 2, 2)
    Ja jestem na wakacjach na Gotlandii.

    I dag har jag varit och tittat på den här stenen. (2, 6, 3)
    Dzisiaj byłem popatrzeć na ten oto kamień.

    Den är mycket fin med många bilder men det är inga runor på den. (4, 3, 1, 4, 2)
    On jest bardzo fajny z wieloma obrazkami ale nie ma żadnych napisów runicznych na nim.

    Här har varit fint väder hela tiden. (1, 4, 2)
    Tutaj była ładna pogoda przez cały tydzień.

    Jag har solat och badat varje dag. (5, 2)
    Opalałem i kąpałem się każdego dnia.

    The average length of translation examples is 2.81 words.

  2. Norwegian into Polish example-based translation sample:

    Om kvelden liker vi best å være hjemme og ta det med ro. (2, 3, 3, 1, 4)
    Wieczorami wolimy być w domu i mieć spokój.

    Vi snakker sammen vi hører på radio eller vi ser på TV. (2, 1, 4, 1, 4)
    Rozmawiamy razem słuchamy radia albo oglądamy telewizję.

    Kanskje drikker far og mor kaffe ved 6-tiden. (1, 5, 2)
    Czasami ojciec i matka piją kawę około godziny szóstej.

    Vi får melk eller saft. (2, 1, 1, 1)
    Dostajemy mleko albo sok.

    The average length of translation examples is 2.24 words.

  3. Danish into Polish example-based translation sample:

    Slottet stammer fra det syttende århundrede. (2, 4)
    Zamek ten pochodzi z siedemnastego wieku.

    En berømt forfatter boede og skabte her. (6, 1)
    Znany pisarz mieszkał i tworzył tutaj.

    Det er udgravninger fra vikingtiden. (3, 2)
    To są wykopaliska z czasów Wikingów.

    The average length of translation examples is 3.00 words.

  4. Dutch into Polish example-based translation sample:

    Voor bewoners van Friesland is de moedertaal niet Nederlands maar Fries. (4, 5, 2)
    Dla mieszkańców Fryzji nie jest językiem ojczystym niderlandzki ale fryzyjski.

    Dit is geen dialect het is een cultuurtaal. (4, 4)
    To nie jest dialekt to jest język kulturalny.

    Men kan aan de universiteiten Friese taal- en letterkunde studeren en men kan daarin doctoraalexamen doen. (2, 3, 5, 1, 2, 3)
    Można na uniwersytetach studiować fryzyjską filologię i można zdawać z tej dziedziny egzamin magisterski.

    Op de scholen in Friesland bij rechtszittingen in de kerk op vergaderingen van gemeenteraden en van de Provinciale Staten is heel vaak Fries de voertaal. (5, 2, 3, 9, 6)
    We fryzyjskich szkołach przy posiedzeniach sądowych w kościele na zgromadzeniach rad okręgowych i rad prowincji bardzo często fryzyjski jest językiem obrad.

    Aangezien alle Friezen ook Nederlands kennen is Friesland een tweetalige provincie. (1, 5, 5)
    Ponieważ jednak wszyscy Fryzowie znają również język niderlandzki Fryzja jest dwujęzyczną prowincją.

    The average length of translation examples is 3.74 words.

  5. Spanish into Polish example-based translation sample:

    Es un asunto un poco delicado pero te lo voy a contar. (3, 3, 1, 5)
    Jest to zagadnienie trochę delikatne ale opowiem ci je.

    Hace una semana llegaron los primos de Luis para pasar aquí unos días. (3, 5, 5)
    Tydzień temu przyjechali kuzyni Luisa aby spędzić tu kilka dni.

    Tú sabes que el estudio de Luis es muy pequeño y por eso les dije que se quedaran en mi casa. (3, 4, 3, 3, 2, 6)
    Wiesz że gabinet Luisa jest bardzo mały i dlatego powiedziałem im aby zostali u mnie w domu.

    La verdad es que éramos buenos amigos. (4, 3)
    Prawda jest taka że byliśmy dobrymi przyjaciółmi.

    The average length of translation examples is 3.53 words.

  6. Italian into Polish example-based translation sample:

    È un fatto abbastanza strano per noi polacchi. (3, 2, 3)
    Jest to fakt dosyć dziwny dla nas Polaków.

    Da noi le differenze di cui parlo sono meno grandi. (2, 5, 3)
    U nas różnice o których mówię są mniejsze.

    In Italia invece mentre viaggiavo dal Nord verso il Sud mi sembrava di attraversare non uno ma più Paesi. (3, 2, 5, 8)
    Natomiast we Włoszech podczas gdy podróżowałem z północy na południe wydawało mi się że przemierzam nie jeden ale więcej krajów.

    Cambiava tutto il paesaggio il clima l'architettura gli abitanti il loro modo di vivere e persino la lingua. (2, 2, 2, 1, 2, 5, 2, 2)
    Zmieniało się wszystko pejzaż klimat architektura mieszkańcy ich sposób życia a nawet język.

    The average length of translation examples is 3.00 words.

  7. French into Polish example-based translation samples:

    Parfois le soir après une journée bien remplie j'ai l'impression de n'avoir rien fait. (3, 5, 6)
    Czasami wieczorem po całym dniu mocno wypełnionym pracą mam wrażenie że nic nie zrobiłam.

    Et je suis fatiguée. (1, 3)
    I jestem zmęczona.

    Il y a une très grande distance entre le travail que j'ai fait et le résultat. (3, 4, 9)
    Jest bardzo duża różnica pomiędzy pracą którą wykonuję a jej rezultatami.

    Par exemple j'ai préparé à manger j'ai fait la vaisselle j'ai tout rangé je me suis occupée de la petite et il me dit tu n'as pas repassé mes chemises. (2, 4, 3, 3, 7, 4, 6)
    Na przykład przygotowałam posiłek zmyłam naczynia wszystko poukładałam zajęłam się dzieckiem a on mi mówi nie wyprasowałaś mi koszul.

    The average length of translation examples is 4.20 words.

  8. German into Polish example-based translation samples:

    Mit über 900.000 Einwohnern ist Köln heute die drittgrößte Stadt in der Bundesrepublik Deutschland nach Hamburg und München. (4, 6, 4, 4)
    Z ponad 900 tysiącami mieszkańców jest obecnie Kolonia trzecim co do wielkości miastem w Republice Federalnej Niemiec po Hamburgu i Monachium.

    Die Stadt liegt 45 km von der Bundeshauptstadt Bonn entfernt. (2, 1, 7)
    Miasto leży w odległości 45 km od stolicy związku Bonn.

    Ihre Geschichte reicht bis in die Römerzeit zurück. (2, 1, 5)
    Jego historia sięga aż do czasów rzymskich.

    Früher war der Name der Stadt Colonia. (1, 5, 1)
    Wcześniej nazwa miasta brzmiała Colonia.

    The average length of translation examples is 3.31 words.

  9. Slovak into Polish example-based translation samples:

    Jeden z najstarších známych písomných dokladov o škole na Slovensku pochádza z Nitry. (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
    Jeden z najstarszych znanych pisemnych dokumentów o szkole na Słowacji pochodzi z Nitry.

    Nitra mala pri biskupskej katedrále kapitulnú školu. (1, 1, 1, 1, 1, 2)
    Nitra miała przy biskupiej katedrze szkołę zakonną.

    Neskôr na tejto škole učili aj vzdelaní laickí učitelia. (1, 3, 1, 1, 1, 2)
    Później w szkole tej uczyli również wykształceni nauczyciele świeccy.

    The average length of translation examples is 1.16 words.

  10. Czech into Polish example-based translation samples:

    Česká republika patří mezi nejkrásnější země na světě. (2, 1, 3, 1, 1)
    Czeska Republika należy do najpiękniejszych krajów na świecie.

    Značnou část jejího území tvoří lesy. (1, 1, 1, 1, 2)
    Znaczną część jej terytorium tworzą lasy.

    Jsou zde také rozsáhlé nížiny s loukami pastvinami a poli. (1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
    Są tam także obszerne niziny z łąkami pastwiskami i polami.

    The average length of translation examples is 1.20 words.

  11. Serbo-Croatian into Polish example-based translation samples:

    U drugoj sobi stoji krevet moga sina stolic sa dve male stolice ormar za odelo i igračke. (3, 1, 1, 2, 1, 4, 1, 2, 1, 1)
    W drugim pokoju stoi łóżko mojego syna stół z dwoma małymi krzesłami szafa na ubrania i zabawki.

    Tu se nalazi naš radio aparat. (1, 2, 1, 2)
    Tu znajduje się nasz radioodbiornik.

    Na zidu vise slike a na prozoru stoje saksije sa cvećem. (2, 2, 1, 2, 2, 2)
    Na ścianie wiszą obrazy a na oknie stoją doniczki z kwiatami.

    The average length of translation examples is 1.70 words.
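The averages reported for each sample are means over all translation examples in the sample, pooled across sentences, not means of per-sentence averages; for Swedish, for instance, 45 words spread over 16 examples give 2.81. The sketch below reproduces the figures for three of the samples from the segment lengths printed in parentheses; the dictionary layout is an assumption made for the illustration.

```python
from statistics import mean

# Translation-example lengths (in words) from three of the samples above,
# one tuple per source sentence.
SAMPLES = {
    "Swedish": [(2, 2, 2), (2, 6, 3), (4, 3, 1, 4, 2), (1, 4, 2), (5, 2)],
    "Dutch": [(4, 5, 2), (4, 4), (2, 3, 5, 1, 2, 3), (5, 2, 3, 9, 6), (1, 5, 5)],
    "Czech": [(2, 1, 3, 1, 1), (1, 1, 1, 1, 2), (1,) * 10],
}


def average_example_length(sentences):
    """Mean over every translation example in the sample, pooled across sentences."""
    return mean(n for sentence in sentences for n in sentence)


for language, sentences in SAMPLES.items():
    print(f"{language}: {average_example_length(sentences):.2f}")
# Swedish: 2.81
# Dutch: 3.74
# Czech: 1.20
```

The same function applied to the remaining samples gives a quick consistency check between the printed segment lengths and the stated averages.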

From the above translation examples we can see that example-based machine translation between Polish and other Indo-European languages is possible. We can also observe that the more closely related the languages, the shorter the translation examples are.

 

Conclusions

Based on the example of the Spanish-Catalan machine translation system, which can translate the unrestricted text of Periódico de Catalunya, and on the results of the translation experiments conducted by the present author, we conclude that example-based machine translation is headed in the right direction. Fully-automatic high-quality machine translation for unrestricted text is indeed possible, and developing such systems can by no means be called a waste of time and money, because the positive results of these experiments are clear (Fukutomi 2000, Murphy 2000, Nyberg et al, Mitamura, Mitamura and Nyberg, undated). There is, however, one condition that must be fulfilled: the languages between which we want to translate must be related, even if the relationship is a distant one, as between Dutch and Polish. It is very doubtful whether the example-based technique can be applied to languages that are typologically different and wholly unrelated. For such languages, building a fully-automatic high-quality machine translation system for unrestricted text is still very far from a final solution, and further work on such systems using currently known techniques may well be a waste of time and money.

 

References

Arnold, D., Balkan, L., Meijer, S., Humphreys, R.L. and Sadler, L. (1994) Machine Translation: An Introductory Guide, NCC Blackwell, London

Baker, K.L., Franz, A.M., Jordan, P.W., Mitamura, T., Nyberg, E. (undated) Coping with Ambiguity in a Large-Scale Machine Translation System, Center for Machine Translation, Carnegie Mellon University, Pittsburgh, USA

Blekhman, M. and Pevzner, B. (2000) 'First Steps of Language Engineering in the USSR: The 50s through 70s', in Machine Translation Review, No. 11, December 2000, pp. 5-7

Canals, R., Esteve, A., Garrido, A., et al (2000) 'InterNOSTRUM: A Spanish-Catalan Machine Translation System', in Machine Translation Review, No. 11, December 2000, pp. 21-25

Carbonell, J., Mitamura, T., and Nyberg, E. (undated) The KANT Perspective: A Critique of Pure Transfer (and Pure Interlingua, Pure Statistics, ...), Center for Machine Translation, Carnegie Mellon University, Pittsburgh, USA

Fukutomi, O. (2000) 'Report on Commercial Machine Translation in a Manufacturing Domain', in Machine Translation Review, No. 11, December 2000, pp. 16-25

Loukachevitch, N.V. and Dobrov, B.V. (2000) 'Thesaurus-Based Structural Thematic Summary in Multilingual Information Systems', in Machine Translation Review, No. 11, December 2000, pp. 10-20

Majewicz, A. F. (1989) The languages of the world and their classifying, Warsaw, Poland

Melby, A. (1999) 'Machine Translation and Philosophy of Language', in Machine Translation Review, No. 9, April 1999, pp. 6-17

Mitamura, T. (undated) Controlled Languages for Multilingual Machine Translation, Center for Machine Translation, Carnegie Mellon University, Pittsburgh, USA

Mitamura, T., Nyberg, E. and Carbonell, J. (undated) An Efficient Interlingua Translation System for Multi-Lingual Document Production, Center for Machine Translation, Carnegie Mellon University, Pittsburgh, USA

Murphy, D. (2000) 'Keeping Translation Technology under Control', in Machine Translation Review, No. 11, December 2000, pp. 7-10

Ney, H., Nießen, S., Och, F.J., Sawaf, H., Tillmann, C. and Vogel, S. (2000) 'Algorithms for Statistical Translation of Spoken Language', in IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, January 2000, pp. 24-36

Nyberg, E., Mitamura, T. and Carbonell, J. (undated) The KANT Machine Translation System: From R&D to Initial Deployment, Center for Machine Translation, Carnegie Mellon University, Pittsburgh, USA

Rico, C. (undated) From Novelty to Ubiquity: Computers and Translation at the Close of Industrial Age, http://www.accurapid.com/journal/15mt2.htm

Waibel, A., Geutner, P., Tomokiyo, et al (2000) 'Multilinguality in Speech and Spoken Language Systems', in Proceedings of the IEEE, Vol. 88, No. 8, August 2000, pp. 1297-1313

Zue, V.W. and Glass, J.R. (2000) 'Conversational Interfaces: Advances and Challenges', in Proceedings of the IEEE, Vol. 88, No. 8, August 2000, pp. 1166-1180

 

 
