The British Computer Society
Natural Language Translation Specialist Group

http://www.bcs.org.uk/siggroup/sg37.htm

Machine Translation Review
PAGES 17-28

No. 14, December 2003   ISSN: 1358-8346
http://www.bcs.org.uk/siggroup/nalatran/mtreview/mtr-14/6.htm


 
Blitz Latin - A Machine Translator for Latin to English

by William A. Whitaker (McLean, VA, U.S.A.) and John F. White (Wokingham, U.K.)
 

Abstract
 
We have created a machine translator, Blitz Latin, for the direct translation of Latin texts into English, combining a 32,000+ word electronic dictionary with a novel programming approach derived from computer chess algorithms. Translation proceeds in stages, sentence by sentence. The result is an extremely fast translator that can therefore accommodate a very large number of translation heuristics. The product has been available commercially since July 2001. We outline the programming principles and our retail experience. In particular, we describe the difficulties of translation caused by the high ambiguity of Latin relative to the modern languages derived from it.
 
 
Introduction
 
The difficulties of machine translation between natural languages have been very well documented, not least in the articles on the BCS Natural Languages web-site (www.bcs.org.uk/siggroup/nalatran/). A principal and common problem is the translating program’s inability to draw on the general knowledge that speakers of a language use to distinguish between alternative meanings.
 
The question is also asked: who will employ such machine translators, when their standard of translation is inadequate for a professional translator of the language, and often too garbled for a casual reader? However, such translators have in practice been found to be of value to the former as an aid to translation, and to the latter as a means to grasp the gist of an otherwise unreadable text. One of the problems which plague those who translate Latin is the number of non-Latin speakers who request a ‘quick translation’ of a short motto or inscription - but never for payment. Cumulatively, these people become a serious burden for professional translators, whose web-sites generally try to discourage such approaches.
 
There is also a third group of potential beneficiaries. During 2000-2001 one of us (JFW) was reading background Latin material for a book about a little-known Roman emperor. While English translations were available for many of the original sources, for some only the Latin could be obtained. JFW had learned (and received a qualification in) Latin about 32 years before this research, but through lack of use had forgotten most of the vocabulary. Unexpectedly, though, he could remember the grammar. Translating the Latin texts therefore required continual look-ups in a paper Latin dictionary: always tiresome, and almost impossible if the stem happened to be irregular.
 
Machine translation appeared to offer a better option, but it proved difficult to find a satisfactory commercial product. One amateur offering was no longer supported; another was far too slow (20 minutes at 166 MHz to translate one long paragraph), and its output was both bug-ridden and inadequate. The only professional offering was part of a "Universal Translator", whose Latin dictionary was about 5% of the size (in Kbytes) of its French and German dictionaries; the product could not even find all the words in ‘rex amat reginam’ (the king loves the queen). As such it was useless, and the manufacturer withdrew the claim to translate Latin after the inevitable complaint. We may mention in passing the academic research program ‘Brutus’, developed by Nottingham University, UK (1), which apparently also has a deficient vocabulary.
 
However, an American ex-military scientist and computer programmer (WAW) had produced, as a hobby over seven years, a very well-organised Latin to English electronic dictionary with separate tables of stems and inflections, from which Latin words can be constructed rapidly. The dictionary at that time contained 28,000 unique Latin words, and so already exceeded in size all but the most monumental paper Latin dictionaries. The electronic dictionary also provided extra information about each Latin stem: the Area in which it is used (general, ecclesiastical, legal, military, biological, agricultural, dramatic or poetical, scientific or technological); the Age in which it was predominantly used (general, antiquity, classical era, post-classical, medieval, post-medieval, modern); and the Frequency with which the stem is cited in conventional dictionaries (from very common to extremely rare).
 
Example of ars, artis, fem, from WAW’s electronic dictionary:
 
ars art N 3 3 F T X X X A O skill/craft/art; trick, wile; science, knowledge; method, way; character;
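The example entry packs stems, grammatical codes and usage flags into a single line. The sketch below splits such a line into named fields. The layout it assumes (nominative and oblique stems, part of speech, declension and variant, gender, noun kind, then Age, Area, Geography, Frequency and Source codes, then the meanings) is our reading of the entry shown, modelled on the dictionary format of WAW's WORDS program; it may not match every entry type, and `NounEntry` and `parse_noun_line` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class NounEntry:
    stems: tuple[str, str]        # nominative stem, oblique stem
    declension: tuple[int, int]   # declension and variant
    gender: str
    kind: str                     # noun kind code
    age: str                      # Age code ('X' assumed to mean general)
    area: str                     # Area code
    geo: str                      # Geography code
    freq: str                     # Frequency code ('A' assumed most common)
    source: str                   # Source code
    meaning: str                  # free-text English meanings

def parse_noun_line(line: str) -> NounEntry:
    """Split a noun entry in the assumed 12-code layout; the rest is the meaning."""
    (stem1, stem2, pos, decl, var, gender, kind,
     age, area, geo, freq, source, meaning) = line.split(maxsplit=12)
    if pos != "N":
        raise ValueError(f"not a noun entry: {pos!r}")
    return NounEntry((stem1, stem2), (int(decl), int(var)), gender, kind,
                     age, area, geo, freq, source, meaning.strip())

entry = parse_noun_line("ars art N 3 3 F T X X X A O "
                        "skill/craft/art; trick, wile; science, knowledge; method, way; character;")
print(entry.stems, entry.freq, entry.meaning.split(";")[0])
```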
 
We joined forces in early 2001 with a view to creating a superior Latin translator. JFW previously had acquired skills in programming Artificial Intelligence (AI), honed largely on computer chess and neural networks. A characteristic of computer chess is that speed of attaining the correct move from a position dominates virtually all other considerations, so that the programmer acquires the mental set of writing programs that execute extremely fast. There would be no waiting 20 minutes to translate single paragraphs with even the slowest of computers.
 
 
The Latin Language
 
The ancient Latin language has two principal features that distinguish it from most modern European languages:
 
1. It is an inflected language, so that the ending of each word provides a crucial part of the meaning. Thus the word ‘dominus’ means ‘the lord’ (singular, nominative case), whereas ‘domini’ can mean ‘of the lord’ (singular, genitive) or ‘the lords’ (plural, nominative; also ‘O lords’, vocative).
 
2. It is a massively over-loaded language, where simple words can have several unrelated meanings. A good example is ‘plaga’, which can mean ‘snare’, ‘blow’ or ‘tract of land’ (partly separated by pronunciation, which cannot be distinguished when the word is written). A serious later complication was the medieval word ‘plagis’, meaning a certain type of musical chord. All these variants of plaga and plagis share some common inflections. ‘Plagis’ might mean ‘to the snares/blows/tracts’, ‘with the snares/blows/tracts’, ‘O chord’, ‘the chord’ or ‘of the chord’. How is a translator to pick between the nine alternatives?
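The nine readings of ‘plagis’ follow mechanically from combining stems with endings. A minimal sketch, using a hypothetical toy lexicon containing only the two lemmas concerned and truncated ending tables (real inflection tables are far larger): both lemmas share the stem ‘plag’, and the ending ‘-is’ is valid in both paradigms.

```python
# Toy lexicon (hypothetical data): (lemma, stem, paradigm, English senses).
LEXICON = [
    ("plaga", "plag", "1st", ["snare", "blow", "tract of land"]),
    ("plagis", "plag", "3rd-i", ["chord"]),   # medieval musical term
]

# Truncated ending tables: paradigm -> ending -> possible (case, number) readings.
ENDINGS = {
    "1st": {
        "a": [("nominative", "sg"), ("vocative", "sg"), ("ablative", "sg")],
        "am": [("accusative", "sg")],
        "is": [("dative", "pl"), ("ablative", "pl")],
    },
    "3rd-i": {
        "is": [("nominative", "sg"), ("genitive", "sg"), ("vocative", "sg")],
        "em": [("accusative", "sg")],
        "i": [("dative", "sg")],
    },
}

def analyse(word: str) -> list[tuple[str, str, str, str]]:
    """Return every (lemma, sense, case, number) reading of `word`."""
    readings = []
    for lemma, stem, paradigm, senses in LEXICON:
        if not word.startswith(stem):
            continue
        ending = word[len(stem):]
        for case, number in ENDINGS[paradigm].get(ending, []):
            for sense in senses:          # every sense survives until disambiguated
                readings.append((lemma, sense, case, number))
    return readings

for reading in analyse("plagis"):
    print(reading)
```

The first lemma contributes 3 senses × 2 cases and the second 1 sense × 3 cases, giving the nine alternatives that the translator must then choose between.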
 
All languages have some words with ambiguous meanings (e.g. English ‘right’, which can mean ‘not left’ or ‘not wrong’, both ultimately derived from the Roman superstition that the left way is the unlucky way), but Latin is particularly well endowed with this translator’s problem. Latin is, indeed, a much harder language to translate into English than the modern west-European languages derived from it, such as French, Spanish and Italian. In fact, inflected Latin is so ambiguous that the spoken form (‘Vulgar Latin’) is known to have added many prepositions to clarify structure and meaning (2), while its derivative languages largely abandoned case inflections. Prepositions were also added to the modern German language, which is lightly inflected. Latin was actually a retrograde step from the older Greek language (from which Latin derived very many words): Greek, although also inflected, at least had a definite article (‘the’). Latin cannot easily distinguish between ‘the king loves the queen’ and ‘a king loves a queen’.
 
Any book about Latin translation tells the reader to identify the nominative (subject) noun in the sentence, then pick out the verb which matches the noun. You do not find this instruction in books about the translation of modern European languages! In fact, modern European languages may generally be translated word-for-word into English, while retaining their original sense. This is certainly not the case with Latin, where the word order is used for emphasis. Examples:
 
English: ‘against the followers of Christ’ (a common theme in the early Christian Latin literature).
 
Italian: ‘contro i discepoli di Cristo.’
 
Latin: ‘contra Christi cultores.’
 
Note that the Latin phrase has fewer words, and that their order differs from the Italian and English. Further, ‘cultores’ can mean ‘inhabitants, cultivators, supporters’.
 
 
Blitz Latin
 
We first created our machine translator "Blitz Latin" over several months in 2001. It was written in Microsoft’s Visual C++, and is a standard GUI application designed for Microsoft’s Windows 95™ or a later variant running on an IBM™-compatible PC. There is no Unix or Apple Macintosh™ version. The current release of the program requires about 4.5 Mbytes of RAM and some 12 Mbytes of hard-disk space. The user can type in Latin text for translation, or load it as a pre-typed or scanned text file. Note that Blitz Latin cannot translate from English to Latin, for which there appears to be very little demand.
 
Since Blitz Latin’s translations are often ambiguous, and key Latin words such as plaga may have context-sensitive meanings, Blitz Latin has an interactive editing mode. The user can see suggested alternatives simply by clicking on any Latin word, over-type corrections and save the result. The Latin dictionary may also be consulted from the on-screen menu. Words missing altogether from the dictionary may be added by a knowledgeable Latin user to a User File, which is a simple text file containing pre-worked examples.
 
The huge dictionary and the (selectable) grammatical detail that Blitz Latin provides for single Latin words make the program potentially valuable to professional translators. We added a unique Latin spell-checker to the program after finally becoming exasperated with the number of mis-spellings and scanning errors in published Latin texts; previously there was no basic Latin spell-checker to match those commonly found in conventional word processors. Professional translators or proof readers can use a separate search mode to collect the errors from many Latin files into a single output file for later examination. The program can also search many texts for Latin words, either as plain text or as identified stems and inflections.
 
Figure 1 - Screen View of Blitz Latin

 
Computer chess has been described as the ‘Drosophila’ (fruit fly) of Artificial Intelligence. One of the great discoveries of the 1970s and 1980s was that attention to accelerating the move generator and the deep search routines produced far better programs than attention to improving chess knowledge. Whisper it softly, but the best chess programs use the least input from chess grandmasters.
 
Machine translators have traditionally brought together a skilled linguist who knows no computing and a general programmer who knows little of natural languages. We felt that a novel approach based on the above discovery from computer chess might fare better: we used AI techniques from computer chess to make the fastest possible substitutions of English words for Latin words. This exploratory program, written for Microsoft’s MS-DOS, was so fast that measuring its speed became meaningless: the time taken for word substitution depended mainly on the program’s access to the hard drive, the screen and other system resources. The program compiled tables of alternative meanings (from combinations of stems and inflections) for each individual Latin word in a sentence.
 
Applying simple Latin grammatical rules then eliminated many of the alternative meanings in each ambiguous word’s table outright. Next we added a few basic rules for which points were scored, and the best-scoring alternative for each original Latin word was taken as the correct choice. We then sought to broaden the program’s comprehension with ‘Reviews’: the best choice for each original Latin word was reviewed to see whether it made sense according to the rules. For example, if the sentence contained a nominative noun and an accusative noun but no verb, the score weightings would be altered to favour a verb and the scoring evaluation repeated. In theory this re-assessment might go on indefinitely, following some kind of converging algorithm, but in practice little extra benefit is seen after just three reviews.
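The score-then-review loop can be sketched as follows. This is a minimal illustration under our own assumptions: the candidate format, the score values and the single review rule (favour a verb when a clause has a nominative and an accusative but no verb) are hypothetical stand-ins for Blitz Latin's far more numerous heuristics. The toy clause uses ‘ducis’, which can be read as ‘of the leader’ or ‘you lead’, purely to exercise the review rule.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Candidate:
    gloss: str            # English meaning of this reading
    pos: str              # part of speech, e.g. "noun" or "verb"
    case: Optional[str]   # grammatical case for nouns, else None
    score: float          # points awarded by the basic rules

def choose(words, weights):
    """Take the best-scoring candidate for each word under the current weightings."""
    return [max(cands, key=lambda c: c.score * weights.get(c.pos, 1.0))
            for cands in words]

def needs_verb(chosen):
    """Review rule: a nominative and an accusative but no verb looks wrong."""
    cases = {c.case for c in chosen}
    has_verb = any(c.pos == "verb" for c in chosen)
    return "nominative" in cases and "accusative" in cases and not has_verb

def translate_clause(words, max_reviews=3):
    """Score, review, re-weight and re-score; return (choices, reviews used)."""
    weights = {}
    chosen = choose(words, weights)
    for review in range(1, max_reviews + 1):
        if not needs_verb(chosen):
            return chosen, review
        weights["verb"] = weights.get("verb", 1.0) * 2.0   # favour a verb
        chosen = choose(words, weights)
    return chosen, max_reviews

clause = [
    [Candidate("the king", "noun", "nominative", 1.0)],
    [Candidate("the queen", "noun", "accusative", 1.0)],
    [Candidate("of the leader", "noun", "genitive", 0.6),
     Candidate("you lead", "verb", None, 0.5)],
]
chosen, reviews = translate_clause(clause)
print([c.gloss for c in chosen], reviews)
```

The first pass prefers the noun reading; the first review finds no verb, doubles the verb weighting, and the second review accepts the result. Better initial heuristics mean that more clauses pass at the first review, which is how the average number of reviews can fall towards 1.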
 
And now we come to the deepest irony. Because Blitz Latin was still executing its translations far faster than required, we felt able to add more and more translation heuristics to the scoring algorithms. This improved the selection of the best candidate meaning for each original Latin word to the extent that fewer reviews were needed. The average number of reviews over tens of thousands of clauses is now very close to 1.1, which reduces the time spent re-running the scoring algorithms. The net effect of Blitz Latin’s speed is thus that it now has more translation heuristics than slower programs, with a correspondingly improved quality of translation! And thus, too, we have discovered that speed alone does not paper over all of a machine translator’s defects: knowledge is also required.
 
Like many machine translators, Blitz Latin translates single sentences. We spent some effort trying to ‘remember’ information from one sentence to the next, so that a 3rd-person verb could be tied to its subject; elementary books of Latin grammar even state a rule: ‘in Latin, the subject is continued from the previous sentence unless there is clear indication otherwise’ (3). Unfortunately, the ‘subject’ is not necessarily the word in the nominative case in the preceding sentence, and, as we soon discovered, this rule is heuristically worthless for real Latin sentences.
 
Blitz Latin now carries out automatic translation in several stages:
 
1. ‘Load-Text’. Delineation of sentences and sub-clauses. Intelligent tidying-up is done at this stage, checking for punctuation. An intermediary text file is created, which replaces the user’s input file in all the following stages.
 
2. ‘Parser’. Construction of tables for each word in a sentence. One original word may have several Latin stems (e.g. nouns, pronouns, verbs, adjectives) each with several inflections modifying the meanings. All the possible combinations are stored for each original word provided by the user. If the full translation option has not been selected, this is the output that will be displayed.
 
3. ‘Clear-out’. Removal of improbable words or meanings using grammatical principles.
 
4. ‘AI-Select’. Use of AI heuristics to determine which surviving meanings for a word are the most probable. The most probable combination of stem and inflection is selected. Reviews are carried out to determine whether the results are satisfactory. If not, the weightings are adjusted intelligently and the clause re-examined.
 
5. ‘Elaborate’. Construction of the English rendering from the best meaning of the selected word and its inflection. For example, a 2nd person future of the verb ‘amare’ will be constructed as ‘you will love’.
 
6. ‘Polish’. Use of look-up tables to smooth disjointed meanings as far as possible.
 
7. ‘Best Order’. Analysis of the best polished meanings to improve the word order from Latin sequence to English sequence. This can make a big difference to comprehension of the final output.
 
8. ‘Output’. Output of the improved translation to a text file which is subsequently displayed on screen.
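The sentence and clause delineation of the ‘Load-Text’ stage might be approximated by a punctuation-based splitter. This is a hypothetical sketch: the regular expressions, and the treatment of commas, semicolons and colons as clause boundaries, are our assumptions rather than Blitz Latin's rules. A real splitter must also cope with abbreviations such as praenomina (‘M. Tullius’), which this one would wrongly split.

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")   # whitespace after terminal punctuation
CLAUSE_BREAK = re.compile(r"\s*[,;:]\s*")     # assumed clause separators

def load_text(raw: str) -> list[list[str]]:
    """Split raw Latin text into sentences, each given as a list of clauses."""
    text = re.sub(r"\s+", " ", raw).strip()   # tidy line breaks and spacing
    result = []
    for sentence in SENTENCE_END.split(text):
        clauses = [c.strip(" .!?") for c in CLAUSE_BREAK.split(sentence)]
        clauses = [c for c in clauses if c]
        if clauses:
            result.append(clauses)
    return result

print(load_text("Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae.\n"
                "Rex reginam amat."))
```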
 
While the speed of automatic translation naturally depends on the CPU, complete text files (such as book chapters) are translated in a few seconds at most. This speed is achieved mainly by a very efficient search algorithm for Latin stems and inflections within the electronic dictionary and, to a lesser extent, by the use of high-speed hash techniques.
 
Tables are constructed for each possible combination of stem and inflection for every individual Latin word in a sentence, and the table for a single word is frequently long. For example, the very common word multum has 10 different entries, and decreta has 15! These are not artificial examples: both words are taken from Blitz Latin’s standard test, a short authentic historical Latin extract provided with the translator, which illustrates many points for the novice user (see Figure 1). While we have no direct knowledge of the average number of entries needed to store alternative English meanings for a single word of Italian, German, French or Spanish, we would be surprised indeed if more than three entries were often needed. These are ‘rich’ languages, in which single words commonly have single meanings and there are no complicating inflections.
 
We found that storage of complete tables for individual Latin words conferred some speed advantage (up to 10%) over direct indexing into the electronic dictionary, but required an enormous outlay of RAM (or, worse, of slow hard-drive space) to accommodate the tables. This would not be feasible for many users with older computers. However, small hash tables in RAM are particularly effective at handling the rarer occurrences of proper names and other words not previously found in the dictionary. These words would otherwise have been fruitlessly sought and re-sought in the main dictionary, and then examined by slow use of ‘tricks’ (see below), whenever encountered. It was particularly interesting to discover that, when proper names are first encountered, they tend to be re-encountered very quickly in succeeding sentences and paragraphs. By contrast, words completely unknown to the dictionary - even when not typing or scanning errors - tended not to recur.
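The small in-RAM hash table for words not found in the dictionary behaves like a bounded cache. A minimal sketch, assuming a hypothetical slow fallback `slow_lookup` (standing in for the dictionary search plus ‘tricks’) and an arbitrary capacity. Python's `OrderedDict` gives least-recently-used eviction, which suits the observation that proper names recur soon after they first appear, while one-off unknown words simply age out.

```python
from collections import OrderedDict
from typing import Callable, Optional

class UnknownWordCache:
    """Bounded LRU cache remembering the outcome of slow look-ups for unknown words."""

    def __init__(self, slow_lookup: Callable[[str], Optional[str]], capacity: int = 256):
        self.slow_lookup = slow_lookup
        self.capacity = capacity
        self.table = OrderedDict()   # word -> result (None if still unknown)
        self.slow_calls = 0

    def lookup(self, word: str) -> Optional[str]:
        if word in self.table:
            self.table.move_to_end(word)      # mark as recently used
            return self.table[word]
        self.slow_calls += 1
        result = self.slow_lookup(word)       # dictionary search + tricks (slow)
        self.table[word] = result             # remember failures too, e.g. proper names
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)    # evict the least recently used word
        return result
```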
 
In order to test the performance of Blitz Latin, we accumulated as many test files as we could. Several excellent web-sites offer Latin HTML or TXT files for free download, for example:
http://www.theLatinLibrary.com and
http://www.fh-augsburg.de/~harsch/a_chron.html
 
At present our files are loosely divided as follows:
 
Test files                 Million words of Latin
Classical Latin            5.7 (includes Vulgate Latin Bible)
Justinian Legal Latin      1.7 (6th Century)
Medieval Latin             3.6 (includes some modern Latin)
Bracton Medieval Law       0.6 (13th Century)
Medieval Music theory      1.0 (3rd-11th Centuries)
PHI CD-ROM #5.3            7.5 (all Latin texts to 200 AD, plus some later)
Total                      20+

 
Note: there is a large overlap between the PHI texts (tested courtesy of Packard Humanities Institute, USA) and the ‘Classical Latin’ test files.
 
Blitz Latin translates ALL the above test files within 25 minutes (1.4 GHz CPU PC), or within 4½ hours on a six-year-old 166 MHz PC. The program is exceptionally robust, handling not only true Latin text, but also a mish-mash of scanner errors, typing mistakes, Greek words, English or other foreign part-translations, as well as test rubbish like code files and even binary files disguised as text files. Naturally, it is only the Latin that gets translated.
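For a sense of scale, taking the total as 20 million words (the table gives 20+, and the quoted times are upper bounds), the timings imply throughputs of at least:

```latex
\frac{20\times10^{6}\ \text{words}}{25\times60\ \text{s}} \approx 1.3\times10^{4}\ \text{words/s (1.4 GHz)},
\qquad
\frac{20\times10^{6}\ \text{words}}{4.5\times3600\ \text{s}} \approx 1.2\times10^{3}\ \text{words/s (166 MHz)}
```

The ratio of the two times, 16,200 s / 1,500 s ≈ 10.8, is somewhat larger than the ratio of clock speeds, 1400/166 ≈ 8.4.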
 
 
Blitz Latin’s Electronic Dictionary
 
For a non-inflected language such as English, it does not matter too much if a single word is missing from the translator’s vocabulary. If we write ‘a long black car was driven down the road’, then striking out as unknown any single word in the above sentence does not affect the meaning of the remaining words.
 
For an inflected language, the consequences of not knowing a single word can be far more serious, and may cause mis-interpretation of the inflections on other words. Thus Blitz Latin makes every effort to find words in its dictionary, and maintenance of the dictionary is arguably the single most important part of machine translation of the Latin language.
 
For a few, very rare, Latin words, we know all the grammatical structure but cannot find the meaning in a conventional Latin dictionary. These words are nevertheless added to the electronic dictionary, so that the translator can at least get the context of use of the word in order to enable correct translation of the surrounding words. The meaning of the missing word is set equal to its stem in upper case, for example ‘trimatus, trimatus, masc’ (TRIMAT).
 
When Blitz Latin cannot find a word in its vocabulary, it first tries ‘slurred’ (assimilated) spellings: for example, ‘inmitis’ is slurred to give ‘immitis’, which is in the dictionary. Failing that, the translator attempts to create synthetic words from a similar dictionary stem plus a prefix or suffix. For example, ‘superstruo’ is created from ‘super-’ + struo (in the dictionary), and ‘tralationem’ from tralat (in the dictionary) + ‘-ionem’. These ‘Tricks’ (as we call them; others use the word ‘morphing’) mop up many of the Latin words not found in the electronic dictionary. Since tricks are slow to execute, we try to identify Latin words that are commonly translated by their use, and add them to the dictionary.
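The two kinds of trick can be sketched as string rewrites checked against the dictionary. The assimilation rules, prefix and suffix lists and the toy word and stem sets below are hypothetical, and much shorter than any real list; the membership tests stand in for a full stem-and-inflection look-up, so ‘struo’ is treated here as a known form rather than analysed into stem and ending.

```python
from typing import Optional

# Toy data (hypothetical): forms and stems the tricks must land on.
KNOWN_FORMS = {"immitis", "struo"}
KNOWN_STEMS = {"tralat"}

SLURS = [("inm", "imm"), ("inp", "imp"), ("adf", "aff"), ("conl", "coll")]
PREFIXES = ["super", "sub", "prae", "re"]
SUFFIXES = ["ionem", "ionis", "io"]

def apply_tricks(word: str) -> Optional[str]:
    """Return the dictionary form, or a description of the split, or None."""
    if word in KNOWN_FORMS:
        return word
    # 1. Slurring: undo an unassimilated spelling.
    for written, standard in SLURS:
        if written in word:
            candidate = word.replace(written, standard)
            if candidate in KNOWN_FORMS:
                return candidate
    # 2. Prefix + known word.
    for prefix in PREFIXES:
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in KNOWN_FORMS:
            return f"{prefix}- + {rest}"
    # 3. Known stem + suffix.
    for suffix in SUFFIXES:
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and stem in KNOWN_STEMS:
            return f"{stem} + -{suffix}"
    return None

for w in ("inmitis", "superstruo", "tralationem"):
    print(w, "->", apply_tricks(w))
```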
 
Blitz Latin can also optionally write every word it does not recognise to a special file. These failed words are then automatically collated and sorted in descending order of frequency, and we add the most common ‘unknown’ words to the electronic dictionary. We may therefore claim that no word occurring more than 15 times in all of our 20 million test words will be described as ‘unknown’ by the translator (most words occurring fewer than 15 times will also be translated). Proper names, abbreviations and words of foreign origin are excepted.
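The collation step amounts to a frequency count. A minimal sketch using Python's `collections.Counter`; the input format (one failed word per line) and the function name are our assumptions.

```python
from collections import Counter

def rank_unknown_words(lines, min_count=1):
    """Collate failed words (one per line) and return (word, count), most frequent first."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return [(word, n) for word, n in counts.most_common() if n >= min_count]

print(rank_unknown_words(["baro\n", "baro\n", "synemenon\n", "baro\n"]))
```

To list only the words seen more than 15 times, one would pass `min_count=16`, for example with an opened log file as `lines`.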
 
Thus the dictionary continues to grow, and our program continues to become faster.
 
 
The Super-Adjective
 
Generally speaking, Blitz Latin uses the stem-types (noun, pronoun, adjective, verb and so on) as presented to it by WAW’s original electronic dictionary. The dictionary distinguishes - whenever known from actual example - the substantive use of an adjective to give a noun; thus bonus is entered in the dictionary as an adjective (‘good’) and as a noun (‘good man’). However, it became necessary to introduce an artificial stem-type, ‘Super-Adjective’, into the dictionary to describe certain adjectives that are declined as adjectives but used as pronouns. Examples include qualis, talis, plerus, multus, malus and alius. This differentiation is, we believe, novel, and was not made lightly. The change was found to be necessary to render more accurate Blitz Latin’s grammatical eliminations of improbable word candidates, and is unlikely to be applicable outside the Latin language.
 
This is an example of how the lightning speed of Blitz Latin, which enables sweeps of thousands of Latin files to be made for items of interest within minutes, has altered our understanding of how the Latin language should be translated, as well as causing changes to be made in the electronic dictionary. A more common occurrence is that inflections are found for a Latin stem that are inconsistent with those previously reported. This also has to be corrected in the dictionary.
 
 
Medieval Latin
 
As the Roman world fragmented under the onslaught of barbarians in the 5th Century AD, the dialects spoken around the former Roman Empire diverged, ultimately leading to the modern west-European languages. However, during the ‘Middle Ages’ (defined by Gibbon as the period from the Fall of Rome in 476 AD to the Fall of Constantinople in 1453 AD) monks and scribes wrote extensively in Latin as it sounded in their own dialects. To take just one example, the word listed in Lewis and Short’s Latin Dictionary as synemmenon is variously spelled as synnemenon, synemenon, sinemenon and sinemmenon. Worse, in Britain and perhaps elsewhere, Norman writers turned their everyday French-Saxon speech into Latin by the dubious expedient of placing Latin inflections onto their native words. The results must often have been incomprehensible even to readers of Latin in the next town, let alone to readers in Italy.
 
Medieval Latin thus introduced its own ambiguities. The most senior Norman feudal overlords were known as ‘barons’. Norman scribes invented a new Latin word ‘baro, baronis, masc’, meaning ‘baron’, apparently oblivious to the fact that there was already a perfectly good, identical Latin word that meant ‘blockhead’. Thus important late medieval Latin legal documents, such as the 13th Century ‘Magna Carta’ (the famous pact between an English king and his barons), are filled with references to the king’s loyal blockheads. It was not until the Renaissance (around the 15th-16th Centuries AD) that the spelling and use of Latin were again standardised.
 
How should Blitz Latin respond to this challenge, bearing in mind that far more Latin survives from the medieval period than from the Classical Age? We have made pragmatic changes. A large number of common medieval Latin words have gone into the electronic dictionary. We have created medieval ‘tricks’ to compensate for the most common medieval mis-spellings. And, most ambitiously, we have created phonetics lists, which try to match a phonetically mis-spelled medieval Latin stem (i.e. excluding the inflection) with its classical spelling. This last procedure is quite effective, but rather slow. It has three weaknesses:
 
1. Sometimes there are several matches with a phonetically mis-spelled medieval Latin word, and it is not clear which match should be used. Thus Blitz Latin now keeps a short-list of the most promising (defined heuristically) match options and tries them in turn until both candidate stem and its inflection (computed by subtraction of the candidate stem from the original Latin word) are accepted by the dictionary as a legitimate Latin word.
 
2. There is no systematic way to search for phonetic Latin mis-spellings in any dictionary. In order to save an excruciating search of the entire electronic dictionary, Blitz Latin selects (currently) up to four areas of the dictionary for systematic search. This is fast, but sometimes a phonetic match elsewhere in the dictionary will be missed.
 
3. There is the serious problem that the spell-it-as-I-speak-it writer might have a pronounced local accent. We have recently added the ‘Historia Francorum’ of St. Gregory of Tours (6th Century) to our test files. This writer, descended from a Gallo-Roman family but living in a Frankish-controlled area riven by civil strife, persistently, but not consistently, mis-spelled many Latin words. For example:
 
victoria, cognosc-, territorium, oratorius ⇒ victuria, cognusc-, territurium, oraturius.
 
monasterium, religiosus, itinere, obsidere ⇒ monastirium, relegiosus, itenere, obsedere.
 
And these are only the most common mis-spellings. It would appear that the author altered the vowels that carry the stress in speech. Blitz Latin now contains an adaptation that tests changes to stressed vowels as part of its phonetics search. This is again rather slow, so the medieval phonetics search must be switched on explicitly by the user.
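The vowel-change search and the short-list of point 1 can be combined in a sketch: generate candidate spellings by changing one vowel at a time (u→o, i→e, e→i, mirroring the examples from Gregory of Tours above), try candidates with the fewest changes first, and stop at the first one the dictionary accepts. The substitution table, the ordering and the toy word set are our assumptions; the real search works on stems and inflections separately and, as point 2 notes, is restricted to a few areas of the dictionary.

```python
from typing import Iterator, Optional

# Hypothetical substitutions undoing the vowel shifts seen in Gregory of Tours.
VOWEL_SWAPS = {"u": "o", "i": "e", "e": "i"}

# Toy stand-in for the dictionary.
KNOWN = {"victoria", "territorium", "monasterium", "religiosus", "itinere", "obsidere"}

def single_swaps(word: str) -> Iterator[str]:
    """Yield every spelling that differs from `word` by one vowel substitution."""
    for i, ch in enumerate(word):
        if ch in VOWEL_SWAPS:
            yield word[:i] + VOWEL_SWAPS[ch] + word[i + 1:]

def phonetic_match(word: str, max_changes: int = 2) -> Optional[str]:
    """Breadth-first search: fewest changes first, first accepted spelling wins."""
    frontier, seen = [word], {word}
    for _ in range(max_changes):
        next_frontier = []
        for form in frontier:
            for candidate in single_swaps(form):
                if candidate in seen:
                    continue
                if candidate in KNOWN:
                    return candidate
                seen.add(candidate)
                next_frontier.append(candidate)
        frontier = next_frontier
    return None

for w in ("victuria", "monastirium", "relegiosus"):
    print(w, "->", phonetic_match(w))
```

The number of candidates grows quickly with `max_changes`, which illustrates why such a search is slow enough to be worth making optional.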
 
 
Modern Latin
 
More Latin has been written in the past 50 years than in all previous history. Two main reasons may be put forward: the (Latin-speaking) bureaucracy known as ‘The Vatican’, and Internet chat rooms. Together they have created a demand for new words for such inventions as the motor car and the aeroplane. Dictionaries of such modern Latin words have been compiled; one is the ‘Calepinus Novus’, compiled by the Belgian ‘Melissa Foundation’. An electronic version of the Calepinus Novus, adapted by the authors, is now incorporated into Blitz Latin by generous permission of Guy Licoppe of Melissa (guy.licoppe@pophost.eunet.be).
 
Blitz Latin thus covers all bases with its electronic dictionary. It is arguably the fullest Latin dictionary in existence (considering both the number of words and the span of time covered), although the Oxford Latin Dictionary (£225) and Lewis and Short’s Latin Dictionary (£100) retain their superiority for classical Latin (for now...). An unexpected by-product of running Blitz Latin on so many Latin test files has been the ability to enlarge the dictionary accurately and aggressively, adding words that had previously been rejected too often as ‘unknown’.
 
 
Standard of Latin Translation and Resolution of Ambiguities
 
We should like to be able to say that the standard of translation by Blitz Latin is perfect, but clearly that is not possible for such an ambiguous language as Latin. Nevertheless, the quality of Latin translation is easily good enough to follow the gist of Latin texts. JFW has been able to translate his difficult historical and legal Latin texts referred to at the beginning of this article.
 
Often the use of an ambiguous Latin stem can be resolved on grammatical principles (whether the word should be a noun, adjective, verb or other) but, if this is not possible, the translator will use the most-cited word (see ‘Frequency’ above). We have sometimes found it necessary to change the frequency values originally allocated, on the basis of our experience in translating Latin texts. For example, the stem ‘mult-’ is cited in standard dictionaries as very frequent whether encountered as multus (adjective, ‘much’) or as multa (noun, ‘penalty’). In practice, the stem ‘mult-’ occurs far more often as part of multus.
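When grammar leaves more than one candidate standing, the fallback is the most frequently cited one, possibly after a corrected frequency value. A minimal sketch, assuming frequency letters that run from ‘A’ (very common) downwards and a hypothetical override table standing in for the revised values; the codes given to multus and multa below are illustrative, with both starting at the same value as the text describes.

```python
from typing import NamedTuple

class Reading(NamedTuple):
    lemma: str
    gloss: str
    freq: str   # assumed scale: 'A' (very common) ... 'F' (very rare)

# Hypothetical corrections learned from real texts: lemma -> revised code.
FREQ_OVERRIDES = {"multa": "C"}

def effective_freq(reading: Reading) -> str:
    """Dictionary frequency code, unless experience has overridden it."""
    return FREQ_OVERRIDES.get(reading.lemma, reading.freq)

def pick_most_frequent(readings: list[Reading]) -> Reading:
    """Fallback when grammar cannot decide: 'A' sorts first, so min() is most frequent."""
    return min(readings, key=effective_freq)

multus = Reading("multus", "much", "A")
multa = Reading("multa", "penalty", "A")
print(pick_most_frequent([multa, multus]).gloss)
```

Without the override, the two readings tie and `min` would simply return whichever comes first; the override is what makes ‘much’ win.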
 
As a consequence of the ambiguity of Latin, Blitz Latin frequently produces sentences that have been translated with grammatical accuracy but with the wrong meanings. An extreme example is the very common two-word phrase ‘liber primus’. This phrase is placed at the head of a multi-volume Latin text, and means ‘The First Book’. However, it can be translated with equal accuracy as ‘The Free Leader’ or ‘The Free First-Man’. And why should it not be so translated? The only reason we prefer the first translation ‘The First Book’ is because we happen to know that the next book begins ‘liber secundus’ - ‘The Second Book’. But the first book could have been a biography of the freed South African president, Nelson Mandela.
 
A related difficulty can be seen in a short extract from the 19th Century Latin translation of the Greek text of the Byzantine historian Zonaras: ‘ad fossam quamdam’. The intended translation is ‘at a certain ditch’. Blitz Latin’s translation is ‘to the dug certain’. This is, in fact, a grammatically valid translation: the confusion lies in whether ‘fossam’ should be translated as a noun qualified by an adjective, or as a verb participle qualifying a pronoun (note that ‘ad’ is yet another overloaded Latin word).
 
A further difficult example of ambiguity is found in Blitz Latin’s test file:
 
‘Qui principi imperii Christianis clementem se praebuit, ...’. Blitz Latin translates this sentence as ‘Which to the prince of the command with Christians has presented the merciful himself, …’. This translation seems perfectly good, since we know from other Latin sources that the emperor (‘which’) initially enjoyed good relations with the Pope, who might well be described as the ‘princeps’ of the Christian Command by the original pagan writer. Indeed, we believed for some time that this was the correct translation. But ‘principi’ ought to be translated as ‘of the beginning’. This alters the whole meaning of the sentence: ‘Which of the beginning of the (his) command to Christians has presented the merciful himself’. Since this ambiguity fooled even us, it is clear that we shall have difficulty getting the translator to make the correct translation.
 
We have suggested elsewhere (reference 4) that some of the ambiguities of individual Latin words might be cleared up by having human Latin experts train a neural network on many example Latin sentences containing the ambiguous word. The experts would assign the correct meaning of the ambiguous word in each sentence, thus providing the training data. This process would be time-consuming for each word trained, and Blitz Latin would then need a great deal of complex additional code to make use of the results. There is also a faint hope that rules for the use of each ambiguous Latin word might emerge from this process, so that Blitz Latin could use simple rules instead of a neural network whose parameters had to be changed for each trained word.
 
The present authors, who have other responsibilities, lack the time to investigate this further. But we would certainly be interested to know whether the method would succeed in principle, if tested with perhaps 10-20 highly ambiguous Latin words (such as ‘plaga’). Such an investigation, requiring a neural network to be trained for each separate ambiguous Latin word, is a job for a PhD student.
 
Another potential means to resolve the ambiguities of Latin stems is to consider the area (or field) in which the Latin word is used. It may be recalled that the electronic dictionary stores such areas as part of its structure. Frequently only the general meaning of a word is required, but specialist meanings will apply in some sentences. Examples include the Latin words ‘plaga, plagae, fem’ (snare/blow/tract of land) and ‘plagis, plagis, fem’ (music chord), mentioned previously, and ‘ars, artis, fem’ (‘skill’ or ‘wile’, but with scientific meaning ‘science’ or ‘knowledge’).
 
Blitz Latin now allows the user to select the area or field (general, ecclesiastical, legal etc., as listed earlier) before attempting the translation. Even more ambitious is the program’s attempt to change the area automatically according to its perception of the Latin content: if Blitz Latin discovers that most of its translation was (say) legal Latin, the user is advised to re-translate with the area manually set to ‘legal’.
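To make the two techniques concrete, here is a minimal sketch, assuming a hypothetical dictionary layout in which each stem maps area names to English meanings, with ‘general’ as the fallback. Only the senses of ‘ars’ come from the text above; the names DICTIONARY, meaning and recommend_area, and the simple majority threshold, are illustrative assumptions rather than Blitz Latin’s actual code.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical mini-dictionary: stem -> {area: English meaning}.
# The 'ars' senses come from the article; the structure is an assumption.
DICTIONARY = {
    "ars": {"general": "skill", "scientific": "science"},
}


def meaning(stem: str, area: str) -> str:
    """Return the sense for the selected area, falling back to 'general'."""
    senses = DICTIONARY[stem]
    return senses.get(area, senses["general"])


def recommend_area(word_areas: List[str], threshold: float = 0.5) -> Optional[str]:
    """Suggest a specialist area if it accounts for more than `threshold`
    of all words in a translation (one area tag per word); else None."""
    specialist = [a for a in word_areas if a != "general"]
    if not specialist:
        return None
    area, count = Counter(specialist).most_common(1)[0]
    return area if count / len(word_areas) > threshold else None


print(meaning("ars", "general"))     # skill
print(meaning("ars", "scientific"))  # science
print(meaning("ars", "legal"))       # skill (no legal sense, so general)
print(recommend_area(["legal", "legal", "general", "legal"]))  # legal
```

The fallback to the general sense mirrors the point that only the general meaning is required most of the time; a specialist sense is used only when the selected area actually provides one.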
 
Both of these techniques for selecting the translation area work quite well (surprisingly so, in the case of the auto-changing facility), but there remain serious defects.
 
1. The original writer may have intended that a Latin word retain its general meaning even when used in a specialist text. This is by far the most serious problem.
 
2. The auto-changing translation lags a change in Latin content. If a text changes from a discussion of astronomy (scientific area) to its implication for theology (ecclesiastical area), some 30-50 sentences may pass before Blitz Latin auto-changes its translation area from scientific to ecclesiastical.
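One plausible reason for such a lag is that any automatic choice of area has to be based on a run of recent sentences. The sketch below assumes a hypothetical scheme in which the working area changes only when one area holds a strict majority of the last `window` sentences; the window size of 40 is an illustrative assumption, not a Blitz Latin parameter.

```python
from collections import Counter, deque
from typing import Iterable, Iterator


def auto_area(sentence_areas: Iterable[str], window: int = 40,
              default: str = "general") -> Iterator[str]:
    """Yield the working area after each sentence.

    Hypothetical scheme: the area changes only when one area holds a strict
    majority of the last `window` sentences, so it catches up with a change
    of subject only after about window/2 sentences.
    """
    recent = deque(maxlen=window)  # area tags of the most recent sentences
    current = default
    for area in sentence_areas:
        recent.append(area)
        top, count = Counter(recent).most_common(1)[0]
        if count > len(recent) / 2:  # strict majority required to switch
            current = top
        yield current


# 60 sentences on astronomy followed by 60 on theology.
areas = ["scientific"] * 60 + ["ecclesiastical"] * 60
history = list(auto_area(areas, window=40))
print(history.index("ecclesiastical") - 60)  # 20 sentences of lag
```

A shorter window reacts faster but is more easily misled by a few specialist words in an otherwise general passage; the trade-off between stability and lag is inherent in any scheme of this kind.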
 
Finally, we point out that a very ambiguous language is likely to be formula-driven; that is, its native speakers would use standard formulae to express certain phrases. Blitz Latin therefore devotes a small part of its code (at present) to substituting standard phrases when they are encountered. This section should probably be expanded, but that requires identifying suitable, common target phrases and takes considerable time to implement. A trivial example is the recognition of the various inflected forms of ‘res publica’ and their translation as ‘State’.
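A standard-phrase substitution of this kind can be sketched with a single regular expression. The list of declined forms below is our own assumption (singular then plural, also matching run-together spellings such as ‘respublica’), and the function names are hypothetical; the sketch simply replaces every recognised form with ‘State’, as described above, and does not attempt to preserve case.

```python
import re

# Declined forms of 'res publica' (singular then plural). The list and the
# single-word spellings it also catches (e.g. 'respublica') are assumptions.
RES_PUBLICA_FORMS = [
    "res publica", "rei publicae", "rem publicam", "re publica",
    "res publicae", "res publicas", "rerum publicarum", "rebus publicis",
]


def build_pattern(forms):
    """Compile one case-insensitive regex matching any form, written either
    as two words or run together."""
    alternatives = [r"\s*".join(map(re.escape, f.split())) for f in forms]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


PATTERN = build_pattern(RES_PUBLICA_FORMS)


def substitute_phrases(sentence: str) -> str:
    """Replace every recognised form with the fixed translation 'State'.
    Case information (e.g. genitive 'of the State') is not preserved here."""
    return PATTERN.sub("State", sentence)


print(substitute_phrases("Cicero rem publicam defendit"))  # Cicero State defendit
print(substitute_phrases("salus reipublicae"))              # salus State
```

The word boundaries stop a shorter form such as ‘res publica’ from matching inside ‘res publicae’; the discarded case ending is exactly the kind of information a fuller implementation would have to pass on to later translation stages.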
 
 
Blitz Latin and Latin Grammar
 
This article has addressed the programming issues of Blitz Latin. For a discussion of the grammatical issues, we refer the reader to our article in the JACT Review (4).
 
After the latter article appeared in print, a reader made a thought-provoking point: we had concentrated too much on unravelling the intricacies of classical Latin texts, which had been preserved as much for their eloquence and grammatical complexity as for their content. The reader suggested that we should have much less difficulty with simpler Latin texts, such as modern or even medieval texts (after compensating for spelling variations). This is a valid point that other would-be programmers of Latin translators should also bear in mind.
 
 
Blitz Latin and Latin Inscriptions
 
Large numbers of Latin inscriptions remain on Roman monuments that have survived through the millennia. These were first collated by the German scholar Theodor Mommsen in the 19th century, but are now available electronically. A good source is Frankfurt University (Germany) at http://www.rz.uni-frankfurt.de/~clauss/index-e.html.
 
Blitz Latin can translate expanded inscriptions (such as those supplied by Frankfurt; the originals often consist only of abbreviations) once a special toggle is set. The big difficulty is that the translation routines rely heavily on finding a verb, and most inscriptions lack one, which makes the problem of ambiguity even worse.
 
 
Experience with Blitz Latin in Service
 
Blitz Latin has been commercially available since July 2001 (version 1.35) via the web-site of the independent retail distributors Software Partners (http://www.software-partners.co.uk). The downloaded version comes with an auto-install (and auto-uninstall!) facility and is free for 10 launches (loads from the hard drive into RAM). Thereafter a licence has to be purchased (at present £29 or the foreign equivalent). The current version is 1.52, released in May 2003. Existing licence holders are entitled to free upgrades with each new release; to date, there have been five such upgrades.
 
The problem of legal but inaccurate translation of ambiguous Latin text has proved to be by far the most common source of complaint about Blitz Latin. Experienced Latin translators do not seem to realise that their knowledge of Latin depends at least as much on their general knowledge as on their Latin skills. They complain that Blitz Latin’s translation sometimes makes no ‘sense’. But what is ‘sense’, and how is a computer program to acquire it?
 
There has not been a single word of complaint about the stability of Blitz Latin. It simply does not crash.
 
 
Summary
 
We have created a lightning-fast machine translator of Latin, which we call Blitz Latin. Its high operating speed has enabled the introduction of a very large number of translation heuristics, so that it is the most accurate commercially available Latin translator, as well as the fastest.
 
The translator provides a quality of translation sufficient for the gist of the Latin text to be grasped, and serves as a fast aid to professional (human) translators. The ambiguities of the Latin language, far greater than those of modern west-European languages, are such that we believe it will not be possible to improve significantly on Blitz Latin’s translations without a means of conferring ‘context’ on each word as it is translated. Regrettably, ‘context’ usually means ‘wide general knowledge’, which is not at present accessible to computer programs.
 
 
References
 
1. Bowden, P.R. ‘Latin to English Translation - A Direct Approach’. In Machine Translation Review, No. 12, December 2001.

2. Columbia Electronic Encyclopaedia, 6th Edition, 2000. ‘The Latin Language’. Columbia University Press. http://www.encyclopedia.com/articles/07250.html.

3. Collins Latin Dictionary Plus Grammar. HarperCollins, 1997: 134.

4. Whitaker, W.A. and White, J.F. ‘Blitz Latin - Experiments with automatic translation of Latin’. In JACT Review, No. 32, Autumn 2002: 2-8.
 

 

 
