

British Computer Society
Natural Language Translation Specialist Group

Web-site: http://www.bcs.org.uk/siggroup/sg37.htm

Machine Translation Review
Issue No. 11, December 2000 - pages pp-pp
ISSN 1358-8346

This page URL: http://www.bcs.org.uk/siggroup/nalatran/mtreview/mtr-11/mtr-11-10.htm
Size: 3 A4 pages when printed

 
State and Role of Machine Translation in India

by Dr. Sivaji Bandyopadhyay
Computer Science & Engineering Department, Jadavpur University, Calcutta - 700 032, India. E-mail: sivaji_ju@vsnl.com

 

In a large multilingual society like India, there is great demand for the translation of documents from one language to another. Most state governments work in their respective regional languages, whereas the Union Government's official documents and reports are bilingual (Hindi/English). For proper communication, these reports and documents need to be translated into the respective regional languages. Because the capacity of human translators is limited, most of this information (reports and documents) remains unavailable in the regional languages and does not percolate down. A machine-assisted translation system or a translator's workstation would increase the efficiency of human translators.

The Ministry of Information Technology, Government of India (http://www.mit.gov.in), has identified the following domains for the development of domain-specific translation systems: government administrative procedures and formats, parliamentary questions and answers, pharmaceutical information, legal terminology and important judgements, and so on. The Ministry initiated the TDIL (Technology Development for Indian Languages) project in 1990-91 to support R&D efforts in information processing in Indian languages, covering machine translation among other areas.

A machine-aided translation system among Indian languages, Anusaaraka, has been built with funding from the TDIL project. The Anusaaraka system presents an image of the source text in a language close to the target language. In the image, some constructions of the source language (those that have no equivalents in the target language) spill over into the output. Anusaarakas have been built for five language pairs: from Telugu, Kannada, Marathi, Bengali and Punjabi into Hindi. They are available for use through e-mail servers. Anusaarakas follow the principle of substitutability and reversibility of the strings produced, which implies that information is preserved in going from the source language to the target language. For narrow subject areas, specialized modules can be built by putting subject domain knowledge into the system, so that it produces good-quality grammatical output. However, it should be remembered that such modules will work only in narrow areas, and will sometimes go wrong. In such a situation, the Anusaaraka output will still remain useful. Work is under way on an English to Hindi Anusaaraka system, which will test the approach on two languages that are far apart. The system so developed will be available as free open-source software under the GPL. Work on the Anusaaraka project started at the Indian Institute of Technology, Kanpur. It is now being carried out at the Language Technologies Research Center, Indian Institute of Information Technology, Hyderabad (http://www.iiit.net/research/ltrc) with financial support from Satyam Computers Private Limited. The group at the center is guided by Prof. Rajeev Sangal and can be contacted by e-mail at rambabu@iiit.net.
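The idea of a reversible "image" can be illustrated with a toy sketch. It is not the Anusaaraka implementation: the lexicon, the token names and both functions are hypothetical placeholders, and the round trip only holds under the stated assumptions that the lexicon is one-to-one and that no untranslated source token coincides with a target token.

```python
# Hypothetical one-to-one lexicon. The tokens are placeholders,
# not real Telugu or Hindi words.
LEXICON = {"s_house": "t_house", "s_to": "t_to", "s_go": "t_go"}
REVERSE = {target: source for source, target in LEXICON.items()}


def to_image(source_tokens):
    """Substitute known tokens; unknown constructions spill over unchanged."""
    return [LEXICON.get(token, token) for token in source_tokens]


def to_source(image_tokens):
    """Undo the substitution; spilled-over tokens are already source tokens."""
    return [REVERSE.get(token, token) for token in image_tokens]


source = ["s_house", "s_to", "s_particle", "s_go"]
image = to_image(source)        # "s_particle" has no equivalent and spills over
recovered = to_source(image)    # equal to the original token list
```

Because every substitution can be undone, no information is lost between the source and its image, which is the property the paragraph above describes.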

The Natural Language group of the Knowledge Based Computer Systems (KBCS) division at the National Centre for Software Technology (NCST), Mumbai, is working on MaTra, a human-aided transfer-based translation system for English to Hindi. The work is supported under the TDIL project. The domain being explored is news, but the approach is applicable to any domain. The system breaks an English sentence into chunks, analyzes the structure and displays it using an intuitive browser-like representation, which the user can verify and correct, after which the system generates the Hindi. Ambiguities are resolved as they occur by checking with the user. A prototype that can translate simple (single verb group) sentences with occasional human intervention has been developed. A document categorization system has also been developed that automatically classifies a news item into one of ten categories, based on a statistical model obtained by training on a manually created training set. The group is currently developing a practical framework for the syntactic transfer of compound-complex sentences from English to Hindi. More information about the system can be obtained from the NCST website (http://www.ncst.ernet.in/kbcs/NLP.html) and by e-mail from Durgesh Rao, MaTra Coordinator (matra@ncst.ernet.in).
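The article does not say which statistical model the categorization system uses. As one plausible illustration only, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing on a tiny hand-made training set. The function names, the two categories and the training sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(labelled_items):
    """Collect per-category word counts from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labelled_items:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the category with the highest log-probability."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category, counts in word_counts.items():
        # Log prior plus add-one-smoothed log likelihood of each word.
        score = math.log(doc_counts[category] / total_docs)
        denominator = sum(counts.values()) + len(vocabulary)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy training set (hypothetical; two categories instead of ten).
training_set = [
    ("the team won the cricket match", "sports"),
    ("the batsman scored a century", "sports"),
    ("the minister presented the budget", "politics"),
    ("parliament passed the finance bill", "politics"),
]
model = train(training_set)
label = classify("the batsman won the match", *model)  # "sports"
```

A real system would need far more training data per category and proper tokenization. The sketch only shows how a manually labelled set turns into a model that assigns a category to a new item.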

The machine-aided translation system ANGLABHARATI, for translation from English to Hindi in the specific domain of public health campaigns, has been developed and is being installed at user sites for field testing. It is proposed to extend this technology to another domain: translation of the Financial and Supplementary Rules of the Government of India and related correspondence. The ANGLABHARATI project was launched by Professor R. M. K. Sinha at the Indian Institute of Technology, Kanpur, in 1991 for machine-aided translation from English to Indian languages.

The Multilingual Pocket Translator Design Project was undertaken by the Center for Development of Advanced Computing (CDAC) with a view to serving foreign travellers visiting India. A traveller's communication requirements are of a fairly fixed type. These requirements are grouped into categories, which helps the foreigner to communicate and keeps the system from failing in the initial stages. The same pocket translator is useful when a person moves from one state to another within India. Further details about the Pocket Translator can be obtained from http://vishwabharat.tdil.gov.in/pocket_trans.htm and from the CDAC website, http://www.cdac.org.in. Work is also under way at CDAC on an English-Hindi machine translation system using Tree Adjoining Grammar.
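A categorized set of fixed requirements can be pictured as a small phrasebook lookup. The sketch below is illustrative only and does not describe the CDAC design: the category names, the phrases and the `translate` function are assumptions.

```python
# Hypothetical phrasebook: situation category -> {English phrase: Hindi phrase}.
PHRASEBOOK = {
    "travel": {"where is the railway station": "रेलवे स्टेशन कहाँ है?"},
    "shopping": {"how much does this cost": "यह कितने का है?"},
    "courtesy": {"thank you": "धन्यवाद"},
}


def normalise(phrase):
    """Lower-case, drop surrounding punctuation and collapse whitespace."""
    return " ".join(phrase.lower().strip(" ?!.").split())


def translate(category, phrase):
    """Return the stored translation, or None if the phrase is not covered."""
    return PHRASEBOOK.get(category, {}).get(normalise(phrase))


reply = translate("courtesy", "Thank you!")   # "धन्यवाद"
missing = translate("travel", "thank you")    # None: not in this category
```

Restricting each lookup to one category keeps the set of phrases small and predictable, so a request either matches a stored entry or is clearly reported as not covered.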

Work on a knowledge-driven, generalized example-based machine translation system from English to Indian languages is being carried out in the ANUBAAD Project at the Computer Science and Engineering Department, Jadavpur University, Calcutta. It currently translates short, single-paragraph news items from English to Bengali. Headlines are translated using knowledge bases and example structures. The sentences in the news body are translated by analysis and synthesis. Semantic categories are associated with words to identify the inflections to be attached in the target language and to identify the context in the sentence. Context identification is also done using context templates for each word. The example base includes the mapping of grammatical phrases from the source to the target language. The methodologies can be used to develop similar systems for other Indian languages. The work is being carried out as part of a Research Award granted to Dr. Sivaji Bandyopadhyay (e-mail: sivaji_ju@vsnl.com) by the University Grants Commission (UGC) of the Government of India in 1999 (F. 30-95/98(SA-III)).
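How a semantic category can select a target-language inflection is sketched below for Bengali plural marking, where human nouns commonly take the suffix "-ra" and non-human nouns "-gulo". This is a simplified illustration, not the ANUBAAD lexicon: the entries, the category labels and the `pluralise` function are assumptions, and the Bengali forms are given in a rough transliteration.

```python
# Hypothetical lexicon: English noun -> (transliterated Bengali stem, semantic category).
LEXICON = {
    "student": ("chhatra", "human"),
    "book": ("boi", "object"),
}

# The plural suffix depends on the semantic category of the noun.
PLURAL_SUFFIX = {"human": "ra", "object": "gulo"}


def pluralise(english_noun):
    """Look up the stem and attach the suffix chosen by its semantic category."""
    stem, category = LEXICON[english_noun]
    return stem + PLURAL_SUFFIX[category]


students = pluralise("student")   # "chhatrara"
books = pluralise("book")         # "boigulo"
```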

 

 
