British Computer Society
Natural Language Translation
Specialist Group

URL: http://www.bcs-mt.org.uk/
Machine Translation Review
No. 1, April 1995   ISSN: 1358-8346
http://www.bcs-mt.org.uk/mtreview/1/5.htm


Multilingual Natural Language Processing (MNLP) Project

by David Wigg

In Newsletter 21 of April 1993 Douglas Clarke suggested that commonly required NLP programs should be made freely available, so that programmers need not keep re-inventing the proverbial wheel, and that a sub-group should be set up to see what we could do about it. Some meetings of people interested in supporting this proposal were held and it was decided that:

  • a comprehensive list of MNLP functions should be established first before trying to write NLP applications such as morphological analysis, multilingual word-processing, CALL programs, etc;
  • a specification of the required software should be published anonymously by the Group in the so-called Public Domain, with copyright held by the BCS;
  • individuals would then be free to write procedures to meet this specification in a variety of languages, which the Group could also publish;
  • these procedures could then be used to write linguistic applications which, being based on virtually universally usable program code and data, could then be used by anyone.

In the last Newsletter (22) I therefore proposed the following project (now expanded in detail) for the Group.
 
 
Common Procedures and Common Files
 
Objectives:

  • design general-purpose text processing software functions with a system of integrated data/file structures, so that a single all-embracing system for text processing may be constructed;
  • specify and publish the functions, so that the modules could be written independently in a variety of programming languages by anybody interested in doing so;
  • the system should be designed so that program source code could as far as possible be sharable between users of most of the commonly used procedural programming languages, so that users would be able to use their normal language of choice and would not have to learn another computer language when writing linguistic applications;
  • a common file structure should also be defined to enable data files to be written to a common system standard, using techniques similar to SGML or TEI coding, so that expensively constructed files would be available to any other user of this software using any computer language on any computer, without the need for adaptation or conversion;
  • the Group should publish, and make available at cost, the source code of any versions of the software that meet adequate quality standards, so that the software would be freely available and the reputation of the BCS and the Group would be maintained;
  • the functions and file structures should be easy to use, so that they would be accessible to the widest possible audience and not confined to so-called expert programmers only.
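
As an illustration only of the common file structure objective above, the following minimal sketch shows how a tagged data file in the spirit of SGML or TEI coding might be read by a short, language-neutral procedure. The tag names (lexicon, entry, form, pos), the sample data and the function names read_entries and translate are hypothetical and are not taken from the Group's specification; the sample is written as well-formed XML so that a standard parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged lexicon; the tag names are assumptions made for
# this sketch and do not come from the Group's specification.
SAMPLE = """<lexicon>
  <entry id="e1">
    <pos>noun</pos>
    <form lang="en">house</form>
    <form lang="fr">maison</form>
  </entry>
  <entry id="e2">
    <pos>verb</pos>
    <form lang="en">read</form>
    <form lang="de">lesen</form>
  </entry>
</lexicon>"""


def read_entries(text):
    """Parse the tagged text into a list of plain dictionaries."""
    root = ET.fromstring(text)
    entries = []
    for entry in root.findall("entry"):
        # Map each language code to the word form recorded for it.
        forms = {f.get("lang"): f.text for f in entry.findall("form")}
        entries.append({
            "id": entry.get("id"),
            "pos": entry.findtext("pos"),
            "forms": forms,
        })
    return entries


def translate(entries, word, source, target):
    """Return the target-language form of a source-language word, or None."""
    for e in entries:
        if e["forms"].get(source) == word:
            return e["forms"].get(target)
    return None


entries = read_entries(SAMPLE)
print(translate(entries, "house", "en", "fr"))  # maison
```

Because the file carries its own structure in the tags, an equivalent procedure could in principle be written in any other programming language and read the same file without conversion, which is the point of the objective.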

Benefits:

  • sharing reusable resources allows individuals to add to the work of others, enabling resources to be accumulated instead of dissipated;
  • the development of a community of experts with a common 'computer language' and common 'data/file structures';
  • the creation of universally sharable data files;
  • easier development of computerised linguistic applications by linguistic specialists.

Unfortunately we have not been able to do much work on this project in the last year, mainly due to the work involved in preparing for the MT Conference in November 1995.
 
A sub-group was convened last year from about 50 members interested in developing this project but, because the Group's membership is so widely dispersed, only a small minority were able to attend meetings in London.
 
With this in mind I would like to think that any individual, or group, should be able to develop a proposal to meet the objectives outlined above, for the committee to consider.
 
Sub-group meetings should continue to be held, with as many people attending as possible, to discuss individual proposals. The Group should also support any reasonable proposal by paying for some copying and distribution, so that others who are interested in the project but unable to attend these meetings can offer constructive criticism or, in the extreme, propose an alternative system.
 
You probably know that a proposal already exists, which has been circulated in an early form and discussed by a small sub-group.
 
I hope this existing proposal can be further developed, but I realise that others may have more time and expertise to produce a better result more quickly. I therefore suggest that system specifications be proposed on an individual basis for the Committee to consider. If a developer were a member of the Committee then, of course, he or she would not have a vote on its acceptance.
 
It might even be possible for more than one system to be accepted if they were sufficiently different and each had substantial advantages of its own.
 
Members of the original group of about 50 should recently have received an up-to-date version of the existing specification, together with a questionnaire. If you have not received these but are interested in the project, please let me know so that I can send them to you and add you to the list for the future.