Machine translation

Machine translation (MT) is a form of translation in which a computer program analyses a text in one language (the "source text") and attempts to produce an equivalent text in another language (the "target text") without human intervention.

In its current state, machine translation still involves some human intervention, since it typically requires a pre-editing and a post-editing phase. Note that in machine translation the translator supports the machine, and not the other way around.

Nowadays most machine translation systems produce what is called a "gisting translation" - a rough translation that gives the "gist" of the source text, but is not otherwise usable.

However, in fields with highly limited ranges of vocabulary and simple sentence structure, for example weather reports, machine translation can deliver useful results.

Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains.
Source: www.eamt.org, European Association for Machine Translation, EAMT, 1997.

Machine translation vs. Computer-assisted translation

Although the two concepts are similar, machine translation (MT) should not be confused with computer-assisted translation (CAT) (also known as machine-assisted translation (MAT)).

In machine translation, the translator supports the machine: the computer program translates the text, which is then edited by the translator. In computer-assisted translation, by contrast, the computer program supports the translator, who translates the text himself and makes all the essential decisions involved.

Introduction

The translation process, whether for translation per se or for interpreting, can be stated simply as:

  1. Decoding the meaning of the source text, and
  2. Re-encoding this meaning in the target language.

Behind this simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process which requires in-depth knowledge of the grammar, semantics, syntax, idioms, and the like of the source language, as well as of the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer to "understand" a text as a human being does, and also to "create" a new text in the target language that "sounds" as if it had been written by a human.

This problem can be tackled in a number of ways.

Linguistic approaches

It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first. However, a number of heuristic methods of machine translation are also used, including:

  1. dictionary-based machine translation
  2. transfer-based machine translation
  3. interlingual machine translation
  4. statistical machine translation
  5. example-based machine translation

Generally, rule-based methods (the first three) parse a text, usually creating an intermediary symbolic representation from which the text in the target language is generated. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
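As a rough illustration of how a rule-based (transfer-style) system works, the sketch below analyses a source phrase into a crude symbolic representation, applies a structural transfer rule, and generates target-language text. It is a deliberately minimal toy: the three-entry lexicon, the single adjective-noun reordering rule, and the English-to-French direction are all invented for illustration and do not describe any particular system.

```python
# A toy rule-based (transfer-style) translator, for illustration only.
# Real systems use extensive lexicons with morphological, syntactic, and
# semantic information, plus large hand-written rule sets.

# Tiny bilingual lexicon: English word -> (French word, part of speech).
LEXICON = {
    "the":   ("le", "DET"),
    "black": ("noir", "ADJ"),
    "cat":   ("chat", "NOUN"),
}

def analyse(sentence):
    """Analysis: map the source text to a crude symbolic representation,
    here a list of (source word, target word, part-of-speech) tuples."""
    return [(word, *LEXICON[word]) for word in sentence.lower().split()]

def transfer(analysis):
    """Transfer: apply a structural rule, English ADJ NOUN -> French NOUN ADJ."""
    result = list(analysis)
    for i in range(len(result) - 1):
        if result[i][2] == "ADJ" and result[i + 1][2] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
    return result

def generate(analysis):
    """Generation: read the target-language words off the representation."""
    return " ".join(target for _, target, _ in analysis)

print(generate(transfer(analyse("the black cat"))))  # -> "le chat noir"
```

Real rule-based systems differ mainly in how abstract the intermediate representation is, from near word-for-word substitution, through transfer over syntactic structures, to a fully language-neutral interlingua.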

Statistical and example-based methods eschew manual lexicon building and rule-writing and instead try to generate translations based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament. Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare.
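To make the statistical idea concrete, the sketch below learns word-translation probabilities from a tiny invented English-French corpus using a few iterations of IBM Model 1 style expectation-maximisation, and then produces a word-for-word gisting translation. It is only a sketch of the underlying principle: real statistical systems train on millions of sentence pairs such as the Hansard and add word-alignment refinements, reordering, and target-language models on top.

```python
# A toy statistical word-translation model, for illustration only.
# The three-sentence English-French "corpus" is invented; real systems
# train on millions of aligned sentence pairs.

from collections import defaultdict

CORPUS = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
]
PAIRS = [(src.split(), tgt.split()) for src, tgt in CORPUS]

# prob[(s, t)] approximates P(target word t | source word s),
# initialised uniformly over the target vocabulary.
target_vocab = {t for _, tgt in PAIRS for t in tgt}
prob = defaultdict(lambda: 1.0 / len(target_vocab))

# A few rounds of expectation-maximisation (IBM Model 1 style, omitting
# the NULL word) to sharpen the estimates.
for _ in range(10):
    counts = defaultdict(float)
    totals = defaultdict(float)
    for src, tgt in PAIRS:
        for t in tgt:
            norm = sum(prob[(s, t)] for s in src)
            for s in src:
                share = prob[(s, t)] / norm
                counts[(s, t)] += share
                totals[s] += share
    prob = defaultdict(float,
                       {pair: counts[pair] / totals[pair[0]] for pair in counts})

def translate(sentence):
    """Word-for-word 'gisting' translation: pick the likeliest target word."""
    return " ".join(max(target_vocab, key=lambda t: prob[(w, t)])
                    for w in sentence.lower().split())

print(translate("the blue flower"))  # -> "la bleue fleur" (rough gist, wrong order)
```

Even this crude model shows why corpus coverage matters: a source word that never appears in the aligned corpus has no translation probabilities at all, and word order in the output is only as good as the models layered on top.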

Given enough data, most machine translation programs work well enough for a native speaker of one language to get the approximate meaning of what a native speaker of the other language has written (i.e. to produce a "gisting translation"). The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus needed for statistical methods to work is not necessary for grammar-based methods; grammar-based methods, however, need a skilled linguist to carefully design the grammar that they use.

Users

Despite their inherent limitations, MT programs are currently used by various organizations around the world. Probably the largest institutional user is the European Commission, which uses a highly customized version of the commercial MT system SYSTRAN to handle the automatic translation of a large volume of preliminary drafts of documents for internal use.

It was recently revealed that in April 2003 Microsoft began using a hybrid MT system for the translation of a database of technical support documents from English to Spanish. The system was developed internally by Microsoft's Natural Language Research group. The group is currently testing an English–Japanese system as well as bringing English–French and English–German systems online. The latter two systems use a learned language generation component, whereas the first two have manually developed generation components. The systems were developed and trained using translation memory databases with over a million sentences each.

History of machine translation

The first attempts at machine translation were conducted after World War II. It was assumed at the time that the newly invented computers would have no trouble translating texts. The reasoning was that computers could quickly perform complex mathematics, something humans do only with difficulty, and that since even young children are able to learn to understand human language, computers should be able to do the same. In actual fact, this belief was soon shown to be incorrect.

On 7 January 1954, the first public demonstration of an MT system was held in New York at the head office of IBM. The demonstration was widely reported in the newspapers and received much public interest. The system itself, however, was no more than what today would be called a "toy" system, having a vocabulary of just 250 words and translating only 49 carefully selected Russian sentences into English, mainly in the field of chemistry. Nevertheless, it encouraged the view that MT was imminent, and in particular stimulated the financing of MT research, not just in the US but worldwide.

The first serious MT systems were used during the Cold War to translate texts from Russian scientific journals. The rough translations produced were sufficient to understand the "gist" of the articles. If an article discussed a subject deemed to be of security interest, it was sent to a human translator for a complete translation; if not, it was discarded.

The advent of low-cost and more powerful computers towards the end of the 20th century brought MT to the masses, as did the availability of machine translation sites on the Internet.

Much of the effort previously spent on MT research, however, has shifted to the development of computer-assisted translation (CAT) systems, such as translation memories, which are seen to be more successful and profitable.

See also

Free (open source) software