Automatic Language Translator

IBM's Automatic Language Translator was a machine translation system that converted Russian documents into English. It used an optical disc that stored 170,000 word-for-word and statement-for-statement translations and a custom computer to look them up at high speed. Built for the US Air Force's Foreign Technology Division, the AN/GSQ-16 (or XW-2), as it was known to the Air Force, was primarily used to convert Soviet technical documents for distribution to western scientists. The translator was installed in 1959, dramatically upgraded in 1964, and was eventually replaced by a mainframe running SYSTRAN in 1970.

History

Photoscopic store

The translator began with a June 1953 contract from the US Navy to the International Telemeter Corporation (ITC) of Los Angeles. This was not for a translation system, but a pure research and development contract for a high-performance photographic online storage medium consisting of small black rectangles embedded in a plastic disk. When the initial contract ran out, the Rome Air Development Center (RADC), as it was then known, took over funding from 1954 onwards.^[1]

The system was developed by Gilbert King, chief of engineering at ITC, along with a team that included Louis Ridenour. It evolved into a 16-inch plastic disk with data recorded as a series of microscopic black rectangles or clear spots. Only the outermost 4 inches of the disk were used for storage, which increased the linear speed of the portion being accessed. When the disk spun at 2,400 RPM it had an access speed of about 1 Mbit/sec. In total, the system stored 30 Mbits, making it the highest density online system of its era.^[1]^[a]
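The quoted figures can be cross-checked with some simple arithmetic. The sketch below takes the 2,400 RPM, 1 Mbit/sec and 30 Mbit values at face value and assumes 1 Mbit means 10^6 bits; the function and key names are illustrative only, and the results inherit any inaccuracy in the source figures.

```python
# Back-of-the-envelope check of the photoscopic store figures quoted above.
# Assumption: 1 Mbit = 1,000,000 bits. All names here are hypothetical.
RPM = 2_400
RATE_BITS_PER_S = 1_000_000
CAPACITY_BITS = 30_000_000


def disk_figures(rpm=RPM, rate=RATE_BITS_PER_S, capacity=CAPACITY_BITS):
    """Derive rotation time and read times from the quoted figures."""
    revs_per_s = rpm / 60                 # rotations per second
    ms_per_rev = 1000 / revs_per_s        # time for one rotation
    bits_per_rev = rate / revs_per_s      # data passing the reader per rotation
    full_read_s = capacity / rate         # time to stream the whole store once
    revs_for_full_read = capacity / bits_per_rev
    return {
        "revs_per_s": revs_per_s,
        "ms_per_rev": ms_per_rev,
        "bits_per_rev": bits_per_rev,
        "full_read_s": full_read_s,
        "revs_for_full_read": revs_for_full_read,
    }


if __name__ == "__main__":
    for name, value in disk_figures().items():
        print(f"{name}: {value:,.0f}")
```

On these assumptions the disk turns 40 times per second, so any given position comes around again within 25 ms, and streaming the full 30 Mbits at the quoted rate would take about 30 seconds.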

Mark I

In 1954 IBM gave an influential demonstration of machine translation, known today as the "Georgetown–IBM experiment". Run on an IBM 701 mainframe, the translation system knew only 250 words of Russian limited to the field of organic chemistry, and only 6 grammar rules for combining them. Nevertheless, the results were extremely promising, and widely reported in the press.^[2]

At the time, most researchers in the nascent machine translation field felt that the major challenge to providing reasonable translations was building a large library, as storage devices of the era were both too small and too slow to be useful in this role.^[3] King felt that the photoscopic store was a natural solution to the problem, and pitched the idea of an automated translation system based on the photostore to the Air Force. RADC proved interested, and provided a research grant in May 1956. At the same time, the Air Force also provided a grant to researchers at the University of Washington who were working on the problem of producing an optimal translation dictionary for the project.

King advocated a simple word-for-word approach to translations. He thought that the natural redundancies in language would allow even a poor translation to be understood, and that local context alone was enough to provide reasonable guesses when faced with ambiguous terms. He stated that "the success of the human in achieving a probability of .50 in anticipating the words in a sentence is largely due to his experience and the real meanings of the words already discovered."^[4] In other words, simply translating the words alone would allow a human to effectively read a document, because they would be able to reason out the proper meaning from the context provided by earlier words.

In 1958 King moved to IBM's Thomas J. Watson Research Center, and continued development of the photostore-based translator. Over time, King changed the approach from a pure word-for-word translator to one that stored "stems and endings", which broke words into parts that could be combined back together to form complete words again.^[4]
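The "stems and endings" idea can be illustrated with a minimal sketch. The source does not describe how the real dictionary was organized, so the tables, the longest-stem-first search and the `split_word` name below are all assumptions made for illustration, with toy entries that are not drawn from the actual dictionary.

```python
# Toy illustration of a stem-and-ending lookup.
# Hypothetical data and function names; not the machine's actual scheme.
STEMS = {
    "книг": "book",      # stem of книга
    "машин": "machine",  # stem of машина
}
ENDINGS = {"", "а", "у", "и", "ы"}  # a few noun endings, for illustration


def split_word(word):
    """Return (stem gloss, ending) using the longest known stem, else None."""
    for i in range(len(word), 0, -1):
        stem, ending = word[:i], word[i:]
        if stem in STEMS and ending in ENDINGS:
            return STEMS[stem], ending
    return None


if __name__ == "__main__":
    for w in ("книгу", "машины", "спутник"):
        print(w, "->", split_word(w))
```

Storing one stem plus a small shared table of endings means each inflected form no longer needs its own dictionary entry, which is one plausible reason for moving away from pure whole-word lookup.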

The first machine, "Mark I", was demonstrated in July 1959 and consisted of a 65,000-word dictionary and a custom tube-based computer to do the lookups.^[3] Texts were hand-copied onto punched cards using custom Cyrillic terminals, and then input into the machine for translation. The results were less than impressive, but were enough to suggest that a larger and faster machine would be a reasonable development. In the meantime, the Mark I was applied to translations of the Soviet newspaper Pravda. The results continued to be questionable, but King declared it a success, stating in Scientific American that the system was "...found, in an operational evaluation, to be quite useful by the Government."^[3]

Mark II

On 4 October 1957 the USSR launched Sputnik 1, the first artificial satellite. This caused a wave of concern in the US, whose own Project Vanguard was caught flat-footed and then failed repeatedly in spectacular fashion. This embarrassing turn of events led to a huge investment in US science and technology, including the formation of DARPA and NASA, and to a variety of intelligence efforts intended to avoid being surprised in this fashion again.

After a short period, the intelligence efforts were centralized at Wright-Patterson Air Force Base as the Foreign Technology Division (FTD, now known as the National Air and Space Intelligence Center), run by the Air Force with input from the DIA and other organizations. FTD was tasked with the translation of Soviet and other Warsaw Pact technical and scientific journals so researchers in the "west" could keep up to date on developments behind the Iron Curtain. Most of these documents were publicly available, but FTD also made a number of one-off translations of other materials upon request.

Believing there was a shortage of qualified translators, the FTD became extremely interested in King's efforts at IBM. Funding for an upgraded machine was soon forthcoming, and work began on a "Mark II" system based around a transistorized computer with a faster and higher-capacity 10-inch glass-based optical disc spinning at 2,400 RPM. Another addition was an optical character reader provided by a third party, which they hoped would eliminate the time-consuming process of copying the Russian text onto machine-readable cards.^[3]

In 1960 the Washington team also joined IBM, bringing their dictionary efforts with them. The dictionary continued to expand as additional storage was made available, reaching 170,000 words and terms by the time it was installed at the FTD. A major software update was also incorporated in the Mark II, which King referred to as "dictionary stuffing". Stuffing was an attempt to deal with the problems of ambiguous words by "stuffing" prefixes onto them from earlier words in the text.^[3] These modified words would match with similarly stuffed words in the dictionary, reducing the number of false positives.
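As a rough illustration of dictionary stuffing, the sketch below lets each dictionary entry carry a tag that is prefixed onto the next word before lookup, so an ambiguous word can match a context-specific entry. The text does not describe the actual encoding, so the tag format, the `ENTRIES` table and the `translate` function are hypothetical.

```python
# Toy sketch of "dictionary stuffing": a tag from the previous word is
# prefixed ("stuffed") onto the next word before lookup. All entries,
# tags and names are hypothetical and only illustrate the idea.
ENTRIES = {
    # key: (English gloss, tag to stuff onto the following word)
    "весь": ("whole", "Q"),
    "мир": ("peace", ""),    # default reading with no context
    "Q|мир": ("world", ""),  # stuffed entry, matched only after a Q tag
}


def translate(tokens):
    """Translate word by word, preferring the stuffed key when a tag is set."""
    out = []
    tag = ""
    for word in tokens:
        entry = ENTRIES.get(f"{tag}|{word}") if tag else None
        if entry is None:
            entry = ENTRIES.get(word)  # fall back to the plain entry
        if entry is None:
            out.append(word)           # unknown word: pass through unchanged
            tag = ""
            continue
        gloss, tag = entry
        out.append(gloss)
    return " ".join(out)


if __name__ == "__main__":
    print(translate(["весь", "мир"]))
    print(translate(["мир"]))
```

In this toy version the same Russian word yields "world" after a tagged word and "peace" on its own, which is the kind of ambiguity the technique was meant to reduce.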

In 1962 King left IBM for Itek, a military contractor in the process of rapidly acquiring new technologies. Development at IBM continued, and the system went fully operational at FTD in February 1964. The system was demonstrated at the 1964 New York World's Fair. The version at the Fair included a 150,000-word dictionary, with about 1/3 of the words in phrases. About 3,500 of these were stored in core memory to improve performance, and an average speed of 20 words per minute was claimed. The results from the carefully selected input text were quite impressive.^[5] After its return to the FTD, it was used continually until 1970, when it was replaced by a machine running SYSTRAN.^[6]

ALPAC Report

In 1964 the United States Department of Defense commissioned the United States National Academy of Sciences (NAS) to prepare a report on the state of machine translation. The NAS formed the "Automatic Language Processing Advisory Committee", or ALPAC, which published its findings in 1966. The report, Language and Machines: Computers in Translation and Linguistics, was highly critical of the existing efforts. It found that the systems were no faster than human translators, and that the supposed shortage of translators was in fact a surplus; as a result of supply and demand, human translation was relatively inexpensive, at about $6 per 1,000 words. Worse, the FTD's system also served its readers less well; tests using physics papers as input demonstrated that a reader working from the machine translation was "10 percent less accurate, 21 percent slower, and had a comprehension level 29 percent lower than when he used human translation."^[7]

The ALPAC report was as influential as the Georgetown experiment had been a decade earlier; in the immediate aftermath of its publication, the US government suspended almost all funding for machine translation research.^[8] Ongoing work at IBM and Itek had ended by 1966, leaving the field to the Europeans, who continued development of systems like SYSTRAN and Logos.

Notes

^[a] These numbers for the early disk systems appear to be inaccurate; another document from the same author suggests that these figures are actually for the later version used on the Mark II translator.

References

Citations

^[1] Hutchins, pg. 171
^[2] John Hutchins, "The first public demonstration of machine translation: the Georgetown-IBM system, 7th January 1954"
^[3] Hutchins, pg. 172
^[4] King, 1956
^[5] Hutchins, pg. 174
^[6] Hutchins, pg. 175
^[7] ALPAC, pg. 20
^[8] John Hutchins, "ALPAC: the (in)famous report"

Bibliography

G. W. King, G. W. Brown and L. N. Ridenour, "Photographic Techniques for Information Storage", Proceedings of the IRE, Volume 41 Issue 10 (October 1953), pp. 1421–1428
G. W. King, "Stochastic Methods of Mechanical Translation", Mechanical Translation, Volume 3 Issue 2 (1956) pp. 38–39
J. L. Craft, E. H. Goldman, W. B. Strohm, "A Table Look-up Machine for Processing of Natural Languages", IBM Journal, July 1961, pp. 192–203
Automatic Language Processing Advisory Committee, "Language and Machines: Computers in Translation and Linguistics", National Research Council, 1966 (widely known as the "ALPAC Report")
John Hutchins (ed), "Gilbert W. King and the IBM-USAF Translator", Early Years in Machine Translation, John Benjamins, 2000, ISBN 90-272-4586-X (RADC-TDR-62-105)
Charles Bourne and Trudi Bellardo Hahn, "A History of Online Information Services, 1963–1976", MIT Press, 2003, ISBN 0-262-02538-8