Saturday, March 25, 2017

Google Translate: Method of Translation: Wikipedia

    begin quote from:

    Method of translation

    In April 2006, Google Translate launched with a statistical machine translation engine.[1]
    Google Translate does not apply grammatical rules, since its algorithms are based on statistical analysis rather than traditional rule-based analysis. The system's original creator, Franz Josef Och, has criticized the effectiveness of rule-based algorithms in favor of statistical approaches.[29] The system is based on a method called statistical machine translation and, more specifically, on research by Och, who won the DARPA contest for speed machine translation in 2003. Och was the head of Google's machine translation group until he left to join Human Longevity, Inc. in July 2014.[30]
    According to Och, a solid base for developing a usable statistical machine translation system for a new pair of languages from scratch would consist of a bilingual text corpus (or parallel collection) of more than 150-200 million words, and two monolingual corpora each of more than a billion words.[29] Statistical models built from these data are then used to translate between those languages.
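    One textbook way to see why both kinds of corpora are needed is the noisy-channel formulation of statistical machine translation. It is shown here only as an illustration; the quoted text does not say which model Google actually used. The symbols are assumptions for this sketch: f is the source sentence and e is a candidate translation.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

    In this reading, the translation model P(f | e) would be estimated from the bilingual corpus, and the language model P(e) from a monolingual corpus in the target language.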
    To acquire this huge amount of linguistic data, Google used United Nations documents.[31] The UN typically publishes documents in all six official UN languages, which has produced a very large 6-language corpus.
    Google Translate does not always translate directly from one language to another (L1 → L2). Instead, it often translates first to English and then to the target language (L1 → EN → L2).[32]
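    The L1 → EN → L2 route can be sketched as two translation steps chained together. This is a toy, word-by-word illustration, not Google's implementation; the dictionaries and the function names (translate, pivot_translate) are hypothetical.

```python
# Toy word-level dictionaries; the entries are illustrative assumptions,
# not data from Google Translate.
FR_TO_EN = {"chat": "cat", "chien": "dog", "maison": "house"}
EN_TO_ES = {"cat": "gato", "dog": "perro", "house": "casa"}


def translate(words, table):
    """Translate word by word, leaving unknown words unchanged."""
    return [table.get(word, word) for word in words]


def pivot_translate(words, source_to_pivot, pivot_to_target):
    """Translate L1 -> EN -> L2 by chaining two translation steps."""
    pivot_words = translate(words, source_to_pivot)
    return translate(pivot_words, pivot_to_target)


if __name__ == "__main__":
    print(pivot_translate(["chat", "maison"], FR_TO_EN, EN_TO_ES))
```

    Any error made in the first step is carried into the second, which is one reason a pivot route can lose quality compared with a direct L1 → L2 model.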
    When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation. By detecting patterns in documents that have already been translated by human translators, Google Translate makes intelligent guesses as to what an appropriate translation should be.[33]
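    The idea of guessing a translation from patterns in existing human translations can be sketched with simple counting. This is a minimal illustration under strong assumptions (phrases are already aligned, and the most frequent translation wins); the data and the names build_phrase_table and best_translation are hypothetical and do not describe Google's system.

```python
from collections import Counter, defaultdict

# Toy aligned phrase pairs (source, human translation); illustrative only.
ALIGNED_PAIRS = [
    ("bonjour", "hello"),
    ("bonjour", "hello"),
    ("bonjour", "good morning"),
    ("merci", "thank you"),
    ("merci", "thanks"),
    ("merci", "thank you"),
]


def build_phrase_table(pairs):
    """Count how often each target phrase was used for each source phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def best_translation(table, source):
    """Return the most frequent translation and its relative frequency."""
    counts = table.get(source)
    if not counts:
        return None, 0.0
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


if __name__ == "__main__":
    phrase_table = build_phrase_table(ALIGNED_PAIRS)
    print(best_translation(phrase_table, "bonjour"))
```

    With more translated documents, the counts become more reliable, which is why the size of the corpus matters so much for a statistical system.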
    Before October 2007, for languages other than Arabic, Chinese and Russian, Google Translate was based on SYSTRAN, a software engine that has also been used by several other online translation services, such as the now-defunct Babel Fish. Since October 2007, Google Translate has used proprietary, in-house technology based on statistical machine translation instead.[34][35]

    Google Neural Machine Translation

    In September 2016, a research team at Google announced the development of the Google Neural Machine Translation system (GNMT) to increase fluency and accuracy in Google Translate,[2][5] and in November it announced that Google Translate would switch to GNMT.
    Google Translate's new neural machine translation system uses a large end-to-end artificial neural network capable of deep learning.[2][36] GNMT improves the quality of translation because it uses an example-based machine translation (EBMT) method in which the system "learns from millions of examples."[36] It translates "whole sentences at a time, rather than just piece by piece. It uses this broader context to help it figure out the most relevant translation, which it then rearranges and adjusts to be more like a human speaking with proper grammar".[2] GNMT's "proposed architecture" of "system learning" was first tested on over a hundred languages supported by Google Translate.[36] With the end-to-end framework, "the system learns over time to create better, more natural translations."[2] The GNMT network is capable of interlingual machine translation, which encodes the "semantics of the sentence rather than simply memorizing phrase-to-phrase translations".[36][37] The system did not invent its own universal language, but uses "the commonality found inbetween many languages".[38] GNMT was first enabled for eight languages: to and from English and Chinese, French, German, Japanese, Korean, Portuguese, Spanish and Turkish.[2][5]
    GNMT improves on the earlier system in that it can translate directly from one language to another (L1 → L2), instead of often first translating to English and then to the target language (L1 → EN → L2).[37] The GNMT system is "capable of Zero-Shot Translation", that is, translating between a language pair (for example, Japanese to Korean) which the "system has never explicitly seen before."[36]
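    One way a single multilingual model can be told which language to produce is by prepending an artificial target-language token to the input sentence. The sketch below assumes that convention, as described in Google's published work on multilingual neural translation, with tokens of the form <2xx>. The function name tag_for_target is hypothetical, and the snippet only prepares the input; it does not perform any translation.

```python
def tag_for_target(sentence, target_lang):
    """Prepend an artificial token naming the desired target language.

    The "<2xx>" token format is an assumption for this sketch.
    """
    return f"<2{target_lang}> {sentence}"


if __name__ == "__main__":
    # Request Korean output for a Japanese input sentence.
    print(tag_for_target("こんにちは", "ko"))
```

    Because the same model handles every pair, a pair such as Japanese to Korean can be requested even if no Japanese-Korean examples were seen during training.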

    Limitations

    Some languages produce better results than others. Google Translate performs especially well when English is the target language and the source language is from the European Union, due to the prominence of translated EU parliament notes. A 2010 analysis indicated that French to English translation is relatively accurate.[39] However, if the source text is shorter, rule-based machine translations often perform better; this effect is particularly evident in Chinese to English translations. While edits of translations may be submitted, in Chinese specifically one is not able to edit sentences as a whole. Instead, one must edit sometimes arbitrary sets of characters, leading to incorrect edits.[39]
    Texts written in the Greek, Devanagari, Cyrillic and Arabic scripts can be transliterated automatically from phonetic equivalents written in the Latin alphabet. The browser version of Google Translate provides the "read phonetically" option for Japanese to English conversion. The same option is not available on the paid API version.
    [Map: accent of English used by the "text-to-speech" audio of Google Translate in each country. Legend: British English (female); American English (female); Oceania accent (female); no Google Translate service.]
    Many of the more popular languages have a "text-to-speech" audio function that can read back a text in that language, up to a few dozen words or so. For pluricentric languages, the accent depends on the region. For English, the audio uses a female General American accent in the Americas, most of the Asia-Pacific and West Asia; a special Oceania accent in Australia, New Zealand and Norfolk Island; and a female British English accent in Europe, Hong Kong, Malaysia, Singapore, Guyana and all other parts of the world. For Spanish, a Latin American Spanish accent is used in the Americas and a Castilian Spanish accent elsewhere. For Portuguese, a São Paulo accent is used worldwide except in Portugal, where the native accent is used. Some less widely spoken languages use the open-source eSpeak synthesizer for their speech.[citation needed]

    Open-source licenses and components

    Language    WordNet[40]                       License
    Albanian    Albanet                           CC-BY 3.0/GPL 3
    Arabic      Arabic Wordnet                    CC-BY-SA 3
    Catalan     Multilingual Central Repository   CC-BY-3.0
    Chinese     Chinese Wordnet                   Wordnet
    Danish      Dannet                            Wordnet
    English     Princeton Wordnet                 Wordnet
    Finnish     FinnWordnet                       Wordnet
    French      WOLF (WOrdnet Libre du Français)  CeCILL-C
    Galician    Multilingual Central Repository   CC-BY-3.0
    Hebrew      Hebrew Wordnet                    Wordnet
    Hindi       IIT Bombay Wordnet                Indo Wordnet
    Indonesian  Wordnet Bahasa                    MIT
    Italian     MultiWordnet                      CC-BY-3.0
    Japanese    Japanese Wordnet                  Wordnet
    Javanese    Javanese Wordnet                  Wordnet
    Malay       Wordnet Bahasa                    MIT
    Norwegian   Norwegian Wordnet                 Wordnet
    Persian     Persian Wordnet                   Free to Use
    Polish      plWordnet                         Wordnet
    Portuguese  OpenWN-PT                         CC-BY-SA-3.0
    Spanish     Multilingual Central Repository   CC-BY-3.0
    Thai        Thai Wordnet
