The Concept of Machine Translation

Machine translation (MT) is automated translation, or “translation carried out by a computer”, as defined in the Oxford English Dictionary. It is a subfield of natural language processing (NLP) that uses a bilingual data set and other language assets to build language and phrase models, which are then used to translate text [15]. Neural networks have recently been applied to machine translation and have begun to show promising results. Bahdanau, Cho, and Bengio (2014) and Sutskever, Vinyals, and Le (2014) built neural networks that perform end-to-end translation, an approach called neural machine translation (NMT) [8]. An NMT system contains two elements, an encoder and a decoder, that communicate through a vector representation: the encoder converts a source sentence into a vector, and the decoder generates the target translation from that vector.
Executing translation at the subword level (Neubig et al., 2013) or character level (Vilar et al., 2007) has so far failed to yield competitive results compared to word-based counterparts (Brown et al., 1993; Koehn et al., 2007; Chiang, 2005), with the exception of closely related languages (Nakov & Tiedemann, 2012) [9]. However, developing hybrid models that are capable of reading and generating words at the phrase level using multiple engines is attractive for several reasons. First, it opens the possibility of models that reason about unseen source words, such as morphological variants of observed words. Second, the system permits the creation of unseen target words, effectively recasting translation as an open-vocabulary task. Lastly, the model benefits from a significant reduction of the source and target vocabulary sizes, as only phrase pairs need to be modelled explicitly. The proposed study presents a hybrid neural machine translation model for English-to-Amharic text that learns to encode and decode at the phrase level using a recurrent neural network. This indicates that, contrary to earlier belief, models that operate at the word level or character level can produce weaker results than a phrase-based model.
A hybrid neural machine translation system uses neural network techniques to automatically translate text from English to Amharic, while integrating the best features of statistical machine translation. Specifically, as in many other natural language processing (NLP) tasks, the system uses recurrent neural networks, whose main feature is that they work on sequences: given an input sequence, they produce an output sequence. Translation from English is essential because only 20% of the world population speaks English, while 50% of the available Internet resources are in English.
2 Motivation
African languages, which account for around 30% (2139) of the world's languages, suffer greatly from a lack of sufficient language resources (Simons and Fennig, 2017). This is true of Amharic. There is still a need to share information among citizens who speak different languages. For example, Amharic is the regional language of the Amhara and the Southern Nations and Nationalities regions and is also the working language of the Federal government. There is therefore considerable demand for translating English documents into Amharic. To enable the citizens of the country to use documents and information produced in English, the documents need to be translated into the languages they understand best. Since manual translation is expensive, a promising alternative is machine translation, particularly neural machine translation, because Ethiopian languages lack basic linguistic resources such as morphological analysers, syntactic analysers and morphological synthesizers. The major and most basic resource required for machine translation is parallel corpora, which are not publicly available for Amharic. The preparation of a hybrid neural machine translation model for Amharic is therefore an important step in facilitating future MT research and development. Existing Amharic text and document translation software and systems are still very poor in translation quality, robustness and speed.
3 Statements of the Problem
Several studies have translated English text into Amharic text using statistical machine translation (SMT), phonetic transcription, rule-based machine translation (RBMT) and NMT. Each of these methods has problems. Phonetic transcription concentrates on smaller units of the spoken message and on the subtle details of pronunciation. SMT struggles to translate material that is not similar to the content of its training corpora: it can excel with material the training corpora cover, such as technical texts written in a simple style, but it struggles with text that contains slang, idioms or an overall casual style, which results in poor accuracy. Another issue is that SMT systems need bilingual content, which can be hard to find for rarer languages, and preprocessing and corpus creation are expensive and time-consuming. In RBMT, many rules can and must be added to improve quality, which leads to a very complex system. NMT has limitations as well: the system usually has to use a vocabulary of a certain size to avoid time-consuming training and decoding, which causes a serious out-of-vocabulary (OOV) problem. Furthermore, the decoder lacks a mechanism to guarantee that all source words are translated and usually favors short translations, resulting in fluent but inadequate translations.
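To make the out-of-vocabulary problem concrete, the following minimal sketch builds a vocabulary capped at a fixed size and maps every other word to a placeholder token. It is an illustration only, not the preprocessing used in this study; the function names `build_vocab` and `encode`, the `<unk>` symbol and the toy English corpus are all assumptions introduced for the example.

```python
from collections import Counter

UNK = "<unk>"  # assumed placeholder for out-of-vocabulary words


def build_vocab(sentences, max_size):
    """Keep only the max_size most frequent words of the corpus."""
    counts = Counter(word for s in sentences for word in s.split())
    return {word for word, _ in counts.most_common(max_size)}


def encode(sentence, vocab):
    """Replace every word outside the vocabulary with UNK."""
    return [w if w in vocab else UNK for w in sentence.split()]


# Toy corpus (hypothetical): "ran" and "dog" are too rare to enter the vocabulary.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
vocab = build_vocab(corpus, max_size=3)

encoded = encode("the dog ran", vocab)
oov_rate = encoded.count(UNK) / len(encoded)
print(encoded, f"OOV rate: {oov_rate:.2f}")
```

With a cap of three words, two of the three tokens in the new sentence fall outside the vocabulary, which is the situation a fixed-size NMT vocabulary creates for a morphologically rich language.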
The proposed study uses a hybrid neural machine translation technique to translate English text into Amharic text. The hybrid NMT system takes the best features of SMT but relies mainly on the neural machine translation model. To solve the problems of SMT and NMT, the proposed study integrates statistical machine translation features, such as a translation model, a word reward feature, a translation table, a language model and an n-gram language model, with the NMT model. Amharic is a morphologically rich language, so it requires its own translation design and model. The approach must be chosen with the behaviour and characteristics of Amharic morphemes (Fidel) in mind, so that it addresses the commonly occurring problems of translation quality and OOV words in a way that fits the language.
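One common way to integrate SMT-style features with an NMT model is a log-linear combination used to rerank candidate translations. The sketch below assumes that form; the source does not specify how the features are combined, so the functions `hybrid_score` and `rerank`, the weights, the placeholder tokens and all scores are hypothetical values chosen for illustration.

```python
def hybrid_score(candidate, nmt_logprob, lm_logprob, weights):
    """Log-linear combination of an NMT score with SMT-style features."""
    # Word reward: a bonus per target word that counteracts the NMT
    # decoder's preference for short translations.
    word_reward = len(candidate.split())
    return (weights["nmt"] * nmt_logprob
            + weights["lm"] * lm_logprob
            + weights["word_reward"] * word_reward)


def rerank(candidates, weights):
    """Return candidates sorted from best to worst hybrid score."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(
            c["text"], c["nmt_logprob"], c["lm_logprob"], weights
        ),
        reverse=True,
    )


# Hypothetical candidates: "w1 w2 ..." stand in for Amharic tokens,
# and the log-probabilities are made-up numbers.
candidates = [
    {"text": "w1 w2", "nmt_logprob": -1.0, "lm_logprob": -2.0},
    {"text": "w1 w2 w3 w4", "nmt_logprob": -1.5, "lm_logprob": -2.5},
]

with_reward = {"nmt": 1.0, "lm": 0.5, "word_reward": 0.4}
without_reward = {"nmt": 1.0, "lm": 0.5, "word_reward": 0.0}

best_with = rerank(candidates, with_reward)[0]["text"]
best_without = rerank(candidates, without_reward)[0]["text"]
print(best_with, "|", best_without)
```

In this toy setting the shorter candidate wins when the word reward is switched off, and the longer candidate wins once it is switched on. In practice the weights would be tuned on held-out data rather than set by hand.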
This research paper answers the following questions:
• How is text translation accomplished using a hybrid translation approach for Amharic text?
• How accurate is the translated text? Are there any typographical, spelling or grammar mistakes?
• Is hybrid neural machine translation the better model for Amharic text (Fidel)?
• Does the translation flow naturally in the target language, or would a different choice of words be better?
• Which translation level is best for Amharic Fidel?
• Is the translation correct for the intended audience? Did the translator use the correct dialect and localized language?
• Does the model fit Amharic language morphology (sine-kal)?
• What are the challenges of a hybrid neural machine translation model for Amharic Fidel?
• How can SMT features and NMT be combined for Amharic text?
• Does the hybrid neural machine translation model fit Amharic datasets?
4 Objectives of the Study
4.1 General Objective
The general objective of this study is to design the best approach and model for translating English text into Amharic text using hybrid neural machine translation.
4.2 Specific Objectives
The specific objectives of hybrid neural machine translation for English-Amharic text are as follows:
• To computerize the translation of English-Amharic text.
• To design a text translation model for English-Amharic text using a hybrid technique.
• To analyze hybrid machine translation with Amharic Fidel.
• To indicate the most suitable approaches for English-Amharic text translation.
• To develop a better language translator with better accuracy and quality.
• To propose general guidance for hybrid English-Amharic text translation.
• To evaluate hybrid machine translation against previous machine translation approaches.
• To analyze the properties of hybrid neural machine translation.
• To show the best translation pair for Amharic Fidel.
• To identify the most suitable dataset and algorithm for Amharic text translation.
5 Methodologies
Several methods will be used to accomplish the research goals. Data collection and preparation, discussion with NLP and machine translation experts, analysis of written documents, selection of the approach, dataset preparation, training and testing of the model, prototype development and evaluation of the hybrid neural machine translation system are described below.
5.1 Data Collection and Preparation
A sizeable body of text can reveal the morphological behavior of a language reasonably well. Collecting text data is therefore an important component of developing a hybrid neural machine translation system. Public text corpora exist for the Amharic language, so the corpus used in this research will be collected from web-corpora.net, Sketch Engine, GitHub and English-Amharic education materials. The model is trained on more than 10,000 prepared corpus phrases, and the small remaining portion is reserved for testing and analysis.
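The division of the prepared corpus into a training portion and a small held-out portion can be sketched as follows. This is a minimal illustration: the function `split_corpus`, the 10% test fraction, the random seed and the placeholder phrase pairs are assumptions, since the study does not state the exact split.

```python
import random


def split_corpus(pairs, test_fraction=0.1, seed=13):
    """Shuffle parallel phrase pairs and hold out a fraction for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(pairs)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder pairs standing in for real English-Amharic phrases.
pairs = [(f"english phrase {i}", f"amharic phrase {i}") for i in range(100)]
train_pairs, test_pairs = split_corpus(pairs)
print(len(train_pairs), len(test_pairs))
```

Shuffling before splitting avoids a test set drawn from a single source, which matters when the corpus is assembled from several websites and teaching materials.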
5.2 Discussion with NLP and MT Experts
To gain more insight into the research, a thorough discussion was held with natural language processing experts on the Amharic language and with SMT and NMT experts. The discussion clarified the most important concepts of how the language is translated and how words in Amharic text can be generated in machine translation. This will be used as primary input, in addition to the secondary information reviewed in the course of the study.
5.3 Developing an Amharic Word Expansion Program
From randomly selected words of different categories, i.e. nouns, verbs, adverbs and adjectives, an Amharic corpus and dataset of limited vocabulary will be developed through morphological expansion by applying different rules.
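Rule-based morphological expansion of this kind can be sketched as attaching every allowed prefix and suffix to a stem. The example below is only a rough illustration under strong assumptions: the rule table `RULES`, the function `expand` and the transliterated affixes are placeholders chosen for the sketch, and plain concatenation ignores the sound changes that a real Amharic rule set would have to handle.

```python
from itertools import product

# Illustrative affix rules only (Latin transliteration); a real rule set
# would be derived from Amharic grammar and written in Fidel.
RULES = {
    "noun": {"prefixes": ["", "be", "le", "ke"], "suffixes": ["", "och"]},
}


def expand(stem, category, rules=RULES):
    """Generate surface forms by attaching every prefix/suffix pair to a stem."""
    # Categories without rules fall back to the bare stem.
    affixes = rules.get(category, {"prefixes": [""], "suffixes": [""]})
    return [
        prefix + stem + suffix
        for prefix, suffix in product(affixes["prefixes"], affixes["suffixes"])
    ]


forms = expand("bet", "noun")
print(forms)
```

Each stem yields one form per prefix and suffix combination, so a small seed list of words grows into a larger limited-vocabulary dataset.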
5.4 Encoder–Decoder Translation Approach
The hybrid neural machine translation model conducts end-to-end translation with a source-language encoder and a target-language decoder, yielding promising translation performance. Neural machine translation is a comparatively recent approach to machine translation based on neural networks. The main idea behind neural machine translation is that it uses two components, an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this vector representation.
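The encoder-decoder idea described above can be sketched with a tiny recurrent network in plain Python. This is a toy with random, untrained weights, so its output is meaningless as a translation; it only shows how a variable-length input becomes a fixed-length vector and how a decoder unrolls from that vector. The hidden size, vocabulary sizes, function names and the use of the end-of-sentence id as the start symbol are all assumptions made for the sketch.

```python
import math
import random

HIDDEN = 4  # size of the fixed-length sentence vector (assumed)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rnn_step(w_in, w_rec, x, h):
    """One recurrent update: h' = tanh(W_in x + W_rec h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


def encode(source_ids, params, src_vocab_size):
    """Fold a variable-length source sentence into one fixed-size vector."""
    h = [0.0] * HIDDEN
    for idx in source_ids:
        x = one_hot(idx, src_vocab_size)
        h = rnn_step(params["enc_in"], params["enc_rec"], x, h)
    return h


def decode(h, params, tgt_vocab_size, eos_id, max_len=10):
    """Greedily emit target ids from the sentence vector until EOS or max_len."""
    output, prev = [], eos_id  # EOS doubles as the start symbol here
    for _ in range(max_len):
        x = one_hot(prev, tgt_vocab_size)
        h = rnn_step(params["dec_in"], params["dec_rec"], x, h)
        scores = matvec(params["dec_out"], h)
        prev = max(range(tgt_vocab_size), key=scores.__getitem__)
        if prev == eos_id:
            break
        output.append(prev)
    return output


rng = random.Random(0)
SRC_VOCAB, TGT_VOCAB, EOS = 6, 5, 0
params = {
    "enc_in": rand_matrix(HIDDEN, SRC_VOCAB, rng),
    "enc_rec": rand_matrix(HIDDEN, HIDDEN, rng),
    "dec_in": rand_matrix(HIDDEN, TGT_VOCAB, rng),
    "dec_rec": rand_matrix(HIDDEN, HIDDEN, rng),
    "dec_out": rand_matrix(TGT_VOCAB, HIDDEN, rng),
}

short_vec = encode([1, 2], params, SRC_VOCAB)
long_vec = encode([1, 2, 3, 4, 5], params, SRC_VOCAB)
translation = decode(long_vec, params, TGT_VOCAB, EOS)
print(len(short_vec), len(long_vec), translation)
```

Both source sentences, of two and five tokens, are reduced to vectors of the same length, which is the property the encoder description relies on. A trained system would learn the weight matrices from the parallel corpus instead of sampling them at random.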
5.5 System Prototyping Tool
To develop the prototype of the proposed system model, the Python programming language will be used. The system uses Python libraries for scientific and technical computing. In addition, through Python, the system can easily call open-source machine translation API functions and access translation components.
5.6 Evaluation Method
Translation quality is evaluated with the Bilingual Evaluation Understudy (BLEU), the most common evaluation metric in machine translation research, which measures the similarity between the translations proposed by a model and reference translations. BLEU scores are reported for both the phrase-based statistical machine translation engine and the hybrid neural machine translation engine in the system model. The research also reports the training time in hours for the hybrid neural machine translation engines, and each model's perplexity on the test set is shown as well. The evaluation covers the system results in percent, the training time in hours and a side-by-side comparison, tested with the help of online quality evaluation tools and by human evaluators on 200 related sections. In addition, the research prepares and releases a small dataset of very accurate English-Amharic translations of difficult sentences, which is helpful for testing translation system models.
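The core of the BLEU computation can be sketched in a few lines. The version below is deliberately simplified and is not the scoring script used in this study: it handles one sentence with a single reference, applies no smoothing, and the function names and placeholder tokens are assumptions. Standard toolkits compute BLEU at corpus level and may differ in tokenization and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference, unsmoothed BLEU for one tokenised sentence pair."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Modified precision: each candidate n-gram is credited at most
        # as many times as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: one empty n-gram order zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


# Placeholder tokens stand in for Amharic words.
reference = "a b c d e".split()
perfect = bleu(reference, reference)
truncated = bleu("a b c d".split(), reference)
unrelated = bleu("x y z w v".split(), reference)
print(perfect, round(truncated, 4), unrelated)
```

The truncated candidate matches every n-gram it contains, so its score is reduced only by the brevity penalty. This is the mechanism that discourages the short translations an NMT decoder tends to favor.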
5.7 Toolkits
The system model will be trained using the TensorFlow machine translation toolkit and Tensor Processing Units, which give acceptable computational capacity for implementing hybrid neural machine translation. The model is formed by an encoder, which translates the given source sentence into a sequence of numerical vectors, and a decoder, which predicts the target sentence based on the encoded source sentence. Once the English-Amharic corpus and datasets are prepared, they pass through the training phase, the tuning phase and finally the testing phase.