Friday, September 3, 2010

Google Translate is using “Statistical Machine Translation”

Before I start the article, I want to send a message to the group of idiots who said “Google is using a rule-based approach”. The amazing thing is that they don’t know the difference between the two approaches. So if you still don’t know the difference either, I will show it to you in this article, so you can be on the safe side, not the idiot side.

Rule-Based Approach:
You will spend many years building this engine, and you won’t get good results in the end.

Summary:

In this approach you have a dictionary covering the two languages you translate between. Then you have an engine that encodes the grammar rules of each language and their exceptions. When you translate, you keep track of the subjects, adverbs, verbs, etc. in both languages.
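To make the idea concrete, here is a minimal sketch of a rule-based translator. Everything in it is an assumption for illustration only: the English-to-Spanish language pair, the tiny `LEXICON` dictionary, and the single reordering rule are made up, and a real engine would need thousands of entries and rules.

```python
# Toy bilingual dictionary (hypothetical): English word -> (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "house": ("casa", "NOUN"),
}


def translate_rule_based(sentence):
    """Translate word by word, then apply one hand-written grammar rule."""
    tagged = []
    for word in sentence.lower().split():
        # Words missing from the dictionary are passed through unchanged.
        tagged.append(LEXICON.get(word, (word, "UNK")))

    # Grammar rule: English puts the adjective before the noun,
    # Spanish usually puts it after, so swap every ADJ NOUN pair.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1

    return " ".join(word for word, _ in tagged)


print(translate_rule_based("the red car"))
print(translate_rule_based("the house"))
```

The first call gives the correct “el coche rojo”, but the second gives “el casa” instead of “la casa”, because the sketch has no rule for gender agreement. Every exception like this needs yet another hand-written rule, which is why building such an engine takes so long.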

Statistical Machine Translation:
All you need is data!


It simulates the way humans learn a new language: when you learn a new language, you don’t necessarily need to know its rules; you need to know a sentence in Arabic and its English equivalent.
First you need data. In this approach the data is any piece of text together with its translation. The algorithm then tries to find patterns in the data, based on how frequently pieces of text occur together.
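The frequency idea can be sketched in a few lines. This is only a toy under stated assumptions, not how Google's system is actually built: the four-sentence parallel corpus `PAIRS`, the English-to-Spanish pair, and the helper names `train` and `best_translation` are all hypothetical, and real systems use far larger corpora and far more elaborate models.

```python
from collections import Counter, defaultdict

# Toy parallel corpus (hypothetical): each entry is a text and its translation.
PAIRS = [
    ("green car", "coche verde"),
    ("green house", "casa verde"),
    ("big car", "coche grande"),
    ("big house", "casa grande"),
]


def train(pairs):
    """Count how often each source word appears alongside each target word."""
    cooccurrence = defaultdict(Counter)
    for source, target in pairs:
        for src_word in source.split():
            for tgt_word in target.split():
                cooccurrence[src_word][tgt_word] += 1
    return cooccurrence


def best_translation(model, word):
    """Pick the target word seen most frequently with the given source word."""
    return model[word].most_common(1)[0][0]


model = train(PAIRS)
for word in ["green", "big", "car", "house"]:
    print(word, "->", best_translation(model, word))
```

Notice that no grammar rule or dictionary was written by hand: the word pairings come purely from counting which words show up together in the data. The sketch ignores word order and phrases, which a real statistical system also has to learn from the data.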
here is a reference

