Automatic translation between human languages has become an invaluable technology for enabling communication and access to textual data on the internet. High-profile companies, e.g., Google, have demonstrated its utility by providing high-quality translation services between many different languages. This course will cover the technologies behind the modern statistical approach to machine translation, as used in Google's system. Statistical machine translation takes a data-driven, machine learning view of the problem, seeking to learn how to translate purely from data and with minimal human input. The course will cover the predominant approaches to machine translation -- word-based, phrase-based and grammar-based translation -- and their accompanying learning and decoding algorithms.
This course will run for 5 days, with lectures from roughly 2 to 5 each day.