Using images as context for Statistical Machine Translation
EPSRC Vision and Language Network, pump priming
Summary

In this project we investigate whether information from images can be useful as context for statistical machine translation (SMT). We target two well-known challenges in SMT, particularly when it is used to translate short texts: ambiguity (incorrect translation of words with multiple senses) and out-of-vocabulary words (words left untranslated). To do so, we automatically built a dataset containing:
See some examples of original images (from Wikipedia) alongside similar images retrieved from ImageNet using a baseline method.
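To illustrate what such a retrieval baseline might look like, here is a minimal sketch assuming each image is represented as a fixed-length feature vector and candidates are ranked by cosine similarity; the actual features and retrieval method used in the project are not specified here, so this is purely an illustrative assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_similar(query, candidates, k=3):
    """Rank candidate feature vectors by cosine similarity to the query
    and return the indices of the top k (a common retrieval baseline)."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cosine(query, candidates[i]),
                    reverse=True)
    return ranked[:k]

# Toy example: four candidate "images" as 3-d feature vectors.
cands = [[1.0, 0.0, 0.0],
         [0.9, 0.1, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
query = [1.0, 0.05, 0.0]
print(retrieve_similar(query, cands, k=2))  # -> [0, 1]
```

In practice the feature vectors would come from a visual descriptor computed over Wikipedia and ImageNet images; only the ranking step is sketched here.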
A short paper describes the process of building the dataset and the procedure used to evaluate it. The poster, presented at the VL'12 Workshop in December 2012, contains some initial results of the evaluation on a sample of the English-Portuguese dataset.