Using images as context for Statistical Machine Translation


EPSRC Vision and Language Network, pump priming


In this project we investigate whether information from images can serve as useful context for statistical machine translation (SMT). We target two well-known challenges in SMT, which are especially acute when translating short texts: ambiguity (incorrect translation of words that have multiple senses) and out-of-vocabulary words (words left untranslated). To do so, we automatically built a dataset containing:
  • Images from Wikipedia
  • The image captions in English
  • The machine translations of the captions into Portuguese, Spanish, German or French
  • A human (reference) translation as found in Wikipedia for each English caption
  • A similar image retrieved from ImageNet using basic computer vision methods
  • Keywords from the WordNet synset associated with the retrieved image
To assess whether each of these sources of information is worth exploiting, we will collect human judgements on a sample of this dataset.
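The "basic computer vision methods" used to retrieve a similar image from ImageNet are not specified above; one common baseline of this kind is global colour-histogram matching. The sketch below is a hypothetical illustration of such a baseline (quantised RGB histograms compared with a chi-squared-style distance), not the project's actual retrieval method; all function names and parameters are our own.

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantise (r, g, b) pixels into bins**3 buckets; return a normalised sparse histogram."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}

def histogram_distance(h1, h2):
    """Chi-squared-style distance between two sparse histograms (lower = more similar)."""
    buckets = set(h1) | set(h2)
    return sum((h1.get(b, 0.0) - h2.get(b, 0.0)) ** 2 /
               (h1.get(b, 0.0) + h2.get(b, 0.0) + 1e-9)
               for b in buckets)

def most_similar(query_pixels, candidate_images):
    """Return the index of the candidate image whose histogram is closest to the query's."""
    qh = color_histogram(query_pixels)
    return min(range(len(candidate_images)),
               key=lambda i: histogram_distance(qh, color_histogram(candidate_images[i])))

# Toy usage with synthetic "images" (flat lists of RGB tuples):
red_image = [(250, 10, 10)] * 100
blue_image = [(10, 10, 250)] * 100
query = [(240, 20, 20)] * 100          # reddish query
best = most_similar(query, [blue_image, red_image])
```

In a real pipeline the query would be the Wikipedia image and the candidates would be ImageNet images, each decoded to pixel arrays; the retrieved image's ImageNet synset then provides the WordNet keywords listed above.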


March-December 2012


Researcher      Affiliation                  Country
Lucia Specia    University of Sheffield      UK
Teo de Campos   University of Surrey         UK
Iacer Calixto   University of Wolverhampton  UK


Coming soon...

See some examples of original images (from Wikipedia) and the similar images retrieved from ImageNet using a baseline method.

A short paper describes the process of building the dataset and the procedure used to evaluate it. A poster, presented at the VL'12 Workshop in December 2012, reports initial results of the evaluation on a sample of the English-Portuguese dataset.