QuEst++

Hands-on QuEst++

QuEst++ is open source software for Quality Estimation (QE) of machine translation. It was developed by Professor Lucia Specia's team at the University of Sheffield and includes contributions from a number of researchers.

It has two main modules: a Java module that extracts word-, sentence- and document-level features, and a Python module that interfaces with the scikit-learn toolkit for machine learning. It also includes a few Python and shell scripts for small utility tasks.

See Hands-on material for details.

** Hands-on resource for including a new feature: List of simple words.

Pre-tutorial Instructions

GitHub: to clone QuEst++ from the repository:

git clone https://github.com/ghpaetzold/questplusplus.git

System requirements for QuEst++:
** JAVA **
Java 8 (JDK 1.8) - it should work with both OpenJDK and Oracle versions (java-8-oracle recommended)
-- NetBeans 8.1 OR
-- Apache Ant (>= 1.9.3)

** Note about Ubuntu: On Ubuntu, it is easier to install the Oracle version:

sudo apt-get install oracle-java8-installer

(Check here if you don't find that version)

NetBeans has problems building on Linux. Use Apache Ant instead to build from the command line:

sudo apt-get install ant

** PYTHON **
Python 2.7.6 or above (stable 2.7 releases only)
-- NumPy and SciPy (NumPy >=1.6.1 and SciPy >=0.9)
-- scikit-learn (version 0.15.2)
-- PyYAML

Feature extraction requirements:
Sentence-level:
- Perl 5 (or above)
- SRILM

** Note about Windows: Some tools (e.g. SRILM) might require Cygwin to run on Windows.
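
Before building, it can help to confirm that the prerequisites listed above are actually installed. The following is a minimal sketch, not part of QuEst++ itself: the helper names check_tool and check_pymod are hypothetical, ngram-count is assumed to be the SRILM binary on your PATH, and the Python modules are checked under their import names (sklearn for scikit-learn, yaml for PyYAML). It only reports presence and does not verify version numbers.

```shell
#!/bin/sh
# Sketch of a prerequisite check for the requirements listed above.
# Assumptions: tools are looked up on PATH; "python" is the 2.7 interpreter.

# Print a status line for one command; return non-zero when it is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok       $1"
    else
        echo "MISSING  $1"
        return 1
    fi
}

# Print a status line for one Python module; return non-zero when the
# import fails (or when no "python" interpreter is available at all).
check_pymod() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "ok       python module $1"
    else
        echo "MISSING  python module $1"
        return 1
    fi
}

missing=0
for tool in git java ant python perl ngram-count; do
    check_tool "$tool" || missing=$((missing + 1))
done
for mod in numpy scipy sklearn yaml; do
    check_pymod "$mod" || missing=$((missing + 1))
done
echo "$missing prerequisite(s) not found"
```

Save it under any name (for example check_prereqs.sh, a hypothetical filename) and run it with sh. Anything reported as MISSING should be installed before attempting to build or run feature extraction.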

** Other details can be found at the QuEst++ GitHub page.

Versions of QuEst++

Get QuEst++ from our GitHub repository (source code and basic tools - recommended for developers).

Get the vanilla version of QuEst++ (a complete JAR file of the stable code with all libraries included - recommended for users).

License: our Java code is released under the BSD license and our Python code under the Apache License 2.0. For pre-existing code and resources (e.g. scikit-learn, SRILM, GIZA++, and the Stanford and Berkeley parsers), please check their respective websites.

Check the current baseline, black-box, and glass-box lists of features QuEst++ can extract at sentence level.

Citing QuEst++

Lucia Specia, Gustavo Henrique Paetzold and Carolina Scarton (2015): Multi-level Translation Quality Prediction with QuEst++. In Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing, China, pp. 115-120.