Hands-on QuEst++
QuEst++ is open source software for Quality Estimation (QE) of machine translation. It was developed by Professor Lucia Specia's team at the University of Sheffield and includes contributions from a number of researchers.
It has two main modules: a Java module that extracts word-, sentence- and document-level features, and a Python module that interfaces with the scikit-learn toolkit for machine learning. It also includes a few Python and shell scripts for smaller auxiliary tasks.
See Hands-on material for details.
** Hands-on resource for including a new feature: List of simple words.
Pre-tutorial Instructions
- GitHub: to clone QuEst++ from the repository:
git clone https://github.com/ghpaetzold/questplusplus.git
- System requirements for QuEst++:
** JAVA **
Java 8 (JDK 1.8) - it should work with both OpenJDK and Oracle versions (java-8-oracle recommended)
-- NetBeans 8.1 OR
-- Apache Ant (>= 1.9.3)
** Note about Ubuntu: on Ubuntu, the Oracle version is easier to install:
sudo apt-get install oracle-java8-installer
(Check here if you don't find that version)
NetBeans has trouble building the project on Linux. Use Apache Ant instead to build from the command line:
sudo apt-get install ant
** PYTHON **
Python 2.7.6 or above (stable 2.7 releases only)
-- NumPy and SciPy (NumPy >=1.6.1 and SciPy >=0.9)
-- scikit-learn (version 0.15.2)
-- PyYAML
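To confirm that a Python environment matches the versions listed above, a short check along the following lines may help. This is a minimal sketch and not part of QuEst++: the helper names (`version_tuple`, `satisfies`, `check_module`) are invented for illustration, and it assumes each package is imported under its usual module name (`numpy`, `scipy`, `sklearn`, `yaml`) and exposes a `__version__` attribute.

```python
import importlib
import re

# Requirements copied from the list above. The module names are the usual
# import names of each package (an assumption of this sketch).
REQUIREMENTS = [
    ("numpy", ">=", "1.6.1"),
    ("scipy", ">=", "0.9"),
    ("sklearn", "==", "0.15.2"),
    ("yaml", None, None),  # no version stated, presence only
]


def version_tuple(text, width=3):
    """Turn '1.6.1' or '0.15.2rc1' into a fixed-width tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    parts = parts[:width]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed, op, required):
    """Compare two version strings using '>=' or '=='; None means no constraint."""
    if op is None:
        return True
    a, b = version_tuple(installed), version_tuple(required)
    return a >= b if op == ">=" else a == b


def check_module(name, op, required):
    """Return a one-line status string for a single module."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "%s: not installed" % name
    installed = getattr(module, "__version__", None)
    if installed is None or op is None:
        return "%s: installed" % name
    verdict = "ok" if satisfies(str(installed), op, required) else "version mismatch"
    return "%s %s (want %s %s): %s" % (name, installed, op, required, verdict)


if __name__ == "__main__":
    for name, op, required in REQUIREMENTS:
        print(check_module(name, op, required))
```

The script avoids version-specific syntax so that it can be run with the same Python 2.7 interpreter that will be used for the QuEst++ Python module.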
- Feature extraction requirements:
Sentence-level:
- Perl 5 (or above)
- SRILM
** Note about Windows: Some tools (e.g. SRILM) might require Cygwin to run on Windows.
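Before the tutorial, it can be useful to check that the external tools mentioned above are reachable from the command line. The following is a hedged sketch rather than an official QuEst++ script: `find_on_path` and `report` are hypothetical helpers, and the executable names in `TOOLS` are assumptions (in particular, `ngram` is used here as a stand-in for an SRILM installation, whose binaries may live in a platform-specific directory that is not on `PATH`).

```python
import os

# Executable names are assumptions of this sketch; adjust them to your setup.
TOOLS = ["git", "java", "ant", "perl", "ngram"]


def find_on_path(program, path=None):
    """Return the full path of an executable, or None if it is not found."""
    if path is None:
        path = os.environ.get("PATH", "")
    # On Windows, executables usually carry an extension such as .exe
    extensions = [""]
    if os.name == "nt":
        extensions += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for directory in path.split(os.pathsep):
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


def report(tools):
    """Map each tool name to its location on PATH (None when missing)."""
    return dict((tool, find_on_path(tool)) for tool in tools)


if __name__ == "__main__":
    for tool, location in sorted(report(TOOLS).items()):
        print("%-6s %s" % (tool, location or "NOT FOUND"))
```

A tool reported as missing is not necessarily absent: it may simply be installed outside the directories listed in `PATH`.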
** Other details can be found at the QuEst++ GitHub page.
Versions of QuEst++
License: our Java code is released under the BSD license and our Python code under the Apache License 2.0. For pre-existing code and resources (e.g., scikit-learn, SRILM, GIZA++, and the Stanford and Berkeley parsers), please check their respective websites.
Check the current baseline, black-box, and glass-box lists of features QuEst++ can extract at sentence level.