Raymond W. M. Ng

 


BEng, MA, PhD (Chinese University of Hong Kong)

 

Research Associate in

Speech and Hearing (SpandH) Group,

Department of Computer Science,

University of Sheffield.

 

(Tel): 0114-222-1918

(email) wm dot ng at sheffield dot ac dot uk

 

News: [Jun 27, 2016] Our Speaker Odyssey paper describing the Sheffield LRE system is now online

[Jun 10, 2016] Our papers on language recognition, ASR domain adaptation and webASR 2 have been accepted to Interspeech 2016. (Refer to the conference)

 

I am currently a research associate with the University of Sheffield. My current research focus is spoken language translation, and I am also involved in research related to speaker and language recognition.

 

Research:

-       Language identification with prosodic features

Automatic spoken language identification (LID) is the process of automatically determining the language of a spoken document. It has many applications in multilingual multimedia information processing. The challenge of LID lies in the fact that finding discriminative features of a language requires information from multiple sources. Previous studies have explored a range of features, from the acoustic properties of speech signals to various acoustically derived linguistic representations. Many state-of-the-art LID systems exploit the acoustic-phonetic properties of speech and/or its phonotactics, the rules governing the allowable sequences of phones and phonemes. These strategies mainly focus on short-time spectral features. In our research, we introduce prosodic features, which are realised mainly in the form of pitch, intensity and rhythm in fluent speech. The proposed method uses the intensity profile in the sonorant band to tokenise speech into pseudo-syllable structures, from which prosodic features are extracted for use in language recognition. Because the inventory of prosodic feature candidates is large, these features are assessed with mutual-information-based analysis methods and applied in NIST language recognition evaluation (LRE) tasks. In these tasks, we have reported relative improvements of 10%-20% that a prosodic LID system can bring to the conventional phonotactic approach to LID.
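As a rough illustration of mutual-information-based feature assessment, the sketch below estimates the mutual information between a discretised feature and the language label from co-occurrence counts, then ranks candidate features by that score. It is a minimal sketch, not the analysis method used in the publications: the function name, the feature names and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Estimate I(F; L) in bits from paired discrete observations."""
    n = len(feature_values)
    joint = Counter(zip(feature_values, labels))
    f_counts = Counter(feature_values)
    l_counts = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        # p(f, l) * log2( p(f, l) / (p(f) * p(l)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (f_counts[f] * l_counts[l]))
    return mi


# Toy data, invented for illustration only (one entry per pseudo-syllable).
labels = ["en", "en", "zh", "zh"]
candidates = {
    "pitch_contour": ["rise", "rise", "fall", "fall"],  # tracks the label
    "rhythm_class": ["short", "long", "short", "long"],  # independent of it
}

# Rank the hypothetical feature candidates from most to least informative.
ranking = sorted(
    candidates,
    key=lambda name: mutual_information(candidates[name], labels),
    reverse=True,
)
print(ranking)
```

Features with higher scores carry more information about the language identity; real prosodic features are continuous and would need to be quantised before this kind of count-based estimate applies.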

-       Spoken language translation

Spoken language translation (SLT) combines automatic speech recognition (ASR) and machine translation (MT). State-of-the-art SLT systems normally require careful tuning of the parameters of both the ASR and the MT components. To achieve reasonable performance, the ASR component is normally required to be robust (with a low word error rate, WER), after which a pipeline approach is adopted to link the ASR and the MT components. In more practical scenarios, an SLT system has to deal with more varied inputs. When the input or domain is mismatched, the automatic transcript may have a much higher WER. At present, little is known about which types of errors in high-WER scenarios cause specific degradation in MT performance. This research focuses on improving SLT system performance by means of system integration. By modelling the information flow between the ASR and the MT components, the ASR output can be filtered and/or adapted to alleviate model mismatch. The MT system can also be tuned to the filtered, coupled output.
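For reference, WER is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the ASR hypothesis into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence; the function name and the example sentences are assumptions for illustration and are not taken from any of the systems described here.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one reference word is dropped by the recogniser.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted against the reference length, WER can exceed 1.0 for very poor hypotheses, which is one reason high-WER conditions are hard to compare directly.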

 

Publications:

-       Raymond W. M. Ng, Bhusan Chettri and Thomas Hain, “Combining weak tokenisers for phonotactic language recognition in a resource-constrained setting”, (to appear) in Proc. Interspeech, 2016.

-       Thomas Hain, Jeremy Christian, Oscar Saz, Salil Deena, Madina Hasan, Raymond W. M. Ng, Rosanna Milner, Mortaza Doulaty and Yulan Liu, “webASR 2 - Improved cloud based speech technology”, (to appear) in Proc. Interspeech, 2016. (webASR)

-       Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng and Thomas Hain, “Automatic genre and show identification of broadcast media”, (to appear) in Proc. Interspeech, 2016.

-       Raymond W. M. Ng, Mauro Nicolao, Oscar Saz, Madina Hasan, Bhusan Chettri, Mortaza Doulaty, Tan Lee and Thomas Hain, “The Sheffield language recognition system in NIST LRE 2015”, in Proc. Speaker Odyssey, 2016. (link)

-       Raymond W. M. Ng, Mauro Nicolao, Oscar Saz, Madina Hasan, Bhusan Chettri, Mortaza Doulaty, Tan Lee and Thomas Hain, “Sheffield LRE 2015 System Description”, in Proc. NIST LRE 2015, 2015.

-       Raymond W. M. Ng, Kashif Shah, Lucia Specia and Thomas Hain, “Groupwise learning for ASR k-best list reranking in spoken language translation”, in Proc. ICASSP, 2016. (link)

-       Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng and Thomas Hain, “Latent Dirichlet Allocation Based organisation of broadcast media archives for deep neural network adaptation”, in Proc. ASRU, 2015.

-       Rosanna Milner, Oscar Saz, Salil Deena, Mortaza Doulaty, Raymond W. M. Ng and Thomas Hain, “The 2015 Sheffield system for longitudinal diarisation of broadcast media”, in Proc. ASRU, 2015.

-       Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, Raymond W. M. Ng, Madina Hasan, Yulan Liu and Thomas Hain, “The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media”, in Proc. ASRU, 2015.

-       Kashif Shah, Raymond W. M. Ng, Fethi Bougares and Lucia Specia, “Investigating continuous space language models for machine translation quality estimation”, in Proc. EMNLP, 2015.

-       Raymond W. M. Ng, Kashif Shah, Lucia Specia and Thomas Hain, “A study on the stability and effectiveness of features in quality estimation for spoken language translation”, in Proc. Interspeech, 2015. (link)

-       Ghada AlHarbi, Raymond W. M. Ng and Thomas Hain, “Annotating Meta-discourse in Academic Lectures from Different Disciplines”, in Proc. SLaTE, 2015.

-       Raymond W. M. Ng, Kashif Shah, Wilker Aziz, Lucia Specia and Thomas Hain, “Quality estimation for ASR K-best list rescoring in spoken language translation”, in Proc. ICASSP, 2015. (link)

-       Raymond W. M. Ng, Mortaza Doulaty, Rama Doddipatla, Wilker Aziz, Kashif Shah, Oscar Saz, Madina Hasan, Ghada AlHarbi, Lucia Specia and Thomas Hain, “The USFD SLT system for IWSLT 2014”, in Proc. IWSLT, 2014. (link)

-       Raymond W. M. Ng, Cheung-Chi Leung, Tan Lee, Bin Ma and Haizhou Li, "Spoken language identification with prosodic features", IEEE Trans. Audio, Speech, Lang. Process., 2013.

-       Raymond W. M. Ng, Thomas Hain and Trevor Cohn, "Adaptation of lecture speech recognition system with machine translation output", in Proc. ICASSP, 2013. (link)

-       Raymond W. M. Ng, Thomas Hain and Keikichi Hirose, "An alignment matching method to explore pseudosyllable properties across different corpora", in Proc. Interspeech, pp. 863-866, 2012.

-       Raymond W. M. Ng and Keikichi Hirose, "Automatic segmentation of English words using phonotactic and syllable information", in Proc. Speech Prosody, 2012.

-       Raymond W. M. Ng and Keikichi Hirose, "Syllable: A self-contained unit to model pronunciation variation", in Proc. ICASSP, pp. 4457-4460, 2012.

-       Raymond W. M. Ng, Cheung-Chi Leung, Tan Lee, Bin Ma and Haizhou Li, "Score fusion and calibration in multiple language detectors with large performance variation", in Proc. ICASSP, pp. 4404-4407, 2011.

-       Raymond W. M. Ng, Cheung-Chi Leung, Ville Hautamäki, Tan Lee, Bin Ma and Haizhou Li, "Towards long-range prosodic attribute modeling for language recognition", in Proc. Interspeech, pp. 1792-1795, 2010.

-       Raymond W. M. Ng, Cheung-Chi Leung, Tan Lee, Bin Ma and Haizhou Li, "Detection target dependent score calibration for language recognition", in Proc. Speaker Odyssey, pp. 91-96, 2010.

-       Raymond W. M. Ng, Cheung-Chi Leung, Tan Lee, Bin Ma and Haizhou Li, "An entropy-based approach for comparing prosodic properties in tonal and pitch accent languages", in Proc. Speech Prosody, May, 2010.

-       Raymond W. M. Ng, Cheung-Chi Leung, Tan Lee, Bin Ma and Haizhou Li, "Prosodic attribute model for spoken language identification", in Proc. ICASSP, pp. 5022-5025, 2010.

-       Raymond W. M. Ng, Tan Lee, Cheung-Chi Leung, Bin Ma and Haizhou Li, "Analysis and selection of prosodic features for Asian language recognition", International Journal of Asian Language Processing, Vol. 19, no. 4, pp. 139-152, 2009.

-       Raymond W. M. Ng, Tan Lee, Cheung-Chi Leung, Bin Ma and Haizhou Li, "Analysis and selection of prosodic features for language identification", in Proc. IALP, pp. 123-128, 2009.

-       Raymond W. M. Ng and Tan Lee, "Entropy-based analysis of the prosodic features of Chinese dialects", in Proc. International Symposium on Chinese Spoken Language Processing, pp. 65-68, December, 2008.

-       Raymond W. M. Ng, Tan Lee and Wentao Gu, "Towards automatic parameter extraction of command-response model for Cantonese", in Proc. International Conference on Spoken Language Processing, pp. 2358-2361, September, 2006.

 

 

Experience / Awards:

-       Invitation Program for foreign-based researchers, 2011, NICT, Japan

-       Research internship, 2009, Institute for Infocomm Research, Singapore

 

Work:

Here is some of the work I have done related to my research.

-        Minimum Error Rate Tuning (MERT) SGE parallel implementation (here)

-        Pseudosyllable extraction

 

Peer Reviewer:
(Conference)
- Speaker Odyssey 2016
- Interspeech 2016
- NAACL HLT 2016
- Oriental-COCOSDA 2015
- Interspeech 2015
- ISCSLP 2014
- Interspeech 2014
- Oriental-COCOSDA 2012
- European Signal Processing Conference 2012
- Oriental-COCOSDA 2011
(Journal)
- EURASIP Journal on Audio, Speech and Music Processing 2015
- IEEE/ACM Transactions on Audio, Speech, and Language Processing 2015
- Journal of Signal Processing Systems 2014
- Computer Speech and Language 2014
- Journal of Multimedia 2013
- ETRI Journal 2011

 

Talk / Presentations:

-        Quality estimation for ASR K-best list rescoring in spoken language translation (ICASSP 2015 presentation), 22/Apr/2015, Plaza P1/P2, The Brisbane Convention & Exhibition Centre, Brisbane, Australia.

-        Language Recognition, 14/Oct/2014, John Carr Library meeting room, Mappin Building, the University of Sheffield, Sheffield, United Kingdom.

-        SGE usage, 15/Oct/2014, John Carr Library meeting room, Mappin Building, the University of Sheffield, Sheffield, United Kingdom.

-        Speech Technology and Translation Universal Survey – System building (w/ M. Nicolao, T. Hain), 5/Aug/2014, United Kingdom.

-        Building translation systems, 24/Sep/2013, John Carr Library meeting room, Mappin Building, the University of Sheffield, Sheffield, United Kingdom.

-        Introduction to spoken language translation, 10/Sep/2013, SHIAE, the Chinese University of Hong Kong, Hong Kong.

-        Speech Technology and Translation Universal Survey (w/ T. Hain), 13/Jun/2013, United Kingdom.