The University of Sheffield
Natural Language Processing Group

Publications

Alternative links for a list of my publications: Google Scholar and Lattes.

2018

  • Lucia Specia, Carolina Scarton and Gustavo Henrique Paetzold (2018): Quality Estimation for Machine Translation. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers. [LINK] [BIBTEX]
  • Mikel Forcada, Carolina Scarton, Lucia Specia, Barry Haddow and Alexandra Birch (2018): Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting. Accepted to appear in the proceedings of WMT 2018, Brussels, Belgium. [PDF] [BIBTEX]
  • Chiraag Lala, Pranava Swaroop Madhyastha, Carolina Scarton and Lucia Specia (2018): Sheffield's Submissions for WMT18 Multimodal Translation Tasks. Accepted to appear in the proceedings of WMT 2018, Brussels, Belgium. [PDF] [BIBTEX]
  • Julia Ive, Carolina Scarton, Frederic Blain and Lucia Specia (2018): Sheffield's systems for the WMT18 Quality Estimation shared task. Accepted to appear in the proceedings of WMT 2018, Brussels, Belgium. [PDF] [BIBTEX]
  • Carolina Scarton and Lucia Specia (2018): Learning Simplifications for Specific Target Audiences. In the Proceedings of ACL 2018, Melbourne, Australia, pp. 712-718. [PDF] [BIBTEX]
  • Carolina Scarton, Gustavo Henrique Paetzold and Lucia Specia (2018): Text Simplification from Professionally Produced Corpora. In the Proceedings of LREC 2018, Miyazaki, Japan, pp. 3504-3510. [PDF] [BIBTEX]
  • Carolina Scarton, Gustavo Henrique Paetzold and Lucia Specia (2018): SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain. In the Proceedings of LREC 2018, Miyazaki, Japan, pp. 4333-4338. [PDF] [BIBTEX]

    2017

    • Fernando Alva Manchego, Joachim Bingel, Gustavo Henrique Paetzold, Carolina Scarton and Lucia Specia (2017): Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs. In the Proceedings of the 8th International Joint Conference on Natural Language Processing, Taipei, Taiwan, pp. 295-305. [PDF] [BIBTEX]
    • Carolina Scarton, Alessio Palmero Aprosio, Sara Tonelli, Tamara Martín Wanton and Lucia Specia (2017): MUSST: A Multilingual Syntactic Simplification Tool. In the Proceedings of the 8th International Joint Conference on Natural Language Processing: System Demonstrations, Taipei, Taiwan, pp. 25-28. [PDF] [BIBTEX]
    • Frédéric Blain, Carolina Scarton and Lucia Specia (2017): Bilexical Embeddings for Quality Estimation. In the Proceedings of the Second Conference on Machine Translation, Copenhagen, Denmark, pp. 545-550. [PDF] [BIBTEX]
    • Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra and Carolina Scarton (2017): Improving Evaluation of Document-level Machine Translation Quality Estimation. In the Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, pp. 356-361. [PDF] [BIBTEX]
    • Carolina Scarton (2017): Document-Level Machine Translation Quality Estimation. PhD Thesis (University of Sheffield, UK). [PDF] [BIBTEX]

    2016

    • Carolina Scarton, Gustavo Henrique Paetzold and Lucia Specia (2016): Quality Estimation for Language Output Applications. In the Proceedings of the 26th International Conference on Computational Linguistics: Tutorial Abstracts, Osaka, Japan, pp. 14-17. [PDF] [BIBTEX]
    • Carolina Scarton, Daniel Beck, Kashif Shah, Karin Sim Smith and Lucia Specia (2016): Word embeddings and discourse information for Machine Translation Quality Estimation. In the Proceedings of the First Conference on Machine Translation, Berlin, Germany, pp. 831-837. [PDF] [BIBTEX]
    • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor and Marcos Zampieri (2016): Findings of the 2016 Conference on Machine Translation. In the Proceedings of the First Conference on Machine Translation, Berlin, Germany, pp. 131-198. [PDF] [BIBTEX]
    • Carolina Scarton and Lucia Specia (2016): A Reading Comprehension Corpus for Machine Translation Evaluation. In the Proceedings of the Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia, pp. 3652-3658. [LINK] [BIBTEX]
    • Liling Tan, Carolina Scarton, Lucia Specia and Josef van Genabith (2016): SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles. In the Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval 2016), San Diego, CA, pp. 640-645. [PDF] [BIBTEX]
    • Sandra Maria Aluísio, Andre Cunha and Carolina Scarton (2016): Evaluating Progression of Alzheimer’s Disease by Regression and Classification Methods in a Narrative Language Test in Portuguese. In the Proceedings of the International Conference on Computational Processing of the Portuguese Language, Tomar, Portugal, pp. 109-114. [LINK]

    2015

    • Carolina Scarton and Lucia Specia (2015): A quantitative analysis of discourse phenomena in machine translation. Discours - Revue de linguistique, psycholinguistique et informatique, number 16. [LINK] [BIBTEX]
    • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia and Marco Turchi (2015): Findings of the 2015 Workshop on Statistical Machine Translation. In the Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 1-46. [PDF] [BIBTEX]
    • Carolina Scarton, Liling Tan and Lucia Specia (2015): USHEF and USAAR-USHEF participation in the WMT15 QE shared task. In the Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 336-341. [PDF] [BIBTEX]
    • Lucia Specia, Gustavo Henrique Paetzold and Carolina Scarton (2015): Multi-level Translation Quality Prediction with QuEst++. In the Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing, China, pp. 110-120. [PDF] [BIBTEX]
    • Carolina Scarton (2015): Discourse and Document-level Information for Evaluating Language Output Tasks. In the Proceedings of NAACL-HLT 2015 Student Research Workshop (SRW), Denver, CO, pp. 118-125. [PDF] [BIBTEX]
    • Liling Tan, Carolina Scarton, Lucia Specia and Josef van Genabith (2015): USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics. In the Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, CO, pp. 85-89. [PDF] [BIBTEX]
    • Carolina Scarton, Marcos Zampieri, Mihaela Vela, Josef van Genabith and Lucia Specia (2015): Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation. In the Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT 2015), Antalya, Turkey, pp. 121-128. [PDF] [BIBTEX]

    2014

    • Carolina Scarton, Magali Sanches Duran and Sandra Maria Aluísio (2014): Using Cross-linguistic Knowledge to Build VerbNet-style Lexicons: Results for a (Brazilian) Portuguese VerbNet. In the Proceedings of the 2014 International Conference on Computational Processing of Portuguese, São Carlos-SP, Brazil, pp. 149-160. [LINK] [BIBTEX]
    • Carolina Scarton and Lucia Specia (2014b): Exploring Consensus in Machine Translation for Quality Estimation. In the Proceedings of the Ninth Workshop on Statistical Machine Translation (WMT 2014) - in conjunction with ACL 2014, Baltimore-MD, pp. 342-347. [PDF] [BIBTEX]
    • Carolina Scarton and Lucia Specia (2014a): Document-level translation quality estimation: exploring discourse and pseudo-references. In the Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT 2014), Dubrovnik, Croatia, pp. 101-108. [PDF] [BIBTEX]
    • Carolina Scarton, Lin Sun, Karin Kipper-Schuler, Magali Sanches Duran, Martha Palmer and Anna Korhonen (2014): Verb Clustering for Brazilian Portuguese. In the Proceedings of the 15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2014), Kathmandu, Nepal, pp. 25-39. [LINK] [BIBTEX]
    • Cíntia M. Toledo, Andre Cunha, Carolina Scarton and Sandra Aluísio (2014): Automatic classification of written descriptions by healthy adults: an overview of the application of natural language processing and machine learning techniques to clinical discourse analysis. Dement. Neuropsychol. 2014;8(3):227-235. [LINK] [BIBTEX]
    • Leonardo Zilio, Adriano Zanette and Carolina Scarton (2014): Automatic Extraction of Subcategorization Frames from Portuguese Corpora. In Aluisio, S. M. and Tagnin, S. E. O. (eds.) New Language Technologies and Linguistic Research: a Two-Way Road. Cambridge Scholars Publishing, pp. 78-96. [LINK] [BIBTEX]

    2013

    • Carolina Scarton (2013): VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. Master's Dissertation (University of São Paulo, Brazil). [PDF] [BIBTEX]
    • Magali Sanches Duran, Carolina Scarton, Sandra Maria Aluísio and Carlos Ramisch (2013): Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic 'se' in Portuguese. In the Proceedings of the 9th Workshop on Multiword Expressions (MWE 2013), in conjunction with NAACL-HLT 2013, Atlanta, Georgia, USA. [PDF] [BIBTEX]
    • André Cunha, Cíntia Toledo, Carolina Scarton, Letícia Mansur and Sandra Maria Aluísio (2013): Classificação Automática de Discurso Descritivo Escrito de Adultos Sadios: Referência para a Avaliação da Linguagem de Lesados Cerebrais. In the Proceedings of X Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2013), Fortaleza-CE, Brazil. [PDF] [BIBTEX]

    2012

    • Carolina Scarton and Sandra Maria Aluísio (2012): Towards a cross-linguistic VerbNet-style lexicon to Brazilian Portuguese. In the Proceedings of the LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS 2012), Istanbul, Turkey. [PDF] [BIBTEX]
    • Leonardo Zilio, Adriano Zanette and Carolina Scarton (2012): Extração Automática de Estruturas de Subcategorização a partir de Corpora em Português. In the Proceedings of XI Encontro de Linguística de Corpus (ELC 2012), São Carlos - SP, Brazil. [PDF] [BIBTEX]
    • Adriano Zanette, Carolina Scarton and Leonardo Zilio (2012): Automatic extraction of subcategorization frames from corpora: an approach to Portuguese. In International Conference on Computational Processing of Portuguese (PROPOR 2012): Demonstration session, Coimbra, Portugal. [PDF] [BIBTEX]

    2011

    • Carolina Scarton (2011): VerbNet.Br: construção semiautomática de um léxico computacional de verbos para o português do Brasil. In the Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), Cuiabá-MT, Brazil. [PDF] [BIBTEX]
    • Bianca Pasqualini, Carolina Scarton and Maria José B. Finatto (2011): Comparando Avaliações de Inteligibilidade Textual entre Originais e Traduções de Textos Literários. In the Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), Cuiabá-MT, Brazil. [PDF] [BIBTEX]
    • Maria José B. Finatto, Carolina Scarton, Amanda Rocha and Sandra Maria Aluísio (2011): Características do jornalismo popular: avaliação da inteligibilidade e auxílio à descrição do gênero. In the Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), Cuiabá-MT, Brazil. [PDF] [BIBTEX]
    • Carolina Scarton and Sandra Maria Aluísio (2011): O uso do MERLOT por Alunos de Teoria da Computação para a Criação de Materiais de Ensino-Aprendizagem. In the Proceedings of XIX Workshop sobre Educação em Computação (WEI 2011), Natal-RN, Brazil. [PDF] [BIBTEX]
    • Fernando A. M. Muniz, Willian M. Watanabe, Carolina Scarton and Sandra Maria Aluísio (2011): Extração de Termos de Manuais Técnicos de Produtos Tecnológicos: uma Aplicação em Sistemas de Adaptação Textual. In the Proceedings of XXXVIII Seminário Integrado de Software e Hardware (SEMISH 2011) Natal-RN, Brazil. [PDF] [BIBTEX]
    • Carolina Scarton and Sandra Maria Aluísio (2011): VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. In the Proceedings of X Encontro de Linguística de Corpus (ELC 2011), on-going research, Belo Horizonte-MG, Brazil. [PDF] [BIBTEX]
    • Carolina Scarton (2011): VerbNet-Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. In the Proceedings of I Congresso Internacional de Estudos do Léxico (ICIEL 2011), Comunicação Coordenada: Rove Chishman, Magali Sanches Duran, Carolina Scarton and Oto Araújo Vale - O verbo no Computador: diferentes abordagens da descrição lexical para o processamento de língua natural, Salvador-BA, Brazil. [LINK] [BIBTEX]

    2010

    • Carolina Scarton and Sandra Maria Aluísio (2010): Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. Linguamática, v. 2, p. 45-62. [PDF] [BIBTEX]
    • Sandra Maria Aluísio, Lucia Specia, Caroline Gasperin and Carolina Scarton (2010): Readability Assessment for Text Simplification. In the Proceedings of the 5th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2010), Los Angeles, CA, USA. [PDF] [BIBTEX]
    • Carolina Scarton, Caroline Gasperin and Sandra Maria Aluísio (2010): Revisiting the Readability Assessment of Texts in Portuguese. In the Proceedings of the 12th Ibero-American Conference on Artificial Intelligence (Iberamia 2010), Bahia Blanca, Argentina, pp. 306-315. [PDF] [BIBTEX]
    • Carolina Scarton, Matheus Oliveira, Arnaldo Candido Junior, Caroline Gasperin and Sandra Maria Aluísio (2010): SIMPLIFICA: an authoring system targeting simplified texts in Brazilian Portuguese. In International Conference on Computational Processing of Portuguese (PROPOR 2010): Demonstration session, Porto Alegre-RS, Brazil. [PDF] [BIBTEX]
    • Carolina Scarton and Sandra Maria Aluísio (2010): Coh-Metrix-Port: a readability assessment tool for texts in Brazilian Portuguese. In International Conference on Computational Processing of Portuguese (PROPOR 2010): Demonstration session, Porto Alegre-RS, Brazil. [PDF] [BIBTEX]
    • Carolina Scarton, Matheus Oliveira, Arnaldo Candido Junior, Caroline Gasperin and Sandra Maria Aluísio (2010): SIMPLIFICA: a tool for authoring simplified texts in Brazilian Portuguese guided by readability assessments. In NAACL 2010: demonstration session, Los Angeles, CA, USA. [PDF] [BIBTEX]

    2009

    • Carolina Scarton, Daniel M. Almeida and Sandra Maria Aluísio (2009): Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. In the Proceedings of the 7th Brazilian Symposium in Information and Human Language Technology (STIL 2009), São Carlos-SP, Brazil. [PDF] [BIBTEX]