Multilingual Natural Language Processing
Rada Mihalcea, University of North Texas, USA
With rapidly growing online resources such as Wikipedia, Twitter, or Facebook, an increasing number of languages have a Web presence, and there is a correspondingly growing need for effective solutions for multilingual natural language processing. In this talk, I will explore the hypothesis that a multilingual representation can enrich the feature space for natural language processing tasks and lead to significant improvements over traditional solutions that rely exclusively on a monolingual representation. Specifically, I will describe experiments on three tasks (word sense disambiguation, subjectivity analysis, and text semantic similarity) and show how a multilingual representation can leverage additional information from the other languages in the multilingual space, thus improving over the use of only one language at a time.
Comparative Evaluation Redux, or: How to Stop Worrying and Learn to Love the Variance
Evangelos Kanoulas, Google Zurich, Switzerland
Information retrieval effectiveness evaluation typically takes one of three forms: batch experiments based on static test collections, lab studies measuring actual users interacting with a system, or online experiments tracking users' interactions with a live system. Test collection experiments are sometimes viewed as introducing too many simplifying assumptions to accurately predict the usefulness of a system to its users. As a result, there is great interest in creating test collections that better model the variability encountered in real-life search scenarios. This includes experimenting over a variety of queries, corpora, or even users and their interactions with the search results. In this talk I will discuss how to control different aspects of batch experimentation, how to model the variance that these control variables introduce into measurements of effectiveness, and how to extend our arsenal of statistical significance tests for comparing retrieval algorithms.
Rada Mihalcea is an Associate Professor in the Department of Computer Science and Engineering at the University of North Texas. Her research interests are in computational linguistics, with a focus on lexical semantics, graph-based algorithms for natural language processing, and multilingual natural language processing. She serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Research on Language and Computation, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a program co-chair for the Conference of the Association for Computational Linguistics (2011) and the Conference on Empirical Methods in Natural Language Processing (2009). She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers (2009).
Evangelos Kanoulas is a research scientist at Google Zurich. Prior to that, he was a research associate at the Information School of the University of Sheffield. He obtained his PhD from Northeastern University in 2009. His research interests lie in information retrieval and natural language processing. He has published extensively and has organized tutorials and workshops at major information retrieval venues such as SIGIR, CIKM, and ECIR. He is the recipient of a Marie Curie International Incoming Fellowship (2010). Evangelos was actively involved in coordinating the TREC Million Query Track (2007-2009) and is one of the coordinators of the TREC Session Track (2010-2013).