Reading Lists from Previous Years
Some Key Conference Deadlines:
ACM SIGIR 2007 (January 28 deadline - Amsterdam)
ACM IEEE JCDL 2007 (January 29 deadline - Vancouver)
Resources:
TREC web site
BioCreAtIve web site
Downloading SMART from Cornell University
Managing Gigabytes (MG) retrieval system
Lucene
Lucene in Action, by Erik Hatcher and Otis Gospodnetic. Manning Publications Co., 2004.
Introduction to Information Retrieval. C.D. Manning, P. Raghavan, H. Schutze. Cambridge UP, 2007. Draft.
Information Retrieval. C. J. van Rijsbergen. London: Butterworths, 1979.
Information Retrieval Interaction. Peter Ingwersen. Taylor Graham, 1992.
Information Retrieval: A Survey. Ed Greengrass. 2000.
CMU-Cambridge Statistical Language Modeling toolkit
Student Projects
Goal: This seminar course will cover current research in text retrieval and text mining. After reading some foundational papers and book chapters, we will study papers from journals (such as ACM TOIS, TOIT, Bioinformatics) and conference proceedings (such as ACM SIGIR, WWW, CIKM). Examples of problems include expert detection, web retrieval and web mining, ranking strategies, ambiguity resolution, knowledge discovery, web phenomena such as social networks, information extraction, and text classification. Interested students (from beginning to advanced) are invited to participate in the reading group. It is run as a seminar, with individuals taking turns to present an overview of the selected paper and lead the discussion. Upon completion of this course, students will have gained broad exposure to a variety of current text-based research problems and applications. The semester-long project will allow students to gain a significant understanding of a specific problem.
Special Focus: We will start with fairly introductory concepts and then move quickly to papers from different proceedings and journals. A major emphasis will be on problems studied in TREC (Text REtrieval Conference), an international forum for testing algorithms and models. Every student must complete a project for this seminar course, and students are encouraged to select projects from the TREC framework.
Evaluation: Participation (15%), Project (70%), Project presentation (15%)
Introduction to seminar.
The Text REtrieval Conference. Chapter 1 from Experiment and Evaluation in Information Retrieval, edited by Ellen M. Voorhees and Donna K. Harman. MIT Press.
Chapter 2: What is information retrieval (Greengrass book).
Chapter 3: Approaches to IR (Greengrass book).
Chapter 4: Classical Boolean Approach to IR (Greengrass book).
Chapter 6: Vector Space Approach (Greengrass book). (You may stop after 6.3.)
Chapter 6: Scoring and Term Weighting (Manning, Raghavan and Schutze book).
Chapter 7: Vector Space Retrieval (Manning, Raghavan and Schutze book).
Exploring the Similarity Space. Zobel and Moffat. ACM SIGIR Forum, 1998. (Do a web search or get from the ACM Digital Library).
Chapter 6.4: Computation of Similarity between Document & Query (Greengrass book).
Chapter 6.5: Latent Semantic Indexing ... (Greengrass book).
Chapter 7 (up to and including 7.4.1): Probabilistic models ... (Greengrass book).
Lucene - demos.
TREC - Enterprise Track: website. Read the 2006 overview paper: ENT.OVERVIEW.pdf
TREC - Spam Track: website. Read guidelines at that site and read the 2006 overview paper: SPAM.OVERVIEW.pdf
TREC - Legal Track: website. Read the guidelines and the 2006 overview paper: LEGAL.OVERVIEW.pdf
TREC - Blog Track. Read the 2006 overview paper: BLOG.OVERVIEW.pdf
TREC - Terabyte Track. Read the 2006 overview paper: TERA.OVERVIEW.pdf
Common Evaluation Measures. NIST document (2005)
Each participant picks a paper from their favourite TREC track.
Focus on methodology and results.
Project selection is due: submit a brief one-page writeup.
Tools for Projects. Presentation by Aditya Sehgal.
TREC - Question answering track. Read the 2006 overview paper: QA.OVERVIEW.pdf
The Open University at TREC 2006 Enterprise Track Expert Search Task. To be presented by Jeremy Robinson
SVM-Based Spam Filter with Active and Online Learning. To be presented by Nengda Jin.
Relevance-Based Language Models. Victor Lavrenko and W. Bruce Croft.
Slides from Brian Almquist on TREC Legal Track.
Language Models for Expert Finding -- UIUC TREC 2006 Enterprise Track Experiments. H. Fang, L. Zhou, C.-X. Zhai, University of Illinois at Urbana-Champaign.
Information Retrieval Using Language Models. Chapter 2 of Kieran McDonald's thesis (available from his web site).
TREC 2006 Genomics Track Overview. W. Hersh et al.
TREC 2007 Genomics Track protocol
Thumbs up? Sentiment Classification using Machine Learning Techniques, by Pang, Lee and Vaithyanathan. Presented by J.T.
Concept recognition and the TREC Genomics tasks. (Get from TREC 2006 site). Presented by Si-Chi.
An Adaptive, Semi-Structured Language Model Approach to Spam Filtering on a New Corpus, by Ben Medlock. Paper presented by Aravind.
Using Social Network Analysis to Automatically Discover Competitor Relationships from Business News, by Ma, Sheng and Pant. (Paper emailed to all). Presented by Jeremy Robinson.
Answering relationship queries on the Web. Luo et al. WWW 2007. Get paper from here. Presented by Ritesh Nadhani.
Bibliometric impact measures leveraging topic analysis. JCDL 2006. Get paper from here. Presented by Junfeng Zheng.
Coauthorship networks and patterns of scientific collaboration. M. E. J. Newman, PNAS, 2004. Presented by Chris Timko.