6K278 Web Mining
Spring 2008
2:30P - 3:45P TTh 2058 LIB

Course Schedule



1. Web Crawlers

Pant, G., Srinivasan, P., Menczer, F. (2004). Crawling the Web. Web Dynamics: Adapting to Change in Content, Size, Topology and Use, edited by M. Levene and A. Poulovassilis: 153-178.

Baeza-Yates, R., Castillo, C., Marin, M. and Rodriguez, A. (2005). Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering. In Proceedings of the Industrial and Practical Experience track of the 14th WWW conference, 864-872, Chiba, Japan. ACM Press.  (citeseer)

(Historical interest) Pinkerton, Brian. Finding What People Want: Experiences with the WebCrawler

2. A Few Crawler-Based Applications

Pant, G., Tsioutsiouliklis, K., Johnson, J., Giles, C.L. Panorama: Extending Digital Libraries with Topical Crawlers. Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL 2004).

Srinivasan, P., Mitchell, J., Bodenreider, O., Pant, G. and Menczer, F. Web Crawling Agents for Retrieving Biomedical Information. Proceedings of NETTAB 2002 Workshop on Agents in Bioinformatics. Bologna, Italy, July 2002.

3. PHP/MySQL/Perl

4. Search Engines

Brin, S. and Page, L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems 30, 107-117. 1998. (citeseer) - Shashank


Pant et al. Search Engine-Crawler Symbiosis: Adapting to Community Interests. Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL 2004).

Risvik, K. M. and Michelsen, R. Search Engines and Web Dynamics. Computer Networks, vol. 39, pp. 289-302, June 2002. (citeseer)

Zhao, R.H., Meng, W., Wu, Z., Raghavan, V., Yu, C. Fully Automatic Wrapper Generation for Search Engines. WWW 2005, 66-75. (citeseer)


5. Vector Space Model for Representation and Retrieval

Greengrass, E. Information Retrieval: A Survey. 2000. Read Chapter 6 through the end of Section 6.4.

6. Query Logs


Beitzel, S.M., Jensen, E.C., Chowdhury, A., Grossman, D., Frieder, O. Hourly Analysis of a Very Large Topically Categorized Web Query Log. Proceedings of ACM SIGIR conference, 2004. (web search or ACM Digital Library) - Peter

Srivastava, J., Cooley, R., Deshpande, M., Tan, P-N. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations 1(2), 12-23, 2000. (citeseer) - Puja

7. Information Extraction and Topic Profiles

Liu, B. et al., Mining Topic-Specific Concepts and Definitions on the Web. WWW 2003. (citeseer)

Sehgal, A.K. and Srinivasan, P. Profiling Topics on the Web. WWW Conference Workshop on I3: Identity, Identifiers, Identification. Entity-Centric Approaches to Information and Knowledge Management on the Web. May 2007.

8. Recommender Systems


Cohen, W. W. Web-Collaborative Filtering: Recommending Music by Spidering the Web. Computer Networks, 1999. (citeseer) - Senay

Lam, S.K. and Riedl, J. Shilling Recommender Systems for Fun and Profit. WWW 2004. (citeseer) - Mohammad

Hu, M., Liu, B. Mining and summarizing customer reviews. Proceedings of KDD 2004. (search) - Yelena


9. Web as a Lexical Resource

Lapata, M. and Keller, F. The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks.  HLT. (search)


10. Some Applications

Luo et al. Answering relationship queries on the Web. WWW 2007. (search) - Jesse

Gabrilovich, Dumais and Horvitz. Newsjunkie: Providing personalized newsfeeds via analysis of information novelty. WWW 2004.


11. Web Communities

Matsuo, Y., Mori, J. and Hamasaki, M. POLYPHONET: An Advanced Social Network Extraction System from the Web. WWW 2006. (web search) - Lian

Adamic, L. and Adar, E. Friends and Neighbors on the Web. (citeseer) - Ray

Harada et al. Finding Authoritative People from the Web Social Network. JCDL 2004. - Elizabeth

12. Miscellaneous

Hu, M., Lim, E-P., et al. Measuring Article Quality in Wikipedia: Models and Evaluation.  CIKM 2007. (web from campus or ACM DL) - note there is an earlier paper with a slightly different name. - Peter

Pasca, M. Weakly-Supervised Discovery of Named Entities Using Web Search Queries. CIKM 2007. (web from campus or ACM DL) - Shashank

Carenini, Ng and Zhou. Summarizing Email Conversations with Clue Words. WWW 2007. (search)


The University of Iowa / School of Library and Information Science / Department of Management Sciences