
6K278 Web Mining
Spring 2008
2:30P - 3:45P TTh 2058 LIB
Course Schedule
1. Web Crawlers
Pant, G., Srinivasan, P., Menczer, F. (2004). Crawling the Web. Web Dynamics: Adapting to Change in Content, Size, Topology and Use, edited by M. Levene and A. Poulovassilis: 153-178.
Baeza-Yates, R., Castillo, C., Marin, M. and Rodriguez, A. (2005). Crawling a Country: Better Strategies than Breadth-First for Web Page Ordering. In Proceedings of the Industrial and Practical Experience track of the 14th WWW conference, 864-872, Chiba, Japan. ACM Press. (citeseer)
(Historical interest) Pinkerton, Brian. Finding What People Want: Experiences with the WebCrawler
2. A Few Crawler-Based Applications
Pant, G., Tsioutsiouliklis, K., Johnson, J., Giles, C.L. Panorama: Extending Digital Libraries with Topical Crawlers. Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL 2004).
Srinivasan, P., Mitchell, J., Bodenreider, O., Pant, G. and Menczer, F. Web Crawling Agents for Retrieving Biomedical Information. Proceedings of NETTAB 2002 Workshop on Agents in Bioinformatics. Bologna, Italy, July 2002.
3. PHP/MySQL/Perl
4. Search Engines
Brin, S. and Page, L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems 30, 107-117. 1998. (citeseer) - Shashank
Pant et al. Search Engine-Crawler Symbiosis: Adapting to Community Interests. Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL 2004).
Risvik, K. M. and Michelsen, R. Search Engines and Web Dynamics. Computer Networks, vol. 39, pp. 289-302, June 2002. (citeseer)
Zhao, R.H., Meng, W., Wu, Z., Raghavan, V., Yu, C. Fully Automatic Wrapper Generation for Search Engines. WWW 2005, 66-75. (citeseer)
5. Vector Space Model for Representation and Retrieval
Greengrass, E. Information Retrieval: A Survey. 2000. Read Chapter 6 through the end of Section 6.4.
6. Query Logs
Beitzel, S.M., Jensen, E.C., Chowdhury, A., Grossman, D., Frieder, O. Hourly Analysis of a Very Large Topically Categorized Web Query Log. Proceedings of ACM SIGIR conference, 2004. (web search or ACM Digital Library) - Peter
Srivastava, J., Cooley, R., Deshpande, M., Tan, P-N. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations 1(2), 12-23, 2000. (citeseer) - Puja
7. Information Extraction and Topic Profiles
Liu, B. et al., Mining Topic-Specific Concepts and Definitions on the Web. WWW 2003. (citeseer)
Sehgal, A.K. and Srinivasan, P. Profiling Topics on the Web. WWW Conference Workshop on I3: Identity, Identifiers, Identification. Entity-Centric Approaches to Information and Knowledge Management on the Web. May 2007.
8. Recommender systems
Cohen, W. W. Web-Collaborative Filtering: Recommending Music by Spidering the Web. Computer Networks, 1999. (citeseer) - Senay
Lam, S.K. and Riedl, J. Shilling Recommender Systems for Fun and Profit. WWW 2004. (citeseer) - Mohammad
Hu, M., Liu, B. Mining and summarizing customer reviews. Proceedings of KDD 2004. (search) - Yelena
9. Web as a Lexical Resource
Lapata, M. and Keller, F. The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks. HLT. (search)
10. Some Applications
Luo et al. Answering relationship queries on the Web. WWW 2007. (search) - Jesse
Gabrilovich, Dumais and Horvitz. Newsjunkie: Providing personalized newsfeeds via analysis of information novelty. WWW 2004.
11. Web Communities
Matsuo, Y., Mori, J. and Hamasaki, M. POLYPHONET: An Advanced Social Network Extraction System from the Web. WWW 2006. (web search) - Lian
Adamic, L. and Adar, E. Friends and Neighbors on the Web. (citeseer) - Ray
Harada et al. Finding Authoritative People from the Web Social Network. JCDL 2004. - Elizabeth
12. Miscellaneous
Hu, M., Lim, E-P., et al. Measuring Article Quality in Wikipedia: Models and Evaluation. CIKM 2007. (web from campus or ACM DL; note there is an earlier paper with a slightly different name) - Peter
Pasca, M. Weakly-Supervised Discovery of Named Entities Using Web Search Queries. CIKM 2007. (web from campus or ACM DL) - Shashank
Carenini, Ng and Zhou. Summarizing Email Conversations with Clue Words. WWW 2007. (search) -