This repo contains course projects for Professor Torsten Suel's Web Search Engines course.

  1. Jcrawler: a primitive multi-threaded focused web crawler that collects pages from the web, concentrating on given keywords. Language: Python.

  2. indexer: a C++ program that parses web pages, builds an inverted index, and generates the final index for later query processing. It involves massive data processing and file compression (var-byte).

  3. query processor: uses the previously built inverted index to answer users' search queries.

  4. Foursquare crawler and recommendation system: a crawler that collects user, venue, rating, and check-in information from Foursquare, Twitter and Facebook, followed by machine learning algorithms (collaborative filtering, SVD, etc.) that recommend friends and venues to users.
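
For readers unfamiliar with the var-byte compression mentioned for the indexer, here is a minimal Python sketch of one common variant. It is not the repo's C++ code: the function names (`to_gaps`, `vbyte_encode`, and so on), the low-bits-first byte order, and the convention that a set high bit marks the last byte of a number are all assumptions made for illustration.

```python
def to_gaps(doc_ids):
    """Turn a sorted docID list into gaps (differences), which are smaller numbers."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Rebuild the original docIDs by summing the gaps."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers):
    """Encode non-negative ints, 7 payload bits per byte, low bits first.

    Assumed convention: the high bit is set only on the final byte of a number.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("var-byte handles non-negative integers only")
        while n >= 128:
            out.append(n & 0x7F)   # continuation byte, high bit clear
            n >>= 7
        out.append(n | 0x80)       # terminating byte, high bit set
    return bytes(out)


def vbyte_decode(data):
    """Decode bytes produced by vbyte_encode back into a list of ints."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:               # last byte of the current number
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


if __name__ == "__main__":
    postings = [3, 7, 20, 300, 100000]
    encoded = vbyte_encode(to_gaps(postings))
    assert from_gaps(vbyte_decode(encoded)) == postings
    print(len(encoded), "bytes for", len(postings), "docIDs")
```

Storing gaps rather than raw docIDs keeps most values small, so most postings fit in a single byte; that is the usual reason var-byte is paired with sorted posting lists.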