P2: SEARCH ENGINE PROJECT

For the Web Crawler / Search Engine project, please follow the instructions in the Google Doc.

Resources & Tips

We have also compiled a list of resources to help you complete the project:

1. Database Functionality

For database tips, refer to Lecture 13 and specifically:

  1. How to insert a row into a table
  2. How to update a row in a table
  3. How to query by keyword
    Take a look at these links:
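As a rough illustration of the three operations listed above, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module. The table name pages and its columns (url, title, keywords) are hypothetical, and Lecture 13 may use a different database or schema.

```python
import sqlite3

# Hypothetical schema: one row per crawled page.
conn = sqlite3.connect(":memory:")  # use a filename such as "search.db" to persist
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS pages ("
    "  url TEXT PRIMARY KEY,"
    "  title TEXT,"
    "  keywords TEXT)"
)

# 1. Insert a row into a table (? placeholders avoid quoting bugs and SQL injection).
cur.execute(
    "INSERT INTO pages (url, title, keywords) VALUES (?, ?, ?)",
    ("https://example.com", "Example Domain", "example domain"),
)

# 2. Update a row in a table, matching it by its primary key.
cur.execute(
    "UPDATE pages SET keywords = ? WHERE url = ?",
    ("example domain illustrative", "https://example.com"),
)
conn.commit()

# 3. Query by keyword: LIKE with % wildcards performs a substring match.
def search(keyword):
    cur.execute(
        "SELECT url, title FROM pages WHERE keywords LIKE ?",
        ("%" + keyword + "%",),
    )
    return cur.fetchall()

print(search("illustrative"))  # [('https://example.com', 'Example Domain')]
```

A substring match is the simplest keyword search, but it also matches partial words (searching "main" finds "domain"), so you may want a stricter matching rule for your project.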

2. HTML

Take a look at the HTML sample files from Lecture 16.
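A crawler mostly needs two things from each HTML page: its text (such as the title) and the links to follow next. The sketch below shows one way to pull both out. Note that it swaps in Python's built-in html.parser module rather than BeautifulSoup so that it runs without extra packages. The class name LinkAndTitleParser and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and the absolute href targets of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links such as "/about" against the page URL.
                    self.links.append(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


sample = """<html><head><title>Sample Page</title></head>
<body><a href="/about">About</a> <a href="https://example.org/">Other</a></body></html>"""

parser = LinkAndTitleParser("https://example.com/index.html")
parser.feed(sample)
print(parser.title)  # Sample Page
print(parser.links)  # ['https://example.com/about', 'https://example.org/']
```

Resolving every href to an absolute URL before queueing it keeps relative links from being crawled as if they were separate sites.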

3. Errors

Sometimes the URL you’re trying to crawl doesn’t exist or doesn’t have any content. In that case, your soup variable will be None (i.e., soup is None will be True). When this happens, skip that URL and move on to the next one.
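Here is a minimal sketch of that "skip and continue" pattern inside a crawl loop. The function fetch_soup and the FAKE_WEB dictionary are hypothetical stand-ins so the example runs offline. In your project, the soup would come from your own fetching and parsing code.

```python
from collections import deque

# Hypothetical stand-in for the web: missing URLs and empty pages yield no soup.
FAKE_WEB = {
    "https://example.com/": "<html>home</html>",
    "https://example.com/empty": "",
}


def fetch_soup(url):
    """Hypothetical helper: returns None when the URL is missing or has no content."""
    html = FAKE_WEB.get(url)
    if not html:
        return None
    return html  # in the real project this would be a parsed soup object


def crawl(start_urls):
    queue = deque(start_urls)
    visited = set()
    crawled = []
    while queue:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        soup = fetch_soup(url)
        if soup is None:
            # Nothing to index: skip this URL and move on to the next one.
            continue
        crawled.append(url)
    return crawled


print(crawl([
    "https://example.com/",
    "https://example.com/missing",
    "https://example.com/empty",
]))  # ['https://example.com/']
```

If your fetching code raises an exception for a bad URL instead of returning None, wrapping the fetch in try/except and continuing in the except branch serves the same purpose.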