Lessons > 24. Crawling & Analyzing Files
In the next two lectures, we will do a COVID-19 data analysis, pulling from a data repository organized and maintained by Johns Hopkins University.
Dependencies
Before you begin, you will need to install the “Beautiful Soup” dependency on the command line:
pip3 install bs4
Instructions
Open covid_analysis.py and utilities.py and take a look at them. Then complete the following tasks:
1. Download all of the COVID-19 Data Files
Write some code that downloads and saves all of the COVID-19 data files (one for each day since the outbreak began being tracked, worldwide) to your files directory.
I have made two functions in the utilities.py module to help you:
- utilities.get_covid_file_links(), which extracts the web addresses of each data file.
- utilities.download_remote_file(), which will save a remote file to your file system.
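The internals of utilities.get_covid_file_links() are in the provided module, but the core idea is link extraction with Beautiful Soup. Here is a minimal, hypothetical sketch of that technique, run against a tiny stand-in snippet of HTML (the function name and sample page are illustrations, not the course's actual code):

```python
from bs4 import BeautifulSoup

def extract_csv_links(html):
    """Collect the href of every anchor tag that points at a .csv file."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a")
            if a.get("href", "").endswith(".csv")]

# A tiny stand-in page, just for illustration.
sample = '<a href="01-22-2020.csv">Jan 22</a> <a href="/about">About</a>'
print(extract_csv_links(sample))  # only the .csv link survives the filter
```

Once you have the links, you can loop over them and pass each one to utilities.download_remote_file() to save it into your files directory.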
2. Analyze the Data Files
After all of the data files have been downloaded, you are going to:
- Prompt the user for a state they’re interested in analyzing
- Iterate through each data file in the “files” directory in order to calculate the daily change in COVID-19 cases. Use the utilities.get_files_in_directory() function to help you. Within this iteration block, you will:
  - Print the date and the daily change in cases to the screen (for every day for which you have data).
  - Create an output CSV file to store your results.
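A daily change is just the difference between consecutive days' totals for the chosen state. Here is one hedged sketch of that calculation, assuming each daily file is a CSV whose headers include a state column and a case-count column (the column names below are assumptions; check your actual files):

```python
import csv
import os

# Assumed column names -- verify against the headers in your data files.
STATE_COL = "Province_State"
CASES_COL = "Confirmed"

def cases_for_state(csv_path, state):
    """Sum the confirmed-case counts for one state in one daily file."""
    total = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get(STATE_COL) == state:
                total += int(row.get(CASES_COL) or 0)
    return total

def daily_changes(file_paths, state):
    """Yield (filename, change from previous day) across date-ordered files."""
    previous = None
    for path in sorted(file_paths):
        total = cases_for_state(path, state)
        if previous is not None:
            yield os.path.basename(path), total - previous
        previous = total
```

Inside your iteration block you can print each (filename, change) pair as it is produced, and also write the pairs out with csv.writer to build your output file.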
3. Make a Data Visualization
Finally, use tkinter to make a bar chart of cases by state by modifying the utilities.make_bar_chart() function.
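The hardest part of a bar chart is usually scaling raw counts into pixel heights before drawing rectangles on a Canvas. The sketch below is a generic illustration of that pattern, not the actual body of utilities.make_bar_chart() (the function names and layout constants here are assumptions):

```python
import tkinter as tk

def scale_heights(values, max_pixels=300):
    """Scale raw case counts into pixel heights, tallest bar = max_pixels."""
    peak = max(values) if values else 0
    if peak == 0:
        return [0 for _ in values]
    return [round(v / peak * max_pixels) for v in values]

def draw_bar_chart(labels, values, bar_width=60):
    """Draw one labeled rectangle per value on a tkinter Canvas."""
    root = tk.Tk()
    root.title("Cases by State")
    canvas = tk.Canvas(root, width=bar_width * len(values) + 20, height=340)
    canvas.pack()
    for i, (label, h) in enumerate(zip(labels, scale_heights(values))):
        x = 10 + i * bar_width
        # Bars grow upward from a shared baseline at y = 320.
        canvas.create_rectangle(x, 320 - h, x + bar_width - 10, 320,
                                fill="steelblue")
        canvas.create_text(x + (bar_width - 10) / 2, 330, text=label)
    root.mainloop()
```

Calling draw_bar_chart(["OH", "NY", "CA"], [120, 300, 240]) opens a window with three bars; mainloop() blocks until that window is closed.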
Today's Activities
For today, please do the following:
1. Download the Exercise Files
Exercise Files
2. Review the Slides
No slides available.
3. Watch the Lecture Video(s)
| Link | Title | Type | Duration |
|---|---|---|---|
| Video 1 | Live Lecture Exercise | lecture | 56:12 |