Python Web Scraping Cookbook

Getting ready

To run the following example, you will need an AWS account and access to the secret keys for use in your Python code. These keys are unique to your account. We will use the boto3 library for S3 access; you can install it with pip install boto3. You will also need to set environment variables to authenticate. They will look like the following:

AWS_ACCESS_KEY_ID=AKIAIDCQ5PH3UMWKZEWA
AWS_SECRET_ACCESS_KEY=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

These are available in the AWS console, under the IAM (Identity and Access Management) section.
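
Once the environment variables are set, you can verify that authentication works before moving on. The following is a minimal sketch, not part of the recipe itself; boto3 automatically reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment, so no keys appear in the code:

import boto3

# boto3 picks up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the
# environment, so no credentials need to be written into the script
s3 = boto3.client('s3')

# list the buckets in the account to confirm that authentication works
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])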

It's good practice to keep these keys in environment variables, as having them in code can lead to their theft. During the writing of this book, I had them hardcoded and accidentally checked them into GitHub. The next morning I woke up to critical messages from AWS that I had thousands of servers running! There are scrapers that crawl GitHub looking for these keys, and any they find will be used for nefarious purposes. By the time I had all the servers turned off, my bill was up to $6000, all accrued overnight. Thankfully, AWS waived these fees!
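
If your code does need to reference the keys directly, read them from the environment rather than embedding them. The following sketch shows one such pattern; the require_env helper is illustrative, not part of boto3 or the recipe:

import os

def require_env(name):
    # illustrative helper: fetch a required variable or fail fast,
    # rather than silently falling back to a hardcoded credential
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

access_key = require_env('AWS_ACCESS_KEY_ID')
secret_key = require_env('AWS_SECRET_ACCESS_KEY')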