Python Web Scraping Cookbook

How to do it

We won't parse the data in the planets.html file, but simply retrieve it from the local web server using requests:

  1. The following code (found in 03/S3.py) reads the planets web page and stores it in S3:
import requests
import boto3

data = requests.get("http://localhost:8080/planets.html").text

# create S3 client, use environment variables for keys
s3 = boto3.client('s3')

# the bucket
bucket_name = "planets-content"

# create the bucket with a public-read ACL
s3.create_bucket(Bucket=bucket_name, ACL='public-read')

# store the page in the bucket, also publicly readable
s3.put_object(Bucket=bucket_name, Key='planet.html',
              Body=data, ACL="public-read")
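The comment in the code relies on boto3's default credential chain, which picks up keys from the environment. As a minimal sketch (the key values here are placeholders, not real credentials), boto3 looks for the standard AWS environment variables:

```python
import os

# boto3 reads these standard variable names automatically;
# replace the placeholder values with your own credentials
os.environ.setdefault("AWS_ACCESS_KEY_ID", "YOUR-KEY-ID")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "YOUR-SECRET-KEY")

configured = ("AWS_ACCESS_KEY_ID" in os.environ and
              "AWS_SECRET_ACCESS_KEY" in os.environ)
```

Setting these before creating the client avoids hardcoding secrets in the script.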
  2. Running this script produces output similar to the following, which is metadata from S3 describing the new object:

{'ETag': '"3ada9dcd8933470221936534abbf7f3e"', 'ResponseMetadata': {'HTTPHeaders': {'content-length': '0', 'date': 'Sun, 27 Aug 2017 19:25:54 GMT', 'etag': '"3ada9dcd8933470221936534abbf7f3e"', 'server': 'AmazonS3', 'x-amz-id-2': '57BkfScql637op1dIXqJ7TeTmMyjVPk07cAMNVqE7C8jKsb7nRO+0GSbkkLWUBWh81k+q2nMQnE=', 'x-amz-request-id': 'D8446EDC6CBA4416'}, 'HTTPStatusCode': 200, 'HostId': '57BkfScql637op1dIXqJ7TeTmMyjVPk07cAMNVqE7C8jKsb7nRO+0GSbkkLWUBWh81k+q2nMQnE=', 'RequestId': 'D8446EDC6CBA4416', 'RetryAttempts': 0}}
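Rather than inspecting this dictionary by eye, you can check the result programmatically. A minimal sketch, using an abbreviated copy of the response shown above (in a real run, `response` would be the return value of `s3.put_object(...)`):

```python
# abbreviated sample of the put_object response from the output above
response = {
    'ETag': '"3ada9dcd8933470221936534abbf7f3e"',
    'ResponseMetadata': {
        'HTTPStatusCode': 200,
        'RetryAttempts': 0,
    },
}

# a 200 status code confirms the object was stored successfully
status_code = response['ResponseMetadata']['HTTPStatusCode']

# the ETag is the object's MD5 checksum, wrapped in literal quotes
etag = response['ETag'].strip('"')
```

Checking `HTTPStatusCode` this way is a simple sanity test before moving on to the next step.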
  3. This output shows us that the object was successfully created in the bucket. At this point, you can navigate to the S3 console and see your bucket:
The Bucket in S3
  4. Inside the bucket, you will see the planet.html file:
The File in the Bucket
  5. By clicking on the file, you can see its properties and the URL to the file within S3:
The Properties of the File in S3
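Because the object was uploaded with a public-read ACL, the URL shown in the console can also be constructed directly from the bucket and key names. A minimal sketch using the virtual-hosted-style address (buckets outside us-east-1 may need a region-qualified host such as `s3.<region>.amazonaws.com`):

```python
bucket_name = "planets-content"
key = "planet.html"

# virtual-hosted-style S3 URL: https://<bucket>.s3.amazonaws.com/<key>
url = f"https://{bucket_name}.s3.amazonaws.com/{key}"
```

Any HTTP client (a browser, or `requests.get(url)`) can then fetch the page from that address.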