Python Web Scraping Cookbook

How to do it

Writing data to Elasticsearch takes only a few lines of code. The following Python code indexes each record of our planets data as a document (03/write_to_elasticsearch.py):

from elasticsearch import Elasticsearch
from get_planet_data import get_planet_data

# create an Elasticsearch client (connects to the local server by default)
es = Elasticsearch()

# get the data
planet_data = get_planet_data()

for planet in planet_data:
    # insert each planet into the Elasticsearch server
    res = es.index(index='planets', doc_type='planets_info', body=planet)
    print(res)

Executing this results in the following output:

{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF3_T0Z2t9T850q6', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}
{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF5QT0Z2t9T850q7', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}
{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF5XT0Z2t9T850q8', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}
{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF5fT0Z2t9T850q9', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}
{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF5mT0Z2t9T850q-', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}
{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF5rT0Z2t9T850q_', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}
{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF50T0Z2t9T850rA', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}
{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF56T0Z2t9T850rB', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}
{'_index': 'planets', '_type': 'planets_info', '_id': 'AV4qIF6AT0Z2t9T850rC', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, 'created': True}

The output shows the result of each insertion, giving us information such as the _id that Elasticsearch assigned to the document.
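Because each response is an ordinary Python dictionary, the fields of interest can be read from it directly. The following is a minimal sketch, assuming the response shape shown in the output above; summarize_index_response is a hypothetical helper name, and the sample dictionary is copied from the first line of that output rather than fetched from a live server:

```python
def summarize_index_response(res):
    """Return the (_id, result) pair from an index response dictionary."""
    return res['_id'], res['result']


# sample response copied from the first line of the output above
sample = {'_index': 'planets', '_type': 'planets_info',
          '_id': 'AV4qIF3_T0Z2t9T850q6', '_version': 1,
          'result': 'created',
          '_shards': {'total': 2, 'successful': 1, 'failed': 0},
          'created': True}

doc_id, result = summarize_index_response(sample)
print(doc_id, result)
```

Keeping the returned _id is useful if you later want to fetch or update a specific document instead of searching for it.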

If you also have Logstash and Kibana installed, you can see the data in Kibana:

Kibana Showing an Index

We can also query the data from Python. The following code retrieves all of the documents in the 'planets' index and prints the name, mass, and radius of each planet (03/read_from_elasticsearch.py):

from elasticsearch import Elasticsearch

# create an Elasticsearch client (connects to the local server by default)
es = Elasticsearch()

res = es.search(index="planets", body={"query": {"match_all": {}}})

print("Got %d Hits:" % res['hits']['total'])
for hit in res['hits']['hits']:
    print("%(Name)s %(Mass)s: %(Radius)s" % hit["_source"])

This results in the following output:

Got 9 Hits:
Mercury 0.330: 4879
Mars 0.642: 6792
Venus 4.87: 12104
Saturn 568: 120536
Pluto 0.0146: 2370
Earth 5.97: 12756
Uranus 86.8: 51118
Jupiter 1898: 142984
Neptune 102: 49528
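The search response is also a plain dictionary, with the matching documents under res['hits']['hits'] and each document's fields under '_source'. One caveat: Elasticsearch 7 and later report hits.total as an object with value and relation fields rather than a bare integer, so the %d format in the query code above would raise a TypeError there. The following is a minimal sketch that tolerates both shapes; hit_count and format_hit are hypothetical helper names, the sample response is hand-built from two rows of the output above, and storing the field values as strings is an assumption:

```python
def hit_count(res):
    """Return the total hit count for either response shape."""
    total = res['hits']['total']
    # newer servers return {'value': n, 'relation': 'eq'}; older ones return n
    if isinstance(total, dict):
        return total['value']
    return total


def format_hit(hit):
    """Format one hit the same way as the query code above."""
    return "%(Name)s %(Mass)s: %(Radius)s" % hit['_source']


# hand-built sample (not a real server response); string values are assumed
sample = {'hits': {
    'total': {'value': 2, 'relation': 'eq'},
    'hits': [
        {'_source': {'Name': 'Mercury', 'Mass': '0.330', 'Radius': '4879'}},
        {'_source': {'Name': 'Mars', 'Mass': '0.642', 'Radius': '6792'}},
    ],
}}

print("Got %d Hits:" % hit_count(sample))
for hit in sample['hits']['hits']:
    print(format_hit(hit))
```

With a live server, the same two helpers can be applied to the res dictionary returned by es.search in place of the sample.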