Python multiprocessing module
Python provides an easy way to implement a multiprocess program through its multiprocessing module, which provides the Process class to start new processes, along with Queue and Pipe to facilitate communication between multiple processes, and so on.
The following example provides a quick overview of how to use Python's multiprocessing library to create a URL loader that executes as a separate process to load a URL:
# url_loader.py
from multiprocessing import Process
import urllib.request


def load_url(url):
    url_handle = urllib.request.urlopen(url)
    url_data = url_handle.read()
    # read() returns the data as a bytes object, so we need to
    # decode it into a string before we can print it.
    html_data = url_data.decode('utf-8')
    url_handle.close()
    print(html_data)


if __name__ == '__main__':
    url = 'http://www.w3c.org'
    # args must be a tuple, hence the trailing comma after url.
    loader_process = Process(target=load_url, args=(url,))
    print("Spawning a new process to load the url")
    loader_process.start()
    print("Waiting for the spawned process to exit")
    loader_process.join()
    print("Exiting…")
In this example, we created a simple program using the Python multiprocessing library, which loads a URL in the background and prints its information to stdout. The interesting bit here is understanding how easily we spawned a new process in our program. So, let's take a look. To achieve multiprocessing, we first import the Process class from Python's multiprocessing module. The next step is to create a function that takes the URL to load as a parameter and then loads that URL using Python's urllib module. Once the URL is loaded, we print the data from the URL to stdout.
Next, we define the code that runs when the program starts executing. Here, we first define the URL we want to load with the url variable. The next bit is where we introduce multiprocessing into our program by creating an object of the Process class. For this object, we provide the target parameter as the function we want to execute. This is similar to the target parameter we have grown accustomed to while using the Python threading library. The next parameter to the Process constructor is the args parameter, a tuple of the arguments that need to be passed to the target function when it is called.
To spawn a new process, we make a call to the start() method of the Process object. This spawns a new process in which our target function starts executing and doing its magic. The last thing we do is wait for this spawned process to exit by calling the join() method of the Process object.
This is as simple as it gets to create a multiprocess application in Python.
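To make the start() and join() life cycle described above more concrete, the following is a minimal sketch that runs without network access. The describe_worker and report functions are hypothetical helpers introduced only for this illustration; is_alive() and exitcode are attributes of the Process class:

```python
from multiprocessing import Process
import os


def describe_worker(label):
    """Hypothetical helper: identify the process that runs this call."""
    return f"{label} running in process {os.getpid()}"


def report(label):
    # Target function for the child process: print where it is running.
    print(describe_worker(label))


if __name__ == '__main__':
    print(describe_worker("parent"))
    worker = Process(target=report, args=("child",))
    # A Process object that has not been started yet is not alive.
    print("Alive before start():", worker.is_alive())
    worker.start()
    worker.join()
    # exitcode is 0 when the target function returned normally.
    print("Exit code after join():", worker.exitcode)
```

The parent and the child should report different process IDs, which is a quick way to confirm that the target function really ran in a separate process.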
Now, we know how to create a multiprocess application in Python, but how do we divide a particular set of tasks between multiple processes? Well, that's quite easy. The following code sample modifies the entry point code from our previous example to exploit the power of the Pool class from the multiprocessing module to achieve this:
from multiprocessing import Pool

# load_url is the function defined in url_loader.py above.
if __name__ == '__main__':
    urls = ['http://www.w3c.org', 'http://www.microsoft.com', 'http://www.wikipedia.org', 'http://www.packt.com']
    with Pool(4) as loader_pool:
        loader_pool.map(load_url, urls)
In this example, we used the Pool class from the multiprocessing library to create a pool of four worker processes that will execute our code. The map method of the Pool class then distributes the items of the input list across these workers, calling the target function once for each item, so that several URLs can be loaded concurrently.
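The map method also collects the return value of each call and hands the results back to the parent as a list, in the same order as the inputs. The following is a minimal sketch of that behavior; word_count is a hypothetical stand-in for load_url, chosen so that the example runs without network access:

```python
from multiprocessing import Pool


def word_count(text):
    """Hypothetical stand-in for load_url: count whitespace-separated words."""
    return len(text.split())


if __name__ == '__main__':
    documents = ["one two three", "four five", "six"]
    with Pool(2) as pool:
        # Each document is handed to one of the two worker processes;
        # the results come back in input order.
        counts = pool.map(word_count, documents)
    print(counts)  # [3, 2, 1]
```

Because map blocks until every item has been processed, it suits cases where the parent needs all the results before moving on.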
Now, we have multiple processes churning through our tasks. But what if we wanted to make these processes communicate with each other? For example, in the previous problem of URL loading, what if we wanted the process to return the data instead of printing it to stdout? The answer to this lies in the use of pipes, which provide a two-way mechanism for the processes to communicate with each other.
The following example utilizes pipes to make the URL loader send the data loaded from the URL back to the parent process:
# url_load_pipe.py
from multiprocessing import Process, Pipe
import urllib.request


def load_url(url, pipe):
    url_handle = urllib.request.urlopen(url)
    url_data = url_handle.read()
    # read() returns the data as a bytes object, so we need to
    # decode it into a string before we send it.
    html_data = url_data.decode('utf-8')
    url_handle.close()
    pipe.send(html_data)


if __name__ == '__main__':
    url = 'http://www.w3c.org'
    parent_pipe, child_pipe = Pipe()
    loader_process = Process(target=load_url, args=(url, child_pipe))
    print("Spawning a new process to load the url")
    loader_process.start()
    print("Waiting for the spawned process to exit")
    # recv() blocks until the child process sends the data.
    html_data = parent_pipe.recv()
    print(html_data)
    loader_process.join()
    print("Exiting…")
In this example, we have used a pipe to provide a two-way communication mechanism for the parent and child processes to talk to each other. When we make a call to Pipe() inside the __main__ section of the code, it returns a pair of connection objects. Each of these connection objects provides a send() and a recv() method, facilitating communication between the two ends. Data sent from child_pipe using the send() method can be read from parent_pipe using its recv() method, and vice versa.
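The URL loader only sends data in one direction, from the child to the parent. To illustrate the "vice versa" part, the following is a minimal sketch in which the parent sends a message and the child replies over the same pipe. The shout function and the message text are hypothetical and exist only for this illustration:

```python
from multiprocessing import Process, Pipe


def shout(conn):
    """Child side: read one message from the pipe and reply in upper case."""
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    worker = Process(target=shout, args=(child_conn,))
    worker.start()
    # Parent to child: send the request through the parent's end.
    parent_conn.send("hello from the parent")
    # Child to parent: recv() blocks until the reply arrives.
    print(parent_conn.recv())
    worker.join()
```

Each end of the pipe is used by exactly one process here, which keeps the exchange simple; sharing one end between several processes would need extra coordination.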