Hands-On Enterprise Application Development with Python

Controlling the concurrency

In the previous example, we ran into a question: why can't we have a million threads, each dealing with an individual client? That should provide us with a lot of concurrency and scalability. However, there are a number of reasons that prevent us from running a million threads at the same time. Let's take a look at what stops us from scaling our application infinitely:

  • Resource limitations: Every client connection handled by the server comes at a cost. With every new connected client, we expend some of the machine's resources. These include the file descriptor that maps to the socket, the memory used to hold the information related to the open socket, and so on. A system may have a lot of memory, but that memory is still finite, and together with limits such as the maximum number of open file descriptors, it decides how many sockets we can keep open.
  • Costs associated with a new thread: Every thread that we launch needs its own space in memory. Although a lot of application-related data is shared between the threads, each thread still has to maintain quite a few thread-local details, such as its own stack. This puts pressure on the system resources and limits how many threads we can have running at the same time.
  • The cost of context switches: When we are dealing with threads, we need to remember that not all of the threads can execute in parallel. The number of threads that can run in parallel depends on a number of factors, such as how many cores the system has and which Python implementation is being used (in CPython, for example, the global interpreter lock lets only one thread execute Python bytecode at a time). To give every thread a fair opportunity to run, the operating system frequently switches between the threads. Every such switch is called a context switch: the state of the thread being switched out is saved, and the state of the thread that will execute next is loaded. Each switch costs CPU time that is not spent on useful work. Now imagine that we have a million threads, all contending with each other to execute. This will cause our system to thrash, with the CPU spending most of its time handling context switches, which reduces the system throughput considerably.

These points give us some idea of why we cannot have infinite concurrency at our disposal.
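To make the resource limits a little more concrete, the following is a minimal sketch that asks the operating system how many file descriptors our process may open and what stack size is configured for new threads. It assumes a Unix-like system, since the resource module is not available on Windows, and the describe_limits() helper is a hypothetical name used only for illustration:

```python
import resource
import threading


def describe_limits():
    """Return some per-process limits that cap sockets and threads."""
    # RLIMIT_NOFILE is the maximum number of open file descriptors;
    # every connected socket consumes one of them.
    soft_fds, hard_fds = resource.getrlimit(resource.RLIMIT_NOFILE)
    # stack_size() returns 0 when new threads use the platform default.
    stack_bytes = threading.stack_size()
    return {
        "soft_fds": soft_fds,
        "hard_fds": hard_fds,
        "stack_bytes": stack_bytes,
    }


if __name__ == "__main__":
    limits = describe_limits()
    print("Open file descriptors (soft/hard):",
          limits["soft_fds"], limits["hard_fds"])
    print("Thread stack size (0 = platform default):",
          limits["stack_bytes"])
```

The numbers printed will differ from one machine to another, but on many systems the soft file descriptor limit is far below a million.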

But as it turns out, we don't actually need infinite concurrency. Rather, in the context of web applications, we should be just fine with a much smaller number of threads. But what makes this possible? Let's explore the reasons behind this claim:

  • Short-lived requests: In most web applications, the majority of incoming requests are short-lived. That is, a single request usually involves a small amount of data, for which the server can quickly generate and return a result. This frees up resources quickly to handle the next request.
  • I/O wait: Most of the time, requests are bottlenecked by client-side network I/O due to the limited bandwidth of the client. In that case, a few of the threads can be waiting on I/O while the others quickly process other requests, significantly reducing the number of threads that need to be running at the same time.
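The effect of these two points can be sketched with a small simulation. Here, time.sleep() stands in for the time a request spends waiting on network I/O, and handle_request(), serve(), and IO_DELAY are hypothetical names chosen for this example rather than part of any real server:

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_DELAY = 0.05  # assumed time each request spends waiting on I/O


def handle_request(request_id):
    """Simulate a short-lived request that mostly waits on I/O."""
    # A sleeping thread gives up the CPU, so other threads can run.
    time.sleep(IO_DELAY)
    return request_id * 2  # stand-in for a small amount of real work


def serve(requests, num_threads):
    """Handle all requests with a fixed number of threads.

    Returns the list of results and the elapsed time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(handle_request, requests))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = serve(range(20), num_threads=5)
    print("Handled", len(results), "requests in", round(elapsed, 2), "seconds")
```

With only five threads, the 20 simulated requests should finish in well under the time it would take to handle them one after another, because the threads overlap their waiting.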

Now we know we don't necessarily need an infinite number of threads to make our application scale for a large number of requests. This brings us to the concept of resource pooling.

How about we create a fixed number of threads and a sane connection queue limit for the application, and then use this fixed number of threads to cater to the incoming clients? That should provide us with a reasonable trade-off between resource consumption and the number of clients we can handle concurrently.
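One possible shape for this idea is sketched below, using only the standard library. The BoundedWorkerPool class and its method names are assumptions made for illustration, not an API from the book or from any framework. A fixed set of worker threads pulls tasks from a queue with a hard size limit, and a task is rejected rather than queued when the limit is reached:

```python
import queue
import threading


class BoundedWorkerPool:
    """A fixed set of worker threads fed by a size-limited queue."""

    def __init__(self, num_workers, queue_limit, handler):
        self._tasks = queue.Queue(maxsize=queue_limit)
        self._handler = handler
        self._workers = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def _work(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: stop this worker
                self._tasks.task_done()
                break
            try:
                self._handler(task)
            finally:
                self._tasks.task_done()

    def submit(self, task):
        """Queue a task; return False instead of blocking when full."""
        try:
            self._tasks.put_nowait(task)
            return True
        except queue.Full:
            return False

    def shutdown(self):
        """Wait for the queued tasks to finish, then stop every worker."""
        self._tasks.join()
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


if __name__ == "__main__":
    pool = BoundedWorkerPool(num_workers=4, queue_limit=8, handler=print)
    accepted = [pool.submit(f"client-{i}") for i in range(20)]
    pool.shutdown()
    print("Accepted:", accepted.count(True), "Rejected:", accepted.count(False))
```

Returning False from submit() is only one way to respond to a full queue. A real server might instead block the caller for a while or reply to the client with an error, depending on how it wants to behave under load.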