Node's unique design
First, let's look closely at the time cost your program incurs when it asks the system to perform different kinds of services. I/O is expensive. In the following chart (taken from Ryan Dahl's original presentation on Node), we can see how many clock cycles typical system tasks consume. The relative cost of I/O operations is striking:
The reasons are clear enough: a disk is a physical device, a spinning metal platter; storing and retrieving data there is much slower than moving data between solid-state devices (such as microprocessors and memory chips), let alone within optimized on-chip L1/L2 caches. Similarly, data does not move from point to point on a network instantaneously. Light itself needs 0.1344 seconds to circle the globe! In a network used by many billions of people regularly interacting across great distances at speeds much slower than the speed of light, with many detours and few straight lines, this sort of latency builds up.
When our software ran on personal computers on our desks, little or no communication was happening over the network. Delays or hiccups in our interactions with a word processor or spreadsheet had to do with disk access time. Much work was done to improve disk access speeds. Data storage and retrieval became faster, software became more responsive, and users now expect this responsiveness in their tools.
With the advent of cloud computing and browser-based software, your data has left the local disk and exists on a remote disk, and you access this data via a network—the internet. Data access times have slowed down again, dramatically. Network I/O is slow. Nevertheless, more companies are migrating sections of their applications into the cloud, with some software being entirely network-based.
Node is designed to make I/O fast. It is designed for this new world of networked software, where data is in many places and must be assembled quickly. Many of the traditional frameworks to build web applications were designed at a time when a single user working on a desktop computer used a browser to periodically make HTTP requests to a single server running a relational database. Modern software must anticipate tens of thousands of simultaneously connected clients concurrently altering enormous, shared data pools via a variety of network protocols, on any number of unique devices. Node is designed specifically to help those building that kind of network software.
The breakthrough in thinking reflected by Node's design is simple to understand once one recognizes that most worker threads spend their time waiting: for more instructions, for a sub-task to complete, and so on. For example, a process assigned to service the command "format my hard drive" will dedicate all of its allotted resources to managing a workflow, something like the following:
- Communicate to a device driver that a format request has been made
- Idle, waiting for an unknowable length of time
- Receive a signal that the format is complete
- Notify the client
- Clean up; shut down
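The five steps above can be sketched in Node's own language. Here, `formatDrive()` is a hypothetical stand-in for a real device-driver call, and `setImmediate()` simulates the unknowable wait of the second step; while the "device" works, the event loop is free to run other code.

```javascript
// A sketch of the five-step workflow above. formatDrive() is a
// hypothetical stand-in for a real device-driver call; setImmediate()
// simulates the unknowable wait of step 2.
const log = [];

function formatDrive(callback) {
  log.push('request sent to device driver');      // 1. communicate request
  setImmediate(() => {                            // 2. idle, waiting
    log.push('signal received: format complete'); // 3. receive signal
    callback();                                   // 4. notify the client
  });
}

formatDrive(() => {
  log.push('client notified; cleaning up');       // 5. clean up; shut down
});

// The process is not blocked during the wait:
log.push('free to do other work while waiting');
```

Note the order of the log entries: the "free to do other work" line runs before the format completes, which is exactly the idle window the next section puts to use.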
In the preceding figure, we see that an expensive worker is charging the client a fixed fee per unit of time, regardless of whether any useful work is being done (the client is paying equally for activity and idleness). To put it another way, it is not necessarily true, and most often not true, that the sub-tasks comprising a total task each require similar effort or expertise. It's therefore wasteful to pay a premium price for such cheap labor.
In fairness, we must also recognize that this worker can do no better even when ready and able to handle more work: even the best-intentioned worker can do nothing about an I/O bottleneck. The worker here is I/O bound.
Instead, imagine an alternative design. What if multiple clients could share the same worker, such that the moment one job stalls on an I/O bottleneck and the worker announces its availability, another job from another client can start?
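That design can be sketched directly. In the following illustration (all names are hypothetical), `setImmediate()` again simulates each job's slow I/O phase; the moment one client's job starts waiting, the next client's job begins, so one worker serves all three.

```javascript
// One worker (the event loop) shared by several clients. serveClient()
// is illustrative; setImmediate() simulates each job's slow I/O phase.
const timeline = [];

function serveClient(id) {
  timeline.push(`client ${id}: job started`);
  setImmediate(() => {
    // The "I/O" has completed; the worker picks this job back up.
    timeline.push(`client ${id}: job finished`);
  });
}

// Three clients arrive at once; none blocks the others.
[1, 2, 3].forEach(serveClient);
```

All three jobs start before any of them finishes: no client pays for another client's idle time.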
Node has commoditized I/O through the introduction of an environment where system resources are (ideally) never idle. Event-driven programming as implemented by Node reflects the simple goal of lowering overall system costs by encouraging the sharing of expensive labor, mainly by reducing the number of I/O bottlenecks to zero. We no longer have a powerless chunk of rigidly priced, unsophisticated labor; we can instead break all effort into discrete units with precisely delineated shapes, and therefore admit much more accurate pricing.
What would an environment within which many client jobs are cooperatively scheduled look like? And how is this message passing between events handled? Additionally, what do concurrency, parallelism, asynchronous execution, callbacks, and events mean to the Node developer?