High-level technology overview of MongoDB 4.x
When it was first introduced in 2009, MongoDB took the database world by storm, and since that time it has rapidly gained in popularity. According to the 2019 StackOverflow developer survey (https://insights.stackoverflow.com/survey/2019#technology-_-databases), MongoDB is ranked fifth, with 26% of professional developers and 25.5% of all respondents saying they use MongoDB. DB-Engines (https://db-engines.com/en/ranking) also ranks MongoDB as the fifth most widely used database, using an algorithm that takes into account the frequency of search, DBA Stack Exchange and StackOverflow references, and the frequency with which MongoDB appears in job postings. What is of even more interest is that the trend graph generated by DB-Engines shows that the score (and therefore ranking) of MongoDB has grown by 200% since 2013. You can refer to https://db-engines.com/en/ranking_trend for more details. In 2013, MongoDB was not even in the top 10!
There are many key features of MongoDB that account for its rise in popularity. Subsequent chapters in this book cover the most important of these features in detail. In this section, we present you with a brief, big-picture overview of three key aspects of MongoDB.
MongoDB is based upon documents
One of the most important distinctions between MongoDB and the traditional relational database management systems (RDBMS) is that instead of tables, rows, and columns, the basis for storage in MongoDB is a document. In a certain sense, you can think of the traditional RDBMS system as two dimensional, whereas MongoDB is three dimensional. Documents are typically modeled using JSON formatting and then inserted into the database where they are converted to a binary format for storage (more on that in later chapters!).
Related to the document basis for storage is the fact that MongoDB documents have no fixed schema. The main benefit of this is vastly reduced overhead. Database restructuring is a piece of cake, and doesn't cause the massive problems, website crashes, and security breaches seen in applications reliant upon a traditional RDBMS database restructuring.
The really great news for developers is that most modern programming applications are based on classes representing information that needs to be stored. This has spawned the creation of a large number of object-relational mapping (ORM) libraries for the various programming languages. In MongoDB, on the other hand, the need for a complex ORM infrastructure is completely eliminated as programmatic objects can be directly stored in the database as-is:
So instead of columns, MongoDB documents have fields. Instead of tables, there are collections of documents. Let's now have a look at replication in MongoDB.
High availability
Another feature that causes MongoDB to stand out from other database technologies is its ability to ensure high availability through a process known as replication. A server running MongoDB can have copies of its databases duplicated across two more servers. These copies are known as replica sets. Replica sets are organized through an election process whereby the members of the replica vote on which server becomes the primary. Other servers are then assigned the role of secondary.
This arrangement not only ensures that the database is continuously available, but that it can also be used by application code by way of read preferences. A read preference tells the replica set which servers in the replica set are preferred. If the read preferences are set less restrictively, then the first server in the set to respond might be able to satisfy the request, thereby implementing a form of parallel process that has the potential to greatly enhance performance. This setup is illustrated in the following diagram:
This topic is covered in extensive detail in Chapter 13, Deploying a Replica Set. Lastly, we have a look at sharding.
Horizontal scaling
One more feature, among many, is MongoDB's ability to handle a massive amount of data. This is accomplished by splitting up a sizeable collection across multiple servers, creating a sharded cluster. In the process of splitting the collection, a shard key is chosen from among the fields present with the collection's documents. The shard key is then used by the sharded cluster balancer to determine the appropriate distribution of documents. Application program code is then able, by its knowledge of the value of the shard key, to direct queries to specific members of the sharded cluster, achieving potentially enormous performance gains. This setup is illustrated in the following diagram:
This topic is covered in extensive detail in Chapter 15, Deploying a Sharded Cluster. In the next section of this chapter, we have a look at the major differences between MongoDB 3 and MongoDB 4.x.