上QQ阅读APP看书，第一时间看更新

Handling big data

One massive problem faced by legacy RDBMS systems is difficulty managing Big Data (https://en.wikipedia.org/wiki/Big_data). Examples would include data produced by the NASA Center for Climate Change, the Human Genome Project, which analyzes strands of DNA, or the Sloan Digital Sky Survey, which collects astronomical data. RDBMS systems are designed to maximize storage, which was an expensive resource 50 years ago. In the 21st century, storage costs have dropped dramatically, making this a secondary consideration. Another aspect of RDBMS systems is their ability to provide flexibility by way of creating relations between tables, which by its very nature introduces overheads, compounded when handling big data.

MongoDB addresses the needs of big data by incorporating modern algorithms such as map reduce (https://en.wikipedia.org/wiki/MapReduce), which allows for parallel distributed processing on a cluster of servers. In addition, MongoDB has a feature referred to as sharding, which allows fragments of a database to be stored and processed on multiple servers.

It should be noted that although MongoDB is designed to handle big data, it is actually more of a general purpose platform. If your only need is to handle big data, it might be worth your while to investigate Apache Cassandra ( https://cassandra.apache.org/) with Hadoop ( http://hadoop.apache.org/), which is expressly designed to handle massive amounts of data.