The 5 Critical Differences Between DynamoDB vs MongoDB:
- MongoDB is vendor agnostic, Open Source, and can be deployed anywhere. DynamoDB is only available on AWS.
- DynamoDB is a fully managed AWS service; MongoDB can be self-hosted or fully managed with MongoDB Atlas.
- DynamoDB, as an integrated AWS service, makes it easier to develop end-to-end solutions on AWS.
- DynamoDB uses tables, items, and attributes; MongoDB uses JSON-like documents.
- DynamoDB supports limited data types and smaller item sizes; MongoDB supports more data types and has fewer size restrictions.
MongoDB vs DynamoDB: how do you choose between them? Whether you are a two-person team bootstrapping a proof of concept or an established one battling high throughput and heavy load, this post can serve as a guidepost in your decision process. Before going into the details, a brief history lesson on how these technologies emerged is pertinent: to make an informed choice, you must understand the optimal conditions for running these systems and how they operate in the wild.
Table of Contents:
- The Emergence of NoSQL
- CAP Theorem
- DynamoDB vs MongoDB: 6 Critical Differences
- A Note on AWS Integration
The Emergence of NoSQL
Before the era of Big Data, relational database management systems (RDBMS) were king. The relational model pairs well with traditional client-server business applications that inherently operate on structured data. Classical relational databases follow the ACID properties: a database transaction must be Atomic, Consistent, Isolated, and Durable. In a nutshell, this guarantees consistency; every modification moves the database from one consistent state to another.
However, many of these systems could not cost-effectively scale with massive volumes of unstructured data, and engineering teams began looking for alternatives. NoSQL ("Not Only SQL") came to the fore with systems such as Bigtable, Cassandra, MongoDB, and DynamoDB, alongside processing models such as MapReduce. The real advantage of NoSQL is horizontal scaling, typically achieved through "sharding": one scales by adding more machines to the pool of resources. For example, each row is stored independently, allowing even distribution across the nodes in a cluster. This is opposed to vertical scaling, where one increases the size and computing power of a single instance or node without increasing the number of nodes or instances.
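As a minimal sketch of the sharding idea described above, the snippet below assigns each record key to one of several nodes with a stable hash. The function name `shard_for`, the key format, and the four-node cluster are illustrative assumptions, not how any particular database implements it; production systems generally use consistent hashing or range-based chunks so that adding a node does not reshuffle most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the range [0, num_shards).
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical keys spread across a hypothetical 4-node cluster.
NUM_SHARDS = 4
keys = [f"user-{i}" for i in range(1000)]
counts = [0] * NUM_SHARDS
for k in keys:
    counts[shard_for(k, NUM_SHARDS)] += 1

# Each node ends up holding roughly a quarter of the rows.
print(counts)
```

Because every row is routed independently by its key, scaling out in this model means adding nodes rather than growing a single machine.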
Related Reading: NoSQL vs SQL
CAP Theorem
In 2000, Dr. Eric Brewer gave a keynote speech at the Symposium on Principles of Distributed Computing (PODC) called "Towards Robust Distributed Systems". There he posed the CAP theorem, which states that a distributed (i.e., scalable) system cannot guarantee Consistency, Availability, and Partition Tolerance all at once; only two of the three can be assured:
- AP: Highly available and partition tolerant, but not consistent.
- CP: Consistent and partition tolerant, but not highly available.
- CA: Highly available and consistent, but not partition tolerant.
Relational databases are mainly characterized as CA systems. They offer no partition tolerance and are therefore usually implemented as a single node, which leaves expensive vertical scaling as the way to grow.
If a NoSQL distributed database chooses availability over consistency (i.e., it is an AP system), it cannot provide ACID transactions. Instead, systems like this typically offer a set of properties known as BASE (Basically Available, Soft state, Eventual consistency), which provide a weaker degree of reliability for transactions.
Daniel J. Abadi of Yale University wrote a paper called "Consistency Tradeoffs in Modern Distributed Database System Design" which outlined some of the shortcomings of CAP. The PACELC theorem explores the scenario when there is no partitioning (i.e., when the network is healthy). The acronym means if we suffer from network partitioning (P), we have to choose between availability (A) or consistency (C), else (E) we have to choose between latency (L) or consistency (C). PAC is CAP backward, and the ELC is the extension.
It is worth mentioning the emergence of NewSQL relational database management systems, which aim to match the elastic scalability and performance of NoSQL systems for OLTP (Online Transaction Processing) while giving RDBMS-level ACID compliance for transactions. VoltDB is a good example: it is a CP system, choosing consistency (C) over availability (A) while remaining partition tolerant.
DynamoDB vs MongoDB: 6 Critical Differences
1) Fully Managed
DynamoDB is a fully managed solution. Using a fully managed service reduces the time a team spends on operations: no pager duty alerts, no servers to update, no kernel patches to roll out, no SSDs to replace, and no hardware provisioning, setup and configuration, throughput capacity planning, replication, software patching, or cluster scaling to handle yourself. The focus shifts to application logic, where the real value lies. The general rule of thumb is to choose DynamoDB for low-throughput apps, as writes are expensive and strongly consistent reads cost twice as much as eventually consistent reads. MongoDB Atlas pricing, by contrast, comes from the infrastructure, availability, and backups of the managed service; throughput is included in the price. If you do not have a dedicated operations person on your team, DynamoDB is a better choice.
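To make the cost rule of thumb concrete, here is a rough sketch of how provisioned capacity units scale with item size and consistency. It assumes the commonly documented unit sizes (one read unit covers a strongly consistent read of up to 4 KB per second, or two eventually consistent reads; one write unit covers a write of up to 1 KB per second) and treats 1 KB as 1024 bytes. The function names are hypothetical, and actual billing depends on the capacity mode and region, so treat the numbers as estimates only.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # assumed size covered by one read unit
WRITE_UNIT_BYTES = 1 * 1024  # assumed size covered by one write unit


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool) -> int:
    """Estimate read units needed for a steady read rate."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = math.ceil(total / 2)
    return total


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """Estimate write units needed for a steady write rate."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return units_per_write * writes_per_second


# 3,000-byte items read 80 times per second.
print(read_capacity_units(3000, 80, strongly_consistent=True))
print(read_capacity_units(3000, 80, strongly_consistent=False))
# 2,500-byte items written 10 times per second.
print(write_capacity_units(2500, 10))
```

The asymmetry between the 4 KB read unit and the 1 KB write unit is why write-heavy workloads with larger items tend to dominate the bill.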
2) Out Of The Box Security
DynamoDB provides out-of-the-box security. The security model is based on Identity and Access Management (IAM), which lets you manage access to AWS services and resources securely: you create and manage AWS users and groups and use permissions to allow or deny their access to AWS resources. IAM has been battle-tested and works with limited configuration. DynamoDB is not exposed as a directly addressable database server; every request goes through the AWS API endpoint, where it must be signed with valid credentials before AWS authorizes it. MongoDB can be run securely, but a self-hosted deployment is not secure by default: access control has to be enabled and configured explicitly. Because it does not provide out-of-the-box security, it can be particularly vulnerable to breaches.
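As an illustration of the IAM permission model, the sketch below builds a policy document that allows read-only access to a single table. The policy structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM JSON format; the helper name, account ID, region, and table name are made-up placeholders.

```python
import json


def read_only_table_policy(table_arn: str) -> dict:
    """Build an IAM policy document granting read-only access to one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only read actions are listed; writes are implicitly denied.
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            }
        ],
    }


# Hypothetical ARN: placeholder region, account ID, and table name.
policy = read_only_table_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/Orders"
)
print(json.dumps(policy, indent=2))
```

Anything not explicitly allowed by a policy is denied, which is what makes the default posture restrictive.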
3) Querying
DynamoDB supports key-value queries. For queries requiring aggregations, graph traversals, or search, data must be exported into complementary AWS services, such as Elastic MapReduce or Redshift; this inherently increases latency, cost, and cognitive load for developers. Because it is a managed service, there is also little room to tune lower-level elements such as system configuration (e.g., hardware and OS settings), which can significantly impact the overall performance of an application. MongoDB's query language allows developers to query and analyze data in many ways: by single key, range, graph traversal, geospatial query, faceted search, and much more. Because these queries run inside the database, there is no extra hop to another service, and it is possible to obtain fine-grained performance metrics for optimization and tuning if necessary: throughput, database performance, resource utilization, resource saturation, and errors (asserts).
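The difference in query models can be sketched by comparing the request shapes side by side. The snippet below only builds the request payloads as plain Python data and does not contact either database. The table, collection, and field names (`Orders`, `customer_id`, `status`, `amount`) are hypothetical. The first dictionary holds the parameters one would pass to a DynamoDB `Query` call; the list is an aggregation pipeline of the kind one would pass to a MongoDB collection's `aggregate` method.

```python
def dynamodb_orders_for_customer(customer_id: str) -> dict:
    """Parameters for a key-based DynamoDB Query on a hypothetical table."""
    return {
        "TableName": "Orders",
        # Queries are driven by the partition key of the table or an index.
        "KeyConditionExpression": "customer_id = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
    }


def mongodb_revenue_per_customer() -> list:
    """Aggregation pipeline: total shipped revenue per customer, largest first."""
    return [
        {"$match": {"status": "shipped"}},
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
        {"$sort": {"total": -1}},
    ]


query_params = dynamodb_orders_for_customer("c-42")
pipeline = mongodb_revenue_per_customer()
print(query_params)
print(pipeline)
```

The DynamoDB request retrieves items by key; the grouping and summing expressed in the pipeline would have to happen in application code or in another service.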
4) Mutable Indexes
MongoDB has a flexible schema and mutable indexes: the structure of a document can change as development conditions change, without updating a collection schema on the backend, and indexes can be added or dropped on a live collection. DynamoDB is more rigid. Global secondary indexes can be added later, but a table's primary key and local secondary indexes are fixed at creation time; changing them means creating a new table, migrating the data, and dropping the old one, which is not possible in production systems without considerable resources for a safe transition.
5) Data Types
Compared with MongoDB, DynamoDB supports fewer data types, and items are restricted to 400 KB, whereas MongoDB supports documents of up to 16 MB. DynamoDB write capacity is billed in 1 KB increments, so items larger than 1 KB cost proportionally more to write, and AWS suggests persisting larger objects in S3. Depending on one's usage, this may or may not be viable: S3 writes can be slow, and high throughput might not be possible. DynamoDB supports only one numeric type and has no native date type.
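Since there is no native date type, a common workaround is to store timestamps either as ISO 8601 strings or as epoch seconds in the single numeric type. The sketch below shows both encodings in DynamoDB's low-level attribute-value format (`{"S": ...}` for strings, `{"N": ...}` for numbers, which are transmitted as strings). The attribute names and helper functions are hypothetical.

```python
from datetime import datetime, timezone


def to_date_attributes(moment: datetime) -> dict:
    """Encode a timezone-aware datetime as string and number attributes."""
    utc = moment.astimezone(timezone.utc)
    return {
        # ISO 8601 strings in UTC sort chronologically as plain text.
        "created_at_iso": {"S": utc.isoformat()},
        # Epoch seconds use the single numeric type, sent as a string.
        "created_at_epoch": {"N": str(int(utc.timestamp()))},
    }


def from_iso_attribute(attr: dict) -> datetime:
    """Decode the ISO 8601 string attribute back into a datetime."""
    return datetime.fromisoformat(attr["S"])


moment = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
attrs = to_date_attributes(moment)
print(attrs)
print(from_iso_attribute(attrs["created_at_iso"]))
```

Either encoding works as a sort key; the ISO string is easier to read, while the numeric form suits arithmetic and expiry-style comparisons.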
6) Vendor Lock-In
Using DynamoDB may lead to vendor lock-in. AWS uses a proprietary database model; moving to an alternative cloud provider would require significant resources to architect a new database system. Moreover, once you depend on multiple AWS services, it becomes increasingly difficult to pursue a multi-cloud strategy. The core issue is the cost of changing technologies and the resulting risk of disruption to the business. MongoDB's source code is public, and it can run anywhere. Note, however, that its license, the SSPL ("Server Side Public License"), has not been approved by the OSI ("Open Source Initiative"), so its standing as open source within the broader FOSS ("Free and open-source software") community is disputed. As quoted by the Fedora project: "Would you buy a car where the hood cannot be opened, and you will not be able to fix what's wrong or know what's happening?"
A Note on AWS Integration
If you are already heavily invested in the AWS ecosystem, DynamoDB is the better choice. It provides seamless integration with services such as Redshift (large-scale data analysis), Cognito (identity pools), Elastic MapReduce (EMR), Data Pipeline, Kinesis, and S3. DynamoDB integrates tightly with AWS Lambda via DynamoDB Streams and aligns with the serverless philosophy: automatic scaling according to your application load, pay-for-what-you-use pricing, an easy start, and no servers to manage.
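To give a feel for the Lambda integration, here is a minimal sketch of a handler that receives a batch of DynamoDB Streams records and counts them by event type. The record layout (`Records`, `eventName`, `dynamodb`, `NewImage`) follows the stream event format, but the sample event, its attribute names, and the counting logic are made up for illustration. Whether `NewImage` is present depends on how the stream is configured.

```python
def handler(event, context):
    """Count stream records by event type (INSERT, MODIFY, REMOVE)."""
    counts = {}
    for record in event.get("Records", []):
        name = record["eventName"]
        counts[name] = counts.get(name, 0) + 1
        if name == "INSERT":
            # NewImage holds the item in attribute-value format when the
            # stream is configured to include new images.
            new_image = record["dynamodb"].get("NewImage", {})
            print("new item:", new_image)
    return counts


# Hypothetical event for local testing; a real one is delivered by Lambda.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"order_id": {"S": "o-1"}},
                "NewImage": {"order_id": {"S": "o-1"}, "amount": {"N": "25"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"order_id": {"S": "o-0"}}},
        },
    ]
}

print(handler(sample_event, None))
```

The function holds no server state and is invoked only when the table changes, which is the serverless pattern the paragraph above describes.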
DynamoDB vs MongoDB Trends
There is no one-size-fits-all answer; every production system is different, with its own needs and quirks. The following questions should help you settle on which one to choose.
Is your team deploying a mission-critical application that must be highly available at all times without manual intervention?
Are you comfortable running on proprietary software, without control or knowing what's going on under the hood?
No matter which database system you choose, migrating your data into it could present serious challenges. If you're suffering from a data migration bottleneck, Xplenty's automated ETL platform can help. Xplenty offers a visual, no-code interface that makes data migration a snap. Check out our hundreds of out-of-the-box integrations or schedule a demo to find out how Xplenty can help you with your unique ETL challenges.