NoSQL is a category of database management systems built for the complexities of working with Big Data. Unlike relational (SQL) databases, NoSQL databases do not store data in a relational format.
NoSQL is not standardized. Each implementation of NoSQL offers its own solution for dealing with large sets of heterogeneous data. Even the name NoSQL is not standardized, with various parties claiming it stands for Non-relational SQL, Not SQL, or Not only SQL.
What is the Difference Between SQL and NoSQL?
SQL and NoSQL are solutions to different problems. SQL handles structured data by storing it in a system of normalized tables, with these tables connected by relationships. A schema defines tables, and each cell holds a single value, which is often a simple data type, such as a string or an integer.
NoSQL can handle all data, whether it’s structured, semi-structured, unstructured, or polymorphic. Tables and relationships are much more flexible, with dynamic schemas that adapt to fit the data. Data storage is in various formats, including key-value pairs or JSON documents.
Relational SQL databases adhere to the ACID principle: Atomicity, Consistency, Isolation, Durability. NoSQL databases typically follow the BASE principle:
- Basically Available: A version of the data is available at all times.
- Soft state: The state of the database can change even without any input.
- Eventual consistency: The database tends towards consistency over time.
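The three BASE properties can be illustrated with a toy model. This is only a sketch, not the replication protocol of any real NoSQL database: the `Replica` class, the `sync` function, and the integer version numbers are all hypothetical names and simplifications introduced for illustration.

```python
class Replica:
    """A hypothetical in-memory replica that stores versioned values."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Accept the write only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        # Basically available: always answer, even if the value is stale.
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(source, target):
    """Copy newer entries from source to target, like a background process."""
    for key, (version, value) in source.data.items():
        target.write(key, value, version)


a, b = Replica(), Replica()
a.write("stock", 10, version=1)
sync(a, b)

a.write("stock", 9, version=2)  # only replica a has seen this write so far
stale = b.read("stock")         # b still answers, with the older value

sync(a, b)                      # soft state: b changes without a client write
fresh = b.read("stock")         # eventual consistency: b now agrees with a
```

Between the second write and the second `sync`, the two replicas disagree, yet both keep answering reads. That window of disagreement is the trade-off BASE accepts in exchange for availability.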
The main differences between SQL and NoSQL are:
- Schemas: SQL is schema-on-write, NoSQL is more flexible.
- Relational structure: SQL tables are relational, NoSQL structures are non-relational.
- Document storage: SQL databases traditionally did not store documents such as JSON natively, although many now offer JSON column types. Many NoSQL databases hold such documents as their primary storage format.
- Column structure: SQL tables have rigid column structures. NoSQL allows for dynamic table structures.
- Queries: SQL has a standard query language for retrieving results. Each version of NoSQL has its own method for retrieving database results.
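The schema and column-structure differences in the list above can be sketched in a few lines. The example uses Python's built-in `sqlite3` module as a stand-in for a relational database and a plain list of dictionaries as a stand-in for a schemaless store. The `products` table and its fields are hypothetical.

```python
import sqlite3

# Schema-on-write: the table's columns are fixed before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT PRIMARY KEY, item_name TEXT NOT NULL)")
conn.execute("INSERT INTO products VALUES (?, ?)", ("0786", "Pocket Flashlight"))

try:
    # This row carries a column the schema does not define, so it is rejected.
    conn.execute(
        "INSERT INTO products (id, item_name, pages) VALUES (?, ?, ?)",
        ("0787", "Notebook", 120),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Flexible structure: each record simply carries whatever fields it has.
flexible_records = [
    {"id": "0786", "item_name": "Pocket Flashlight"},
    {"id": "0787", "item_name": "Notebook", "pages": 120},
]

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
conn.close()
```

In the relational case, accepting the second record would first require an `ALTER TABLE` statement. In the flexible case, the new field is accepted as-is, and it is up to the application to cope with records that lack it.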
From a structural point of view, another major difference is scalability. SQL databases are typically scaled vertically: you run a single SQL server and scale up by adding resources such as disk space and RAM.
NoSQL is horizontally scalable: when one server reaches capacity, you add another NoSQL server to run in parallel. This approach is well suited to Big Data and distributed computing architectures.
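One common way to spread data across parallel servers is to hash each key and use the result to pick a server. The sketch below assumes simple modulo-based sharding, which is only one of several possible schemes. The function names `shard_for` and `distribute` and the `user:N` keys are hypothetical.

```python
import hashlib


def shard_for(key, num_servers):
    """Pick a server index for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def distribute(keys, num_servers):
    """Group keys by the server that would store them."""
    servers = [[] for _ in range(num_servers)]
    for key in keys:
        servers[shard_for(key, num_servers)].append(key)
    return servers


keys = [f"user:{i}" for i in range(1000)]
two_servers = distribute(keys, 2)
four_servers = distribute(keys, 4)

# Adding servers lowers the number of keys the busiest server has to hold.
busiest_with_two = max(len(s) for s in two_servers)
busiest_with_four = max(len(s) for s in four_servers)
```

A caveat on this design: with plain modulo sharding, changing the number of servers moves many keys to a different server. Real systems often use techniques such as consistent hashing to limit that reshuffling.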
What are the Most Common Versions of NoSQL?
NoSQL sets out to solve one major problem: how to store heterogeneous Big Data in a searchable way.
There are many implementations of NoSQL, and many of them have found novel solutions to this problem. Many of those solutions fall into one of four main categories:
- Document store
- Key-value database
- Wide-column store
- Graph database
The underlying structure of each of these is very different.
The key-value structure is one of the more basic approaches to NoSQL. Data exists as a pair: a key, which is a short string, and a value, which is any piece of data that the data owner wants to store. Values can be anything from a simple integer to a large binary file. To retrieve a value from the database, the application looks up the corresponding key.
This straightforward structure means that database functionality is quite basic. Key-value databases are ideal for storing big data sets that are not subject to frequent transactions, such as caches and system logs.
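The key-value model is simple enough to sketch with a dictionary. The `KeyValueStore` class below is a hypothetical in-memory illustration, not the API of any particular product. It shows that the store only ever looks data up by key and treats the value as opaque.

```python
class KeyValueStore:
    """A hypothetical in-memory key-value store."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # The store does not inspect the value; any object is accepted.
        self._items[key] = value

    def get(self, key, default=None):
        # Retrieval is always by exact key.
        return self._items.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._items.pop(key, None)


store = KeyValueStore()
store.put("visits", 17)                          # a simple integer
store.put("session:42", b"\x00\x01 binary blob")  # an opaque binary value
store.put("temp", "scratch")
store.delete("temp")
```

Note what is missing: there is no way to ask for "all values greater than 10" without scanning every key. That limitation is the basic functionality the paragraph above refers to.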
Examples of key-value NoSQL databases:
- Apache ZooKeeper
- Oracle NoSQL Database
The document structure is structurally similar to the key-value database, and stores data as a pair. While the key is again a short string, here the value is actually a document that contains more detailed data.
These documents are typically JSON or a similar format, which holds data in a way that resembles the attributes of an object in Java or C++. For example, a document relating to a single customer might look like this:
Type | Customer
Name | B Carey
Address | 1187 Park West, 78910
Phone | +555123456
CRM Ref | AZ010196
The advantage is that each customer record is a discrete object. The admin can make any changes to a single record, including adding or deleting fields. In a relational database, such a change would require altering the table schema, which affects every row.
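As a rough sketch, the customer record above can be represented as a JSON-style document stored under a key. The document key `customer:AZ010196` and the added `Email` field are hypothetical, chosen only to show a per-record change.

```python
import json

# The customer document from the example above, as a Python dictionary.
customer = {
    "Type": "Customer",
    "Name": "B Carey",
    "Address": "1187 Park West, 78910",
    "Phone": "+555123456",
    "CRM Ref": "AZ010196",
}

# A document store maps a key to a whole document.
documents = {"customer:AZ010196": customer}

# Change this one record only: add a field and delete another.
doc = documents["customer:AZ010196"]
doc["Email"] = "b.carey@example.com"  # hypothetical new field
del doc["Phone"]

# Documents are typically serialized as JSON text for storage or transfer.
serialized = json.dumps(doc, indent=2)
restored = json.loads(serialized)
```

No other document in the store is affected by these changes, and no schema had to be updated first.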
Examples of document databases:
- Apache CouchDB
- Lotus Notes
- Amazon SimpleDB
The wide-column store, an approach first developed by Google, is somewhat similar to a relational database. There are tables with rows and columns, but with one major difference: each row can have different columns.
Consider the following table in a relational database describing products:
ID   | Item Name         | Batteries required
-----|-------------------|-------------------
0786 | Pocket Flashlight | 2 x AA
In a relational database, each item in this table would have a value for Batteries Required, even if most of the products didn’t require batteries. As databases scale up and become more complex, this inflexibility can become a problem.
In the wide-column store, each row contains a series of key-value pairs, with each key taking the place of a column title. This database can store data as rows, as in an SQL database, but each row has an independent structure, so a row only carries the columns that apply to it. The resulting structure is highly dynamic while still being straightforward to query by row and column.
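A minimal way to picture this is a mapping from row key to that row's own set of columns. The sketch below reuses the flashlight row from the table above. The second row (`0787`, a notebook with a `Pages` column) and the helper names are hypothetical.

```python
# Each row key maps to its own dictionary of column name -> value.
rows = {
    "0786": {"Item Name": "Pocket Flashlight", "Batteries required": "2 x AA"},
    "0787": {"Item Name": "Notebook", "Pages": 120},  # hypothetical row
}


def get_column(row_key, column, default=None):
    """Read one column from one row; absent columns are simply missing."""
    return rows.get(row_key, {}).get(column, default)


def columns_in_use():
    """List every column name that appears in at least one row."""
    names = set()
    for columns in rows.values():
        names.update(columns)
    return sorted(names)


flashlight_batteries = get_column("0786", "Batteries required")
notebook_batteries = get_column("0787", "Batteries required")
all_columns = columns_in_use()
```

The notebook row stores nothing at all for `Batteries required`, rather than holding an empty placeholder value as a fixed relational column would.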
Examples of wide-column storage:
- Google Bigtable
- Apache Cassandra
Graph databases use a relational structure, but of a different kind. These databases store nodes, which are discrete data entities such as customers, products, or places. Each node can have multiple relationships, or edges, with other nodes.
With a system of nodes and edges, queries can very quickly reveal complex data structures. A common use case for this is the recommendation engine. These engines study the relationship between people and products. For example, an engine may identify that people who buy product A, product B, and product C will eventually buy product D. Using this information, the recommendation engine can find people who have already purchased products A, B, and C. It will then recommend to those people that they buy product D.
With data organized in nodes and edges, it becomes much easier to identify complex relationships. That’s why graph databases are popular in scenarios such as social networks and logistics systems.
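The recommendation example can be sketched with a small adjacency structure, where each person node has "bought" edges to product nodes. This is an illustration only: real graph databases provide their own query languages, and the people (`ana`, `ben`, `cy`) and function names here are hypothetical.

```python
from collections import defaultdict

# Adjacency sets: person node -> product nodes linked by a "bought" edge.
purchases = defaultdict(set)


def add_purchase(person, product):
    """Add a "bought" edge from a person node to a product node."""
    purchases[person].add(product)


def recommend(required, suggestion):
    """Find people who bought every required product but not the suggestion."""
    return sorted(
        person
        for person, products in purchases.items()
        if required <= products and suggestion not in products
    )


for person, product in [
    ("ana", "A"), ("ana", "B"), ("ana", "C"),
    ("ben", "A"), ("ben", "B"), ("ben", "C"), ("ben", "D"),
    ("cy", "A"), ("cy", "B"),
]:
    add_purchase(person, product)

# People who own A, B, and C but have not yet bought D.
targets = recommend({"A", "B", "C"}, "D")
```

Here only `ana` qualifies: `ben` already owns product D, and `cy` has not bought product C.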
Examples of graph databases:
- Apache Giraph