For businesses of all sizes and industries, the world of big data is only getting bigger. According to a 2016 survey by IDG, the average company is now responsible for managing a mind-boggling 163 terabytes (163,000 gigabytes) of information.
Turning this raw data into cutting-edge insights doesn’t come easy. It requires businesses to master the practice of enterprise data management, so that employees can easily create, store, access, manage, and analyze the information they need to excel at their jobs.
Perhaps the two most common forms of data storage in enterprise data management are data warehouses and databases. So what’s the difference between a database and data warehouse, and which one is best for your situation?
What is a Database?
A database is an organized collection of information stored in a way that makes logical sense and that facilitates easier search, retrieval, manipulation, and analysis of data.
Perhaps the most common way of classifying databases is SQL vs. NoSQL (also known as relational vs. non-relational).
A SQL or relational database organizes information within formal tables that codify relationships between different pieces of data. Each table contains columns and rows, similar to the structure of a spreadsheet in Microsoft Excel. In order to search through a relational database, users write queries in Structured Query Language (SQL), a domain-specific language for communicating with databases. The four most popular SQL database products, in no particular order, are Oracle, Microsoft SQL Server, IBM Db2, and MySQL.
On the other hand, a NoSQL or non-relational database uses any paradigm for storing data that falls outside the relational table-based model. Some common types of NoSQL databases are key-value, document-based, column-based, and graph-based stores. Popular NoSQL offerings include MongoDB, Cassandra, and Redis.
In terms of the SQL vs. NoSQL question, both approaches have their pros and cons. SQL databases tend to be easier to vertically scale (by adding more resources), while NoSQL databases tend to be easier to horizontally scale (by adding more machines). The use of SQL to write queries can be a major advantage for performance and ease of use, but relational databases are also less flexible and more rigid in terms of the data hierarchy.
What is a Data Warehouse?
A data warehouse is a system that aggregates and stores information from a variety of different sources within an organization.
The goal of a data warehouse is explicitly business-oriented: it is designed to facilitate decision-making by allowing users to consolidate and analyze the information at different levels. Data warehouses are useful when a certain query requires data beyond what’s stored in an individual database.
For example, an executive may want to reach a conclusion on how many employees to hire in the upcoming year, and in which departments. Such a decision will require looking at information such as employees, salaries, products, sales forecasts, etc. Rather than gathering this data from multiple individual databases, it’s much easier to store it within a single data warehouse.
Note that data warehouse solutions typically make use of relational databases, rather than NoSQL databases. This is because many data warehouses incorporate legacy data that is much more likely to be stored in a relational format, and also because NoSQL databases are less commonly used for data processing tasks.
Data Warehouse vs. Database: 7 Key Differences
We’ve provided a broad overview of databases and data warehouses, but how exactly do they differ in the specifics? Below, we’ll discuss 7 of the biggest differences between data warehouses and databases.
1. OLTP vs. OLAP
OLTP (online transaction processing) is a term for a data processing system that focuses on transactions. This is usually the dominant paradigm for databases that contain information used by a business on a day-to-day basis. Employees need fast, efficient queries and information that’s up-to-date and accurate, which OLTP is specifically designed to enable.
OLAP (online analytical processing) is a term for a data processing system that focuses on data analysis and decision-making, rather than performance and day-to-day use. Many OLAP systems are connected with business intelligence (BI) solutions that make it easier for non-technical managers and executives to get answers to their questions.
Businesses that need an OLTP solution for fast data access typically make use of a database. Meanwhile, data warehouse systems are better suited for an OLAP solution that can aggregate current and historical information.
2. Number of concurrent users
Because databases are OLTP systems, they have been designed to support thousands of users or more at the same time, without any degradation in performance.
OLAP data warehouses, on the other hand, can support only a relatively limited number of concurrent users. Because a data warehouse solution uses more complex queries circulating over many different data stores, it necessarily requires more resources and therefore is not as scalable as an enterprise-class database.
3. Use cases
In terms of their use cases, data warehouses and databases are also quite different.
Databases are most useful for the small, atomic transactions that are required for the day-today-functioning of an organization. Some examples include a hospital entering data about a new patient, a customer purchasing tickets via an online website, and a bank transferring money between two accounts.
Data warehouses are best suited for larger questions about an organization’s past, present, and future that require a higher level of analysis: for example, mining information from multiple databases to uncover hidden insights.
4. Service level agreements
As a consequence of their OLTP transactional nature, databases generally need to be available almost 24/7/365, somewhere upwards of 99.9 percent of the time. Downtime for OLTP databases can be extremely costly and even bring the business to a standstill.
However, downtime is not such a major concern for data warehouses, since they are used more for back-end analysis. In fact, most data warehouses have regularly scheduled downtime windows when more information is uploaded.
OLTP databases are optimized to be lightning-quick for the CRUD operations (create, read, update, and delete). However, more complicated analytical queries can rapidly bring down their performance.
OLAP data warehouses are optimized for a smaller number of more complex queries over multiple large data stores. Although response time remains an important metric, the more important concern for a data warehouse is the quality of the analyses that it performs.
In order to achieve their goal of rapid queries, OLTP databases are structured as efficiently as possible, with no duplicate information in multiple tables. This lowers both the disk space and the response time required to execute a transaction.
Redundant information is far less of a concern with OLAP data warehouses, since they devote less attention to the speed of a given query. Data warehouses typically denormalize their data, prioritizing read operations over write operations.
7. Reporting and analysis
Some limited reporting and analysis is possible on OLTP databases, but the normalized structure of the data makes it more difficult to perform. In addition, databases typically contain only the most up-to-date information for maximum efficiency, which makes historical queries impossible.
Data warehouses, on the other hand, have been designed from the ground up for reporting and analysis purposes. Users can pull from both current and historical data, enabling a wider range of insights.
Data Warehouse vs. Database: Data Marts and Data Lakes
If you thought that the question of databases vs. data warehouses was all there was to know in enterprise data management, think again. In this section, we’ll quickly go over two other alternatives to databases and data warehouses that may be of interest to your organization: data marts and data lakes.
A data mart is a database that is oriented toward storing information of a particular type, or for a particular set of users within an organization: for example, marketing, sales, finance, or human resources.
Data marts may be their own entity, or they may be a smaller partition as part of a larger data warehouse. In either case, the goal is to pare down an organization’s data into a more manageable size, usually less than 100 gigabytes.
A data lake is similar to a data warehouse, but without the strict requirements for how to organize the contents. Data lakes are a method of centralized data storage that does not necessarily structure the information in any type of way. Both structured and unstructured data can be stored together, and the data lake can use information from any source or data type.
Since data lakes are a bit of a “dumping ground” for both current and historical information, they are generally more flexible and adaptable than a structured database. However, this comes at a cost later on when developers and analysts want to process and use this information.
The question of data warehouses vs. databases (not to mention data marts and data lakes) is one that every business using big data needs to answer. As we’ve seen above, databases and data warehouses are quite different in practice. Deciding to set up a data warehouse is one indicate that your organization is committed to the practice of good enterprise data management.
For more information on data warehouses, databases, and enterprise data management, follow the Xplenty blog for the latest articles and resources.