What's the difference between a data mart and a data warehouse? And, are data marts still relevant in today's cloud-first world? Let's dive into the definitions of data marts and data warehouses, the use cases for both, and why data marts may not be all-too-important in today's cloud ecosystem.
Plus, we'll talk about the data modeling methods that go into designing a data mart. So, grab a coffee, sit back, and get ready to dive into the wonderful world of data marts.
Table of Contents
Data Mart vs. Data Warehouse
A data warehouse is a relational database where business information is stored for analysis. Data within the warehouse can come from a variety of sources, including CRM systems, BI tools, payroll, etc. A core tenet of data warehouses is that they contain data from many different sources — and they leverages all of those data points to paint a broader picture via queries.
A data mart is a segment of your data warehouse that is reserved for use in a specific line of business. This means that data marts are typically valuable for a single department or team (e.g., sales, marketing, legal, etc.) and are used to run workloads for data pertaining to that team's line-of-business.
Why is this useful?
Think about how much data a typical business has. Consumers create some 2.5 quintillion bytes of data daily. Trying to pack all of your business data into one cloud solution is difficult enough — trying to run analytics on all of that data across every line of business is nearly impossible (and certainly time-consuming.) Data marts let you break up that data into business roles — making queries run faster and keeping data more organized and contained for business use. If sales wants some cheese, marketing wants some turkey, and legal wants some bread, you don't want to bring a sandwich around and have them deconstruct it one-by-one. With data marts, you give each of them what they need.
Types of Data Marts
There are three primary types of data marts for typical business use.
- Dependent Data Marts: With dependent data marts, you can pull all of your data from a central data warehouse. There are definitely some benefits to this method (namely data centralization.) The primary purpose of developing data marts within your centralized warehouse is to improve performance (queries are run at the data mart level) and to give KPI tracking capabilities — since you can go in and develop KPI trackers for each data mart.
- Independent Data Marts: With independent data marts, your data marts aren't connected to the centralized data warehouse whatsoever. This may seem contradictory to the purpose of data warehousing (leveraging multiple data streams to create informed decisions) but it can be useful for short-term goals and rapid implementation. Often times, independent data marts are constructed during the process of building more complex dependent data marts — though many choose to skip this step altogether.
- Hybrid Data Marts: Hybrid data marts combine both data warehouse data AND data from separate systems (i.e. your tech stack, etc.). Usually, these are leveraged for ad-hoc integrations or situations where you need to utilize the data from disparate sources immediately. Ideally, you want to integrate these disparate sources into your warehouse — circumventing the need for a hybrid system.
Properties of a Data Mart vs. Data Warehouse
Let's take a look at the fundamental properties of a data mart vs a data warehouse.
- Size: Under 100GB
- Subject: Single-subject
- Sources: BI tools specific to a business line
- Scope: A single line-of-business or multi-functional department
- Decision: Supports tactical decision making that leverages single department goals and tracking to build a bigger overall picture
- Cost: $10,000 - $100,000 depending upon the scope of the data mart, integrations, and ETL
- Integrations: The integrations necessary for a specific line-of-business
- Size: Over 100GB
- Subject: Multiple-subjects
- Sources: Internal and external resources that make up your businesses data loads
- Scope: Multiple lines-of-business
- Decision: Supports strategic decisions that impact the entire business
- Cost: $100,000 + for in-house data warehouses, but costs are significantly lower when you leverage cloud solutions
- Integrations: All business integrations
Data Mart Structure and Data Modeling
Data marts play a critical role in data warehouse design. Depending on the data modeling method (or schema) you use, the way you construct and utilize data marts will differ wildly — which impacts the overall construction of your data warehouse solution.
There are tons of different data modeling methodologies that you can use for your business, but we'll cover the core two — Bill Inmon's Top-Down and Ralph Kimball's Bottom-Up.
Bill Inmon's Top-Down Approach
Bill Inmon defined a top-down approach for data mart construction. In this model, the data warehouse is constructed first, and it's the primary repository for all business data. Then, data warehouses are constructed ad-hoc to meet specific business needs. So, the data warehouse is constructed without data marts in mind, which are later added when specific business lines require them.
A Top-Down approach follows this model — data warehouse > ETL > data mart > OLAP cube > interfaces
There are a few key benefits to the Inmon model.
- Data is kept consistent since the data warehouse acts as the central repository for all business data.
- Data integrity is incredible due to the "data warehouse first" approach.
Ralph Kimball's Bottom-Up
Ralph Kimball has a reverse methodology for data modeling. This approach could be considered a bottom-up approach, but most define it as a "star schema" or (with a few tweaks) a "snowflake" schema. With Kimball's approach, data marts are constructed first, and the data warehouse is defined as a "union of all the data marts."
A Bottom-Up approach follows this model — data mart > ETL > data warehouse > OLAP cube > interfaces
There are plenty of benefits to the Kimball model.
- Data marts aren't separated from the data warehouse leading to easier construction and natural integration with many of the leading BI tools.
- Analytics can access data in the data marts directly without going through the data warehouse first.
- Kimball's model is cheaper to build and has rapid scaling (i.e., you can start small for cheap and scale up slowly to meet business needs.)
Which One is Best?
Like everything, it depends on your needs. Despite the banter between Kimball and Inmon (Inmon told Kimball, "You can catch all the minnows in the ocean and stack them together, and they still do not make a whale.") both of them presented valuable data modeling schemas. If you're looking to build a cheap up-front warehousing solution that you can eventually scale up-to-cost, Kimball's method may be valuable for you. But, if you need an upfront solution that you can use as a foundation, Inmon's solution will provide you with more value.
Here's the thing; not many businesses are building their own data warehouses these days outside of massive enterprises. And, since most businesses with lean on cloud-based data warehouse solutions (e.g., Redshift or BigQuery) these models aren't as widely in use today as they once were. Redshift and Snowflake both have their own hyper-complex data modeling that differs from both of these approaches.
The Benefits of Data Marts
- Allows users to access the exact data they need to make informed decisions
- Reduces risks and allows you to monitor and segment data usage
- Can build KPI trackers for each data mart
- Easy to integrate with BI tools
- Gives you a clearer look at data for each specific line-of-business
- Better performance since you can run queries at the data mart level
- Cheaper to build than a data warehouse
- Can framework data for better business-line accessibility
- Departments can control their own data workloads
- And data marts can act as building blocks towards a more robust data warehouse
Data Marts and ETL
Since your ETL tool is the fuel that drives your data warehouse, understanding the role of ETL and data marts is crucial. Remember, data marts are slices of your data warehouse. If you chose to build a warehouse using Kimball's modeling, you'll run ETL loads directly against data marts. So, you'll build your data pipelines from your data marts into your data warehouse.
If you use Inmon's approach, you'll run ETL against the data warehouse into your data marts. So, ETL loads are run directly against your data warehouse data lakes.
Ideally, you should choose an ETL tool that can support both of these workflows. Xplenty has easy-to-setup pipelines and rapid integration with all of the leading BI tools. Plus, it works perfectly at both the data mart and data warehouse level of ETL.
Data marts can help you segment your data warehouse into slices of hyper-usable blocks that you can leverage to increase performance, integrate specific tools, and give each department their own source of queries. For many brands, data marts will be a thing-of-the-past. With the rise of cloud warehouses, data marts aren't as important as they once were. That being said, if you're in the process of designing your own data warehouse, thinking about data marts and data warehouses can help you choose your overall data modeling strategy.