A data mart is a repository that holds data relevant to a group of users with common needs, such as a business department.
Data marts generally exist in relation to a data warehouse. Marts can be dependent on warehouses, or they can acquire raw data from other sources.
Relationship Between a Data Mart and a Data Warehouse
Data marts and data warehouses have much in common. Both are relational databases, both work on production data that has been transformed, and both are used for analytics purposes.
However, the two have some crucial differences:
- Size: Data marts are generally under 100 GB, while warehouses have no fixed size limit.
- Line of business: Data marts support a single department while a data warehouse supports the entire organization.
- Sources: Marts use a limited number of sources, such as a data warehouse or departmental databases. Warehouses ingest data from multiple sources across the organization’s infrastructure.
- Ownership: Warehouses are owned by the organization’s data management team. A data mart is usually owned by the relevant department.
Both structures rely on an ETL (Extract, Transform, Load) process to provide them with cleansed, normalized data. Transformation only needs to be performed one time – once it fits the desired schema, data can then directly be passed to the next repository.
In practice, this allows organizations to structure their data marts in several ways:
- Independent data mart: Raw data is extracted and transformed by ETL, and then loaded directly into the data mart. Independent data marts can act as a source for a data warehouse.
- Dependent data marts: Dependent marts do not directly interact with data sources. Instead, the ETL layer connects with the data warehouse, where the cleansed data is held. Dependent data marts use this warehouse as their lone source.
- Hybrid data marts: Hybrid marts combine both of the above approaches, combining warehouse data with raw data acquired via ETL.
Independent data marts work best in small or medium organizations, where the overall volume of data is manageable. In larger organizations, data generally has to pass through a central warehouse first before being made available to data marts. Hybrid data marts are applied in situations where a dependent data mart needs to incorporate information from a source that’s not connected to the warehouse.
Data Mart Use Cases
Data marts are used to solve specific organizational problems, especially those that are unique to one department. Typical use cases for a data mart include:
Analytics is perhaps the most common application of data marts. The data in these repositories is entirely relevant to the requirements of the business department, with no extraneous information, resulting in faster and more accurate analysis. For example, financial analysts will find it easier to work with a financial data mart, rather than working with an entire data warehouse.
Data marts are generally faster to develop than a data warehouse, as the developers are working with fewer sources and a limited schema. Data marts are ideal for data projects operating under challenging time constraints.
Data marts can be a risk-free way to grant limited data access without exposing the entire data warehouse. For example, dependent data mart contains a segment of warehouse data, and users are only able to view the contents of the mart. This prevents unauthorized access and accidental writes.
Better Resource Management
Data marts are sometimes used where there is a disparity in resource usage between different departments. For example, the logistics department might perform a high volume of daily database actions, which causes the marketing team’s analytics tools to run slow. By providing each department with its own data mart, it’s easier to allocate resources according to their needs.
Implementation of Data Marts
Organizations generally follow a four-step process when creating a data mart.
Data marts are best understood as the solution to a specific business problem, like “how do we give our finance team fast access to analytics reports?” Before beginning, the data mart owners need to define the scope of this project by asking questions like:
- Who will access this data mart?
- What actions will they perform?
- What are the primary data sources?
- What data is to be included?
- What data must be left out?
The scope of the project will help to define what type of data mart is required. The two main types of data mart are:
- Physical: Data is imported from external sources and copied to a relational database. The database may be held in the cloud or hosted on-premise.
- Virtual: When resources are limited, or the source is a Big Data structure such as a data lake, organizations may use virtual data marts. This type of mart uses views to create a virtual mart, without creating a new physical database.
Most organizations will use different types of data mart to solve different kinds of problems, which is why a detailed requirement gathering process is essential before implementation.
For physical data marts, data must pass from the original source into the mart. How this is handled depends on the type of data mart and the nature of the source.
- ETL layer: If the mart is working with raw data, such as the information from a production database, then that data must pass through an ETL layer before it can be used for analytics. This action may be performed by hand or by using an automated ETL platform. ETL automation will ensure that the data mart has a steady pipeline of fresh information.
- Direct import from a trusted source: The data mart may also import directly from a trusted source, such as a data warehouse. Because the warehouse data is already cleansed, normalized and transformed according to the master schema, there is no need to pass through an ETL layer before updating the data mart.
Finally, the data mart is made available to users. If the data mart is solving the problem it set out to address, like speeding up finance analytics, then the mart is functional. Otherwise, the mart owners may need to review their implementation and amend accordingly.