Operational data stores (ODS) are data repositories that store a snapshot of an organization's current data. This is a highly volatile data repository that is ideally suited for real-time analysis.
An ODS can sometimes act as an intermediate stage between transactional databases and a data warehouse.
How Does an Operational Data Store Work?
When an ETL-driven data pipeline supplies a data warehouse, there are three stages to the ingestion process: extraction from target sources, data transformation, and loading to the destination. The transformation layer uses a staging database that holds the raw data from production databases. This staging database is small and light, holding only the most recently imported data values.
An operational data store works in much the same way as the staging area in the ETL process. An ODS process will import raw data from production systems and stores it as-is. However, while ETL then applies transformations, the ODS makes the raw data available to business intelligence (BI) tools for analysis.
Sometimes, the enterprise may use replicate data in an ODS for BI purposes, and then use an ETL process to transport the ODS data to a warehouse. This approach can reduce the burden on the production databases supplying the raw data.
What are the Applications of an Operational Data Store?
The operational data store approach enables a number of use cases that aren't always possible with other methods. These include:
- Consolidation: The ODS approach can bring together disparate data sources into a single repository. It lacks the benefits of other repositories such as data lakes and data warehouses, but an operational data store has the advantage of being fast and light. ODS can consolidate data from different sources, different systems, or even different locations.
- Real-time reporting: An operational data store will generally hold very recent versions of business data. Combined with the right BI tools, businesses can perform real-time BI tasks, such as tracking orders, managing logistics, and monitoring customer activity.
- Troubleshooting: The current state view of ODS makes it easier to identify and diagnose issues when they occur. For example, a user might create an order on the e-commerce system, which should create a corresponding order on the logistics system. But this might have the wrong details due to an integration error. The ODS will hold both versions of the data, allowing for easy comparison between the two systems. Automated processes can spot these problems and take action.
- System integration: Integration requires a continuous flow of data between systems, and ODS can provide the platform for this kind of exchange. It's possible to build business rules on an ODS so that data changes in one system triggers a corresponding action on another system.
ODS can also serve as a target source for an ETL process, which then loads the data to a warehouse for long-term storage.
What are the Differences between an Operational Data Store and Data Warehouse?
ODS and data warehousing have a lot in common, mainly because they both import and consolidate data from disparate sources. There are some crucial differences, however. These include:
Perhaps the biggest difference between the two is the volatility of the data. ODS data is extremely volatile, with values changing in something close to real-time. The contents of an ODS may change dramatically from one moment to another, depending on the target sources' nature.
A data warehouse is much more stable. Warehouses retain historical values and integrate them with new incoming values. Updates to data warehouses generally happen in scheduled batches, so the warehouse contents might only change a few times per day.
Data warehouses are schema-on-write. This means that incoming data goes through an integration process before it ingestion into the warehouse. The integration process cleanses, harmonizes, and organizes according to the warehouse schema.
An ODS doesn't require the same kind of transformations. Instead, data remains in its existing schema. In this sense, ODS is more like a data lake, which uses the schema-on-write approach, although an ODS is much smaller than a data lake (and can only store structured data.)
ODS and data warehouses both answer crucial BI questions, but the nature of these questions can vary. An ODS holds data that is very recent but limited. This makes it suitable for operational queries about the business' current state, such as "how many logged-in users do we have right now?" or "how many orders have we processed today?"
Data warehouses don't always update in real-time, but they do have a much larger volume of data. This makes them more suitable for questions of strategy based on long-term trend data, such as "how did we perform in Q1?" and "which employees are meeting targets?"
ODS data represents a short window within the data lifecycle. New incoming data overwrites any previous data. This means that BI queries will only ever reveal a small piece of the picture, which is acceptable for queries focused on the current state.
Data warehouses integrate new data with the existing contents of the warehouse. This results in a much larger repository that supports historical queries. Warehouses can also support data analytics activities that require substantial volumes of data.
ODS growth is always linear. As new data arrives, expired data is overwritten or removed from the ODS. Growth happens in relation to the growth rate of data on the transaction databases. One upshot of this is that there's less pressure to worry about scaling up the storage space available to ODS.
On the other hand, data warehouses can grow exponentially. New data and historical data exist side-by-side, so the size of the warehouse can expand quite quickly. This is why most data warehouses rely on a cloud infrastructure, which is easy to scale up according to demand.