What is Data Integration?

Data integration is the process of taking data from multiple disparate sources and collating it in a single location, such as a data warehouse. Once integrated, the data can be used for detailed analytics or to power other enterprise applications.

How is Data Integration Performed?

In an enterprise environment, this process usually requires an integration layer, which is often an ETL (extract, transform, load) application such as Xplenty.

This integration layer sits between the source databases, which hold raw data from production systems and other sources, and the target database, which is the ultimate destination for the data. The integration process passes through the following steps:

Extraction

Data is acquired from the source databases. This is usually done through API calls, but other methods, such as file exports, may be required for some systems.

Transformation

The raw data is copied into staging tables, and the target database schema is applied. This stage usually involves some data cleansing to remove corrupt, empty, or duplicate values. There may also be some normalization or harmonization to improve the overall quality of the data.
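
As a rough illustration of the cleansing described above, the following sketch drops empty, corrupt, and duplicate rows and harmonizes formatting. The field names (name, email), the sample rows, and the "must contain @" corruption check are assumptions made for this example, not rules from any particular ETL tool.

```python
def cleanse(rows):
    """Drop empty, corrupt, and duplicate rows; harmonize email formatting."""
    seen = set()
    clean = []
    for row in rows:
        # Harmonize: trim whitespace and lowercase the email (assumed convention).
        email = (row.get("email") or "").strip().lower()
        name = (row.get("name") or "").strip()
        if not email or not name:   # empty values
            continue
        if "@" not in email:        # crude corruption check (assumption)
            continue
        if email in seen:           # duplicate after harmonization
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean


# Hypothetical raw rows as they might sit in a staging table.
raw = [
    {"name": "Ada Lovelace", "email": " ADA@example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Bob", "email": "not-an-email"},
    {"name": "Grace Hopper", "email": None},
]
cleaned = cleanse(raw)
```

Note that harmonizing before deduplicating matters here: the first two rows only count as duplicates once case and whitespace differences are removed.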

Loading

Clean data with the target schema is moved to its final destination, which is often a data warehouse or similar structure. The integrated data then becomes available for any relevant business purposes.
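
The three steps can be sketched end to end with Python's standard library. This is a minimal sketch, not how any specific ETL product works: the two sources (a list standing in for an API response and a CSV string standing in for a file export), the customers table, and all column names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: records as they might come back from an API call.
CRM_RECORDS = [
    {"id": 1, "full_name": "Ada Lovelace", "email": "ADA@example.com"},
    {"id": 2, "full_name": "Grace Hopper", "email": "grace@example.com"},
]

# Hypothetical source 2: a file export from a system without an API.
BILLING_CSV = (
    "customer_email,total\n"
    "ada@example.com,120.50\n"
    "grace@example.com,80.00\n"
    "ada@example.com,30.00\n"
)


def extract():
    """Acquire raw data from both sources."""
    billing = list(csv.DictReader(io.StringIO(BILLING_CSV)))
    return CRM_RECORDS, billing


def transform(crm, billing):
    """Harmonize emails and reshape both sources into the target schema."""
    totals = {}
    for row in billing:
        email = row["customer_email"].strip().lower()
        totals[email] = totals.get(email, 0.0) + float(row["total"])
    rows = []
    for rec in crm:
        email = rec["email"].strip().lower()
        rows.append((rec["id"], rec["full_name"], email, totals.get(email, 0.0)))
    return rows


def load(conn, rows):
    """Write the transformed rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT, total_billed REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(conn, transform(*extract()))
result = conn.execute(
    "SELECT name, total_billed FROM customers ORDER BY id"
).fetchall()
```

Once loaded, the integrated table can answer a question that neither source could answer alone, such as the total billed per named customer.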

The integration layer is a fundamental element of a data pipeline, which keeps data flowing from sources to the target. ETL tools allow this data flow to be fully automated. Machine learning and AI can help to refine the target schema and adapt to any changes in the source databases.

Data integration is always performed for a specific purpose, some examples of which are described below.

Uses of Data Integration in Enterprise

Most businesses have a wide array of data sources at their disposal. These sources include production databases, cloud-based systems such as CRM and ERP, web analytics, and data from partners, among others.

Businesses with this many sources often identify goals that require data integration. Examples include:

  • Validation: The business needs to check the accuracy of data by comparing it to a schema or matching it against data from another source.
  • Consolidation: The business wants to centralize data storage, to improve efficiency or to store big data more cost-effectively.
  • Process enablement: The business wants to create a new process that is only possible with an integrated data source. For example, a new marketing automation platform might require a unified source of client data.
  • Master data management (MDM): If the business uses MDM as part of its data governance strategy, it will use integration techniques to produce master data.
  • Analytics and business intelligence (BI): Perhaps the most common application of data integration. Many businesses need a unified data source for analytics and other BI applications.
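
To make the validation goal in the list above more concrete, here is a minimal sketch of both checks it mentions: comparing records to a schema and matching them against another source. The schema, the record fields, and the function names are all assumptions made for illustration.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"id": int, "email": str, "country": str}


def schema_errors(record):
    """Return a list of problems found when checking a record against SCHEMA."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in record:
            errors.append(f"missing {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return errors


def mismatches(primary, reference, key="id"):
    """Match records from one source against another by a shared key."""
    ref_by_key = {r[key]: r for r in reference}
    diffs = []
    for rec in primary:
        other = ref_by_key.get(rec[key])
        if other is None:
            diffs.append((rec[key], "not found in reference"))
        elif rec != other:
            diffs.append((rec[key], "values differ"))
    return diffs


# Hypothetical records from two systems that should agree.
crm = [
    {"id": 1, "email": "ada@example.com", "country": "UK"},
    {"id": 2, "email": "grace@example.com", "country": "US"},
]
billing = [
    {"id": 1, "email": "ada@example.com", "country": "UK"},
    {"id": 2, "email": "grace@example.com", "country": "CA"},
]
bad_record = {"id": "3", "email": "x@example.com"}

errors = schema_errors(bad_record)
diffs = mismatches(crm, billing)
```

In this sample, the schema check flags the record whose id is a string and whose country is missing, and the cross-source check flags the customer whose country differs between the two systems.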

Data integration is rarely an end in itself. Instead, integration is used to improve efficiency, enable analytics, and solve organizational problems that arise from having siloed data.