Data mapping is the process of moving data into a different table structure while preserving data integrity.
A data mapping schema determines where data values will sit when imported into a new structure. These schemas play an essential role in data warehousing, data integration, and data transformation.
What is the Data Mapping Process?
In its most basic form, data mapping involves examining a source database and a destination database, then deciding how to copy values from one to the other.
For example, an organization might have a system with several tables that hold customer information: Name, Address, and Membership Details. If the organization wants to copy this information to a single Customer Data table, they need to apply a data mapping schema that matches each source field to a column in the destination table.
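A mapping schema like this can be expressed very directly in code. The sketch below uses Python, with illustrative (hypothetical) table and column names, to show how rows from three source tables could be combined into one Customer Data record:

```python
# Hypothetical mapping schema: each (source table, source column)
# pair maps to a column in the destination Customer Data table.
SCHEMA = {
    ("Name", "first_name"): "first_name",
    ("Name", "last_name"): "last_name",
    ("Address", "street"): "street",
    ("Address", "city"): "city",
    ("MembershipDetails", "tier"): "membership_tier",
}

def map_row(source_rows):
    """Combine one customer's rows from each source table into
    a single destination record, following the schema."""
    destination = {}
    for (table, column), dest_column in SCHEMA.items():
        destination[dest_column] = source_rows[table].get(column)
    return destination

customer = map_row({
    "Name": {"first_name": "Ada", "last_name": "Lovelace"},
    "Address": {"street": "12 St James's Sq", "city": "London"},
    "MembershipDetails": {"tier": "gold"},
})
```

Real schemas also carry type conversions and key relationships, but the core idea is the same: an explicit, reviewable list of source-to-destination pairs.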
Creating this schema is a straightforward process:
- Clarify the structure of the data sources. Data can come from multiple sources, which will add complexity to the mapping process. For each source, you need to know the full database schema, including table structures, relationships, primary keys and foreign keys.
- Define the destination database. The destination database needs to be created if it hasn’t been already—the greater the differences between source and target, the more complex the mapping process.
- Map the data transfer. The data mapping schema details the relationship between source columns and destination columns. This schema should preserve relationships where possible and be mindful of unique identifiers such as primary keys.
- Plan for exceptions. Data transformation doesn’t always go smoothly, especially if the original source hasn’t been properly cleansed. Part of the data mapping process is to plan for such exceptions and to deal with issues like null values or incorrect data types.
- Test the destination database. The first important test is to ensure that the destination database is fully operational: check that all relationships behave as expected and that no data has been corrupted in transit.
- Test for potential data loss. The final test is to ensure that all relevant data from the source has arrived at the destination. If the destination tables are incomplete, it may indicate that the data mapping schema needs to be updated.
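Steps 3 through 6 can be sketched as a single routine. This is a minimal illustration, not a production implementation; the column names, type coercions, and default values are assumptions chosen for the example:

```python
# Step 3: the mapping schema (source column -> destination column).
SCHEMA = {"cust_name": "name", "cust_age": "age"}
COERCE = {"age": int}           # step 4: fix incorrect data types
DEFAULTS = {"name": "UNKNOWN"}  # step 4: handle null values

def transfer(source_rows):
    """Apply the schema row by row, routing bad rows to an
    exception list instead of silently dropping them."""
    destination, exceptions = [], []
    for row in source_rows:
        mapped = {}
        for src_col, dest_col in SCHEMA.items():
            value = row.get(src_col)
            if value is None:
                value = DEFAULTS.get(dest_col)
            caster = COERCE.get(dest_col)
            try:
                mapped[dest_col] = caster(value) if caster else value
            except (TypeError, ValueError):
                exceptions.append((row, dest_col))
                break
        else:
            destination.append(mapped)
    return destination, exceptions

source = [{"cust_name": "Ada", "cust_age": "36"},
          {"cust_name": None, "cust_age": 41},
          {"cust_name": "Bob", "cust_age": "not a number"}]
dest, errors = transfer(source)

# Step 6: test for data loss -- every source row must end up
# either in the destination or in the exception log.
assert len(dest) + len(errors) == len(source)
```

Keeping rejected rows in an exception list makes the final data-loss check trivial: the destination count plus the exception count must equal the source count.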
The difficulty of this process depends on the complexity of the data. Data mapping can be implemented in several ways:
For small data sets, it may be possible to perform data transformation manually. This can be done in a number of ways, such as exporting data as XML and writing a small application in XSLT to perform the transformation.
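For a small export, even a short script can do the job. The sketch below reshapes exported XML using Python's standard library rather than XSLT; the element names are illustrative:

```python
import xml.etree.ElementTree as ET

# A manual transformation for a small data set: reshape exported
# XML rows into the destination structure.
source_xml = """
<customers>
  <row><Name>Ada</Name><City>London</City></row>
  <row><Name>Bob</Name><City>Leeds</City></row>
</customers>
"""

source = ET.fromstring(source_xml)
destination = ET.Element("CustomerData")
for row in source.findall("row"):
    record = ET.SubElement(destination, "customer")
    record.set("name", row.findtext("Name"))
    record.set("city", row.findtext("City"))

print(ET.tostring(destination, encoding="unicode"))
```

An XSLT stylesheet would express the same source-to-destination rules declaratively; the choice mostly comes down to which tooling the team already knows.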
When working with larger data sets, a schema mapping tool is generally required. There are several commercial schema mapping tools available, or custom mappings can be coded in languages like Python. These tools generally include a GUI that allows users to match columns between source and destination tables, as well as define rules and relationships. The resulting schema then needs to be applied manually.
Most enterprise use cases require fully automated data mapping. This involves using a professional platform, such as Xplenty, to provide a fully automated transformation layer in the Extract Transform Load (ETL) process.
Xplenty is pre-loaded with hundreds of integrations with popular services, so it can easily perform data mapping involving platforms like Salesforce, Google Analytics and Amazon S3. Most automated platforms (including Xplenty) include developer tools to fine-tune automated mapping. These tools can also help to produce mappings for sources that aren’t supported automatically.
When is Data Mapping Used?
Data mapping is essential when moving data values from one relational database to another. In practice, this boils down to four basic scenarios:
Data Migration

Data often needs to move from one system to another, such as when an organization is adopting a new platform. The new platform might handle the same basic data, but the underlying table structure will be completely different.
A data mapping schema is essential to get data from A to B. Schemas can be applied on a one-off basis, such as migration. They can also be part of a data pipeline implementation when integrating two systems.
Data Warehousing

Warehousing is very similar to migration, but it tends to involve multiple sources and much larger quantities of data. Data warehouses often hold an organization's entire store of data, and this presents its own challenges, like how to normalize the data and reduce duplication.
A well-designed schema can help integrate many disparate data sources into a small number of relational tables. This allows for lower storage costs and faster analytics, without any risk of data loss.
Data Transformation

Organizations sometimes need to transform their data to make it suitable for other applications. For example, they might want to merge two sources to make analytics easier, or to power an application.
Data transformation can also play a role in data cleansing. The data mapping schema includes contingencies for exceptions, such as null values or incorrect data types. Running data through the transformation layer produces a clean, standardized database.
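A cleansing pass of this kind is often just a pipeline of small rules applied to each value. The rules below (trim whitespace, normalize name casing, substitute a default for nulls) are illustrative assumptions, not a fixed standard:

```python
# A minimal cleansing pass applied during transformation.
RULES = [
    lambda v: v.strip() if isinstance(v, str) else v,   # trim whitespace
    lambda v: v.title() if isinstance(v, str) else v,   # normalize casing
    lambda v: v if v not in (None, "") else "UNKNOWN",  # handle nulls
]

def cleanse(record):
    """Run every value in a record through the rule pipeline."""
    cleaned = {}
    for column, value in record.items():
        for rule in RULES:
            value = rule(value)
        cleaned[column] = value
    return cleaned

print(cleanse({"name": "  ada lovelace ", "city": None}))
# {'name': 'Ada Lovelace', 'city': 'UNKNOWN'}
```

Because the rules live in one list, the same standardization is applied to every record that passes through the transformation layer.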
Electronic Data Interchange
When data can't be accessed by API, it may be exported in file formats such as CSV, JSON and XML. These files may later be imported into another relational database, in which case they need to be exported in a ready-to-use format.
Data mapping can help if the end destination is known. The process is ultimately the same, except that the execution will result in the creation of a file. This file should be validated and checked for quality before moving on to the next stage.
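In Python, that export-then-validate step might look like the sketch below. The field names and validation rules are assumptions for illustration; a real interchange would validate against whatever format the receiving system specifies:

```python
import csv
import io

# Destination file layout agreed with the receiving system (illustrative).
FIELDS = ["customer_id", "name", "email"]

def export_csv(records):
    """Map records onto the agreed CSV layout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow({f: record.get(f, "") for f in FIELDS})
    return buffer.getvalue()

def validate_csv(text, expected_rows):
    """Quality checks before the file moves to the next stage."""
    rows = list(csv.DictReader(io.StringIO(text)))
    assert len(rows) == expected_rows, "row count mismatch"
    assert all(r["customer_id"] for r in rows), "missing identifiers"
    return rows

data = [{"customer_id": "1", "name": "Ada", "email": "ada@example.com"}]
rows = validate_csv(export_csv(data), expected_rows=1)
```

Validating the file immediately after export catches mapping problems before the file leaves the organization's control.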
Data Mapping and Compliance
Privacy laws such as GDPR have implications for companies that perform transformations on data. Because this may involve handling personal information, the data controller will need to meet certain criteria.
Generally, the requirement is that all transformation processes be transparent and well-documented. That includes details of any data mapping schema applied to personal data. In the event of an audit, organizations might be required to show how they transform personal data and demonstrate a legitimate business reason for processing personal information in this way.