Data migration is the process of moving data from one location to another. While the concept of data migration is easy to understand, actually implementing a data migration is rarely a cakewalk.
In fact, data migration is one of the most complicated tasks in the field of data engineering. According to Gartner and Bloor Group:
“More than 80% of data migration projects run over time and/or over budget. Cost overruns average 30%. Time overruns average 41%.” -Bloor Group
“83% of data migration projects either fail or exceed their budgets and schedules.” -Gartner
Setting these sobering statistics aside, the more you know about data migration, the better you can overcome any challenges you encounter in the process. Therefore, in this guide, we're going to help you understand the following about data migration:
- What Are the Use Cases for Data Migration?
- Why Is Data Migration Challenging?
- What Are the Processes of Data Migration?
- Two Tips to Prevent a Data Migration Catastrophe
(1) What Are the Use Cases for Data Migration?
There are three primary use cases for data migration: (a) application migration; (b) storage migration; and (c) cloud migration:
(a) Application Migration:
Application migration is when you move an application from one storage or server environment to a different one. You could be migrating an application (1) from an onsite server to a cloud-based server or (2) from one cloud-based server to another, or (3) you could be moving data from one application to a new application that only accepts data in a specific format.
(b) Storage Migration:
Storage migration is when you migrate data from legacy storage systems (often isolated ones that become walled off into data silos, discussed below) to storage systems that permit better integration across all of an organization's information systems. Migrating data into a more integrated data warehousing system offers dramatically improved processing and flexible, cost-effective scaling. It could also provide advanced data management features like snapshots, cloning, disaster recovery, backups, and more.
(c) Cloud Migration:
Cloud migration is the process of transferring data from an onsite server to a cloud-based data warehouse. Of all the use cases, cloud migration might be the most important for large corporate data systems right now. In fact, Forbes reports that 83% of businesses will move their data systems to the cloud by 2020.
Businesses are leaving their onsite servers for the cloud because cloud-based data management systems:
- Reduce the overhead required to maintain onsite data systems.
- Charge only for the data services companies need when they need them.
- Offer greater flexibility and scalability.
- Allow corporations to extract cutting-edge machine learning business insights from their data.
If you’re interested in data migration, you’re already aware of these benefits. However, you might not be aware of the challenges of data migration itself.
(2) The Challenges of Data Migration
If you've never been involved in a data migration project, you probably don't have a real appreciation for the challenges. The main issues you'll encounter relate to: (a) data gravity and data silos; (b) legal compliance and data security; (c) data complexity; (d) data loss and corruption; and (e) dealing with impatient stakeholders.
(a) Data Gravity and Data Silos:
One of the biggest challenges of data migration stems from data gravity. Data gravity happens when data attracts other data and applications to it. It refers to the way large data systems become “heavy” (in a figurative sense) and difficult to move.
“Data gravity is a term coined by Dave McCrory stating that data keeps accumulating (building mass) and with that increases the probabilities of additional services and applications being attracted to this data. This is similar to the Law of Gravity. The term was coined to describe the concept that as the mass of data increases, the amount, and speed of services and applications increases as well.
“As everything [gets closer] to the mass, they move toward the mass at faster velocity and this can be known as the 'pull' of the data.”
In some ways, data gravity is good because the more integrated applications are with their data, the more efficiently they run. However, when it's time to move the data to a new storage system, it's difficult to disentangle the data from the applications that are using it.
Another gravity-related challenge is the data silo. Data silos are isolated pockets of data, often in incompatible formats, within a large data system. They develop when an application works with unique data structures that don't communicate with the rest of the system. In many cases, data silos remain isolated, but sometimes data engineers resort to jury-rigged workarounds (e.g., inefficient data pipelines) to integrate a data silo with the rest of the system.
When migrating data in a data silo, you have to:
- Undo the jury-rigged workarounds, which can be time-consuming.
- Figure out new solutions to get the data to integrate with the target system or application.
Data silos have ruined many data migration projects. However, new ETL (extract, transform, load) technologies, like Xplenty's data integration tools, can help you overcome these hurdles.
(b) Legal Compliance and Data Security:
Another difficulty in data migration relates to legal compliance and security. For one, you need to understand and adhere to all compliance-related requirements that apply to data security and data storage in your industry. For example, SEC Rules 17a-3 and 17a-4 require brokerage firms to use an electronic records system that prohibits overwriting and saves records—along with separately stored duplicates of the records and indexes—for at least six years.
To overcome the security risks associated with moving your data you may want to work with a migration and storage expert for your industry. These professionals can help with:
- Data encryption: Ideally, you can store legacy backup tapes—and encrypt and migrate the data to a new media format simultaneously.
- Chain of custody: Secure unencrypted historical data from point of pickup through completed migration with one documented process.
- Offsite tape vaulting: Store your legacy source media and newly migrated archive tapes in your vendor's secure facility.
Also, consider the implementation of advanced user management strategies to make sure your data isn't accessible to the wrong people during and after migration. According to a Ponemon study, 62% of employees claim that they have access to data they shouldn't be able to see. Don't let that happen to your information.
(c) Data Complexity:
The more data you need to migrate, the more types and varieties of incompatible data you'll encounter. For example, imagine dealing with an old information system that stored 40-digit claim numbers in one field, but the new system won't accept numbers this long. Dealing with incongruent data like this requires you to transform it into a compatible format. This might involve splitting each number into smaller parts for the client code, date, region, etc. Of course, you'll have to develop the code (or use an automatic data integration solution) to transform the data like this.
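To make the claim-number example concrete, here is a minimal Python sketch of that kind of split. The field names and widths in `CLAIM_LAYOUT` are assumptions invented for illustration; a real legacy system would define its own layout.

```python
from typing import Dict

# Hypothetical fixed-width layout for a 40-digit legacy claim number.
# Segment names and widths are assumptions, not taken from a real system.
CLAIM_LAYOUT = [
    ("client_code", 10),
    ("claim_date", 8),   # YYYYMMDD
    ("region", 4),
    ("sequence", 18),
]


def split_claim_number(claim_number: str) -> Dict[str, str]:
    """Split one long claim number into the smaller fields the target accepts."""
    expected_len = sum(width for _, width in CLAIM_LAYOUT)
    if len(claim_number) != expected_len or not claim_number.isdigit():
        raise ValueError(f"expected {expected_len} digits, got {claim_number!r}")
    fields: Dict[str, str] = {}
    pos = 0
    for name, width in CLAIM_LAYOUT:
        fields[name] = claim_number[pos:pos + width]
        pos += width
    return fields


# Build a sample 40-digit number from its parts, then split it again.
legacy = "12345".zfill(10) + "20150314" + "0042" + "789".zfill(18)
parts = split_claim_number(legacy)
print(parts)
```

Rejecting values that don't match the expected length up front means malformed records surface during transformation rather than after they reach the target system.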
Another complexity happens with old, legacy systems that have duplicate information stored in multiple places. Migrating this data requires you to locate all of the copies to make sure you only migrate one copy, and to make sure you store it in the right location. This process is called "data normalization."
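As a rough illustration of migrating only one copy of each record, the sketch below keeps the first record seen for each business key. The record fields and the choice of `client_code` as the key are assumptions for the example; real systems often need fuzzier matching rules.

```python
from typing import Iterable, List, Tuple


def deduplicate(records: Iterable[dict], key_fields: Tuple[str, ...]) -> List[dict]:
    """Keep the first record seen for each key; later copies are dropped."""
    seen = set()
    unique = []
    for record in records:
        # Normalize whitespace and case so "C001" and "c001 " count as the same key.
        key = tuple(str(record[field]).strip().lower() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical rows: the same client stored in two legacy systems.
records = [
    {"client_code": "C001", "email": "ana@example.com", "source": "billing"},
    {"client_code": "c001 ", "email": "ana@example.com", "source": "crm"},
    {"client_code": "C002", "email": "bo@example.com", "source": "crm"},
]
unique = deduplicate(records, ("client_code",))
print(len(records), "records in,", len(unique), "records out")
```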
To overcome data normalization challenges, you want to have the best data migration tools, like Xplenty, at your disposal. According to this Xplenty user:
"We have to collect data from more than 30 different sources. Xplenty helped us maintain all the various API calls. We saved hundreds of developing hours. It also helps more junior analysts to be able to aggregate data without the help of the senior team. Last, it helped us to aggregate and normalize all our data sources."
(d) Data Loss or Corruption:
It's important to minimize the risk of data loss and corruption because losing even a single record could be catastrophic for your organization. One strategy for preventing data loss and corruption is to know the exact number of records you're migrating to the new system. If the number of migrated records doesn't match, you'll need to investigate why. For example, was it simply because you eliminated duplicate data and everything is fine? Or did a record get lost, and if so, how do you prevent it from happening again?
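The record-count check described above can be sketched in a few lines of Python. This assumes each record carries a unique identifier that survives the migration; the function name and the report keys are made up for the example.

```python
from typing import Dict, Iterable


def reconcile(source_ids: Iterable[int], target_ids: Iterable[int]) -> Dict[str, object]:
    """Compare source and target record IDs and explain any difference in counts."""
    source_list = list(source_ids)
    source_set = set(source_list)
    target_set = set(target_ids)
    return {
        "source_rows": len(source_list),
        "target_rows": len(target_set),
        # Rows that disappeared only because duplicates were merged.
        "duplicates_in_source": len(source_list) - len(source_set),
        # Rows that were genuinely lost and need investigation.
        "missing_in_target": sorted(source_set - target_set),
        "unexpected_in_target": sorted(target_set - source_set),
    }


# Hypothetical IDs: 102 is duplicated in the source, 103 never arrived.
report = reconcile([101, 102, 102, 103, 104], [101, 102, 104])
print(report)
```

Separating "removed as a duplicate" from "missing in the target" answers the two questions above directly: the first is expected, the second is a lost record.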
Another way to prevent data loss and corruption is to use the automated data validation tools provided by Xplenty to sample data outputs and ensure correctness and validity. For example, Xplenty can check whether the client code fields in the new system have the right number of characters and whether the new field types match with the old ones.
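For readers who want to see what such a check looks like in code, here is a generic Python sketch. It is not Xplenty's API; the `validate_record` function, the ten-character code length, and the sample fields are all assumptions for illustration.

```python
from typing import List


def validate_record(old: dict, new: dict, code_length: int = 10) -> List[str]:
    """Return a list of problems found in a migrated record (empty if none)."""
    errors = []
    code = new.get("client_code")
    if not isinstance(code, str) or len(code) != code_length:
        errors.append(f"client_code should be {code_length} characters: {code!r}")
    # Every field carried over should keep the type it had in the old system.
    for field, old_value in old.items():
        if field in new and type(new[field]) is not type(old_value):
            errors.append(
                f"{field}: type changed from {type(old_value).__name__} "
                f"to {type(new[field]).__name__}"
            )
    return errors


old_row = {"client_code": "0000012345", "balance": 10.5}
good_row = {"client_code": "0000012345", "balance": 10.5}
bad_row = {"client_code": "12345", "balance": "10.5"}

print(validate_record(old_row, good_row))
print(validate_record(old_row, bad_row))
```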
When testing and validating data during a data migration, you should:
- Consider any incidents that might have resulted in corrupt data in the past: Maybe there was a system failure back in 2015 that may have impacted the quality of certain records. Make sure to test these records specifically during your data migration process.
- Use large samples of data for testing: In a massive data system, you might not be able to test and validate every piece of information, but you should strive to validate at least 10% to 20% of the information.
- Start testing immediately and don't stop: Testing is not something to do at the end of the migration. You should verify the accuracy of the migration as soon as possible and continue testing throughout the migration process.
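One way to apply the sampling advice above is to pick a random fraction of record keys and compare the source and target values for just those keys. The sketch below assumes both systems can be read into dictionaries keyed by record ID, which will not hold for very large datasets; the 15% default is simply a value inside the 10% to 20% range mentioned above.

```python
import random
from typing import Dict, Hashable, List


def sample_mismatches(
    source: Dict[Hashable, object],
    target: Dict[Hashable, object],
    fraction: float = 0.15,
    seed: int = 0,
) -> List[Hashable]:
    """Check a random sample of source records against the target.

    Returns the sampled keys whose target value is missing or different.
    """
    keys = sorted(source)
    if not keys:
        return []
    sample_size = min(len(keys), max(1, round(len(keys) * fraction)))
    rng = random.Random(seed)  # fixed seed so a failing sample can be reproduced
    sampled = rng.sample(keys, sample_size)
    return [key for key in sampled if target.get(key) != source[key]]


# Hypothetical data: 100 records, one of them corrupted in the target.
source_rows = {i: {"value": i} for i in range(100)}
target_rows = {i: {"value": i} for i in range(100)}
target_rows[7] = {"value": -1}

bad_keys = sample_mismatches(source_rows, target_rows)
print("mismatched keys found in sample:", bad_keys)
```

A sample can miss an isolated bad record, which is one more reason to test known-risky records specifically and to keep testing throughout the migration.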
(e) Dealing with Impatient Stakeholders:
Finally, because all the above challenges can seriously delay your project, it's not uncommon for stakeholders to get impatient. Therefore, CTOs and developers should explain to stakeholders that data migration is far more involved than simply switching a few hard drives or pushing a button to upload data to the cloud. Stakeholders who understand these complexities will be more patient when the inevitable challenges and delays crop up.
(3) How Does the Data Migration Process Work?
You may have heard about “ETL” platforms like Xplenty, which offer data migration and data integration services. ETL stands for the three stages of data migration: (a) extract data, (b) transform data, and (c) load data.
(a) Extract Data:
Extraction is one of the most delicate parts of data migration because if you fail to extract the data correctly, the rest of the process will fail. During extraction, you may pull data in a variety of formats from the source. These might include relational databases (RDBMS), semi-structured formats like XML and JSON, and flat files. They may also include non-relational formats and more.
During the data extraction phase, you will convert these formats into a new format that permits you to transform it in the next stage. Another element of extraction involves the verification that the extracted data is correct and accurate.
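Here is a small Python sketch of that idea: two source formats (CSV and JSON) are read into the same in-memory shape, a list of dictionaries, and then checked for required fields. The sample data and the `REQUIRED_FIELDS` set are assumptions for the example.

```python
import csv
import io
import json
from typing import Dict, List

REQUIRED_FIELDS = {"client_code", "region"}  # hypothetical required columns


def extract_csv(text: str) -> List[Dict[str, str]]:
    """Read delimited flat-file text into a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def extract_json(text: str) -> List[Dict[str, str]]:
    """Read a JSON array of objects into a list of dictionaries."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return [dict(item) for item in data]


def verify(rows: List[Dict[str, str]]) -> None:
    """Fail early if any extracted row is missing a required field."""
    for index, row in enumerate(rows):
        missing = REQUIRED_FIELDS - row.keys()
        if missing:
            raise ValueError(f"row {index} is missing {sorted(missing)}")


csv_text = "client_code,region\nC001,EU\nC002,US\n"
json_text = '[{"client_code": "C003", "region": "APAC"}]'

extracted = extract_csv(csv_text) + extract_json(json_text)
verify(extracted)
print(extracted)
```

Once every source lands in the same shape, the transformation stage only has to handle one format.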
As a laborious and error-prone process, manual data extraction slowed down data developers for many years. Now, with automated solutions like Xplenty, you can bypass these bottlenecks by automating data extraction. Here’s what a G2Crowd user said about Xplenty’s data extraction tools:
“Xplenty solves the problem of manual data extraction and insertion, and the errors that occur in this process. After configuration, which is a very important step, we realized a large time savings from this manual process. Any errors with our data that arose we knew were on our end, and this allowed for faster problem identification and resolution.”
(b) Transform Data:
The transformation process applies specific rules that transform the extracted data. This serves to normalize the data to load it into the new target structure (often a data warehouse).
Part of data transformation involves “data cleansing” to ensure that only the right data gets loaded into the target structure. For example, you might set rules that:
- Ensure that no duplicate columns or duplicate sets of data get loaded into the target.
- Choose only specific information to load.
- Divide certain columns into more than one column.
- Change coded values.
- Instruct how to sort the information.
- Perform a wide variety of other functions.
Finally, you might use data mapping to join columns and fields of data together from different sources.
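Several of the rules listed above can be shown together in a short Python sketch: dropping duplicates, choosing specific fields, dividing one column into two, changing coded values, and sorting. The column names and the `STATUS_CODES` mapping are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from legacy status codes to readable values.
STATUS_CODES = {"A": "active", "I": "inactive"}


def transform(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Apply a handful of cleansing rules to extracted rows."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:      # rule: no duplicate records
            continue
        seen_ids.add(row["id"])
        # rule: divide one column into two
        first_name, _, last_name = str(row["full_name"]).partition(" ")
        cleaned.append({
            "id": row["id"],            # rule: load only selected fields
            "first_name": first_name,
            "last_name": last_name,
            # rule: change coded values
            "status": STATUS_CODES.get(str(row["status"]), "unknown"),
        })
    return sorted(cleaned, key=lambda r: r["id"])  # rule: sort the output


rows = [
    {"id": 3, "full_name": "Cy Young", "status": "I", "notes": "legacy"},
    {"id": 1, "full_name": "Ana Lopez", "status": "A", "notes": ""},
    {"id": 3, "full_name": "Cy Young", "status": "I", "notes": "duplicate"},
]
transformed = transform(rows)
print(transformed)
```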
An example of when you might need to transform data would be in the case of migrating data from one application to a new application that requires information in a different schema. You’ll need to transform the data into the right schema before it can integrate with the new application.
Manual data transformations can consume a lot of resources, but automated ETL solutions like Xplenty offer instant transformations between a vast array of data structures belonging to popular information systems like Salesforce, Facebook, SurveyMonkey, and hundreds more. Here’s what a G2Crowd reviewer said about Xplenty’s automated transformation features:
“We needed to connect a number of sources, transform data, and load them into one centralized location fairly quickly with limited bandwidth from our DBAs. XPlenty enabled me, and one other analyst, to pick up that bit of the workload without a ton of training—so we were able to meet our deadlines. The speed and consistency of XPlenty is impressive, and it more than makes up for what a few tools in our kit may be lacking.”
(c) Load Data:
In the final stage, you’ll load the data into the target data warehouse or delimited file. This entire sequence could be repeated by automated software multiple times per hour, day, week, month or year. Certain data warehouses will have rules for organizing the information they are exposed to. For example, some data systems will overwrite existing data with cumulative data at specific intervals. Therefore, make sure you understand how you want your data warehouse to treat new data beforehand, so you can develop an appropriate strategy.
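To illustrate why the treatment of new data matters, the sketch below loads two batches into an in-memory SQLite table that stands in for the target warehouse. The `clients` table and the two mode names are assumptions; real warehouses expose their own loading options.

```python
import sqlite3
from typing import Iterable, Tuple


def load(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]],
         mode: str = "keep_existing") -> None:
    """Load rows into the target table.

    mode="keep_existing": rows whose id already exists are skipped.
    mode="overwrite":     rows whose id already exists are replaced.
    """
    if mode not in ("keep_existing", "overwrite"):
        raise ValueError(f"unknown mode: {mode}")
    verb = "INSERT OR REPLACE" if mode == "overwrite" else "INSERT OR IGNORE"
    with conn:  # commit on success, roll back on error
        conn.executemany(f"{verb} INTO clients (id, status) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, status TEXT)")

load(conn, [(1, "active"), (2, "inactive")])
load(conn, [(2, "active"), (3, "active")], mode="keep_existing")
after_keep = conn.execute("SELECT id, status FROM clients ORDER BY id").fetchall()

load(conn, [(2, "active")], mode="overwrite")
after_overwrite = conn.execute("SELECT id, status FROM clients ORDER BY id").fetchall()

print(after_keep)
print(after_overwrite)
```

The same second batch produces different table contents depending on the mode, so this decision belongs in the plan before the first load runs.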
(4) Two Tips to Prevent a Data Migration Catastrophe
During your data migration process, there will be a lot of opportunities for things to go horribly wrong. For example, imagine accidentally deleting decades' worth of data relating to your company. To avoid a catastrophe like this, keep these final tips in mind:
- Secure your data with backups: Remember when you forgot to back up your 7th-grade research paper and it got zapped by a power outage? You can't afford a mishap like this when it's your company's valuable data. Therefore, whenever you're performing ETL operations, create backups of your resources—and test the accuracy of the backups—before moving forward with the data migration procedure. If a problem crops up, you'll be glad you took the time.
- Test all phases of your project: You'll have many opportunities to test the various stages of your data migration plan before implementing them. Make sure to do this as it will limit the risk of a data system meltdown.
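For the backup tip, "test the accuracy of the backups" can be as simple as comparing checksums of the original and the copy. This is a minimal file-level sketch using Python's standard library; the file names are made up, and database backups would normally use the database's own backup tooling.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy a file to backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / (source.name + ".bak")
    shutil.copy2(source, backup)
    if sha256_of(source) != sha256_of(backup):
        raise OSError(f"backup of {source} failed checksum verification")
    return backup


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    source_file = Path(tmp) / "clients.csv"
    source_file.write_text("id,status\n1,active\n")
    backup_file = backup_and_verify(source_file, Path(tmp) / "backups")
    print("verified backup:", backup_file.name)
```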
Xplenty Data Migration Automation: Speed. Agility. Ease of Use.
The speed and agility of Xplenty's out-of-the-box data migration tools bring businesses tremendous financial savings and efficiency. Plus, if you've been doing your data migrations the manual way, you won't believe how fast Xplenty is. Here's what two Xplenty users said about the speed of the platform:
"Building ETL pipelines with the speed of light. We could write integrations using python but Xplenty saved us a lot of time. We wanted to spend more time understanding data; not how to get to it and Xplenty did that for us."
"Xplenty helps to speed up the whole process of ETL. Allowing me to do more within the same amount of time, and it supports lots of different platforms."
If you're considering an automated ETL solution to get through your data migration challenges, the Xplenty team is available to offer their support.