Over the last few years, enterprises across industries have become heavily reliant on data. In fact, as much as 90% of the data in the world today was generated over the last four years. At present, our output is approximately 2.5 quintillion bytes a day.

As the world becomes more connected and we interact more with “smart things,” that number is going to grow exponentially in the years to come. These oceans of data carry a wealth of valuable information that can be leveraged to make intelligent business decisions.

Any data that carries sensitive Personally Identifiable Information (PII) needs to be protected. However, many large companies (particularly non-technology companies) tend to perceive data security as something separate from their core business.

This means that they don’t treat their customer data as a tangible asset (like inventory or real estate). That attitude has forced government bodies to step in and take numerous regulatory measures at regional, national, and international levels to protect customer data and ensure accountability.

Some of these include the following:

  • California Consumer Privacy Act (CCPA)
  • China Cybersecurity Law
  • General Data Protection Regulation (GDPR)
  • Health Insurance Portability and Accountability Act (HIPAA)
  • New York Cyber Law
  • Payment Card Industry Data Security Standard (PCI DSS)

In this article, we’re going to focus on both GDPR and CCPA. The former came into effect on May 25, 2018 (so you should have already taken steps to be compliant) while the latter will go into effect on January 1st, 2020.

What is GDPR?

GDPR aims to protect citizens or residents of the European Union (EU) from privacy and data breaches. So if you have an electronic presence in the EU, the data you work with will be strictly governed by this regulation.

Who Does It Apply To?

GDPR applies to businesses that offer goods and services to EU citizens or residents. It also applies to companies that process personal data of EU citizens from inside or outside the EU. Even if the physical location of the data center is on another continent, the rules still apply.

GDPR demands the following:

  • Consent agreements for data processing must be explained in layman’s terms
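  • The supervisory authority must be notified of a data breach within 72 hours of detection, and affected individuals must be informed without undue delay when the breach poses a high risk to them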
  • Consumers have the right to know if their personal data is being sold, who it’s sold to, and why
  • EU citizens also have the right to be forgotten and erased completely from enterprise databases
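
The 72-hour notification window in the list above is easy to track programmatically. The following is a minimal sketch, not a compliance tool; the function names and the sample detection time are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# The 72-hour window comes from the requirement listed above.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(detected_at: datetime) -> datetime:
    """Return the latest time a breach notification can be sent."""
    return detected_at + NOTIFICATION_WINDOW


def hours_remaining(detected_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(detected_at) - now) / timedelta(hours=1)


# Hypothetical detection time, stored in UTC to avoid time zone ambiguity.
detected = datetime(2019, 6, 3, 9, 30, tzinfo=timezone.utc)
print(notification_deadline(detected).isoformat())
```

Keeping timestamps in UTC is a design choice here: breach response often spans offices in several time zones, and a single reference clock avoids off-by-hours mistakes.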

The fines for violating GDPR are very high: as much as €20 million or 4% of total annual global revenue (whichever is higher). Data subjects affected by a breach also have the right to seek compensation.
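
To make the "whichever is higher" rule concrete, here is a small worked sketch. The function name and the two revenue figures are hypothetical; the code only computes the upper bound described above, not the fine a regulator would actually impose.

```python
def gdpr_max_fine(annual_global_revenue_eur: int) -> float:
    """Upper bound on a GDPR fine: the higher of EUR 20 million or 4% of revenue."""
    flat_cap = 20_000_000
    revenue_cap = annual_global_revenue_eur * 4 / 100
    return max(flat_cap, revenue_cap)


# Hypothetical company with EUR 100 million in revenue:
# 4% is EUR 4 million, so the EUR 20 million figure is the ceiling.
print(gdpr_max_fine(100_000_000))

# Hypothetical company with EUR 2 billion in revenue:
# 4% is EUR 80 million, which exceeds EUR 20 million.
print(gdpr_max_fine(2_000_000_000))
```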

What is CCPA?

The CCPA applies to companies anywhere in the world that collect personal data of California residents and meet at least one of three thresholds. The first is annual revenue of $25 million or more, generated by the business, a parent company, or a subsidiary.

The second is collecting data on 50,000 or more residents. The third is generating more than half of annual revenue from the sale of California residents’ personal data.

Even if you’re not based in the region, if you’re selling products or services in the Golden State or collecting personally identifiable information of its residents, then CCPA will apply to you.
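
The applicability thresholds described above can be expressed as a simple check. This is a rough sketch for illustration only: the `Business` class and its field names are hypothetical, and a real assessment should be made with legal counsel.

```python
from dataclasses import dataclass


@dataclass
class Business:
    # All field names are hypothetical, chosen for this illustration.
    annual_revenue_usd: float              # business, parent company, or subsidiary
    california_residents_tracked: int      # residents whose data is collected
    revenue_share_from_data_sales: float   # fraction of revenue (0.0 to 1.0)


def ccpa_applies(b: Business) -> bool:
    """True if any one of the three thresholds described above is met."""
    return (
        b.annual_revenue_usd >= 25_000_000
        or b.california_residents_tracked >= 50_000
        or b.revenue_share_from_data_sales > 0.5
    )


small_shop = Business(2_000_000, 1_200, 0.0)
data_broker = Business(3_000_000, 8_000, 0.7)
print(ccpa_applies(small_shop), ccpa_applies(data_broker))
```

Note that the data broker qualifies on the revenue-share threshold alone, even though its revenue and record count are both below the other limits.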

CCPA demands the following:

  • Companies must notify the individual whenever their personal information is collected
  • Organizations must inform consumers about how their PII is used and who it’s sold to
  • Consumers should also be presented with the option of opting out via a “Do Not Sell My Information” link on enterprise homepages
  • Californians also have the right to be forgotten and erased completely from enterprise databases and the databases of any third parties who purchased the information
  • Consumers who choose not to share their information still have the right to equal treatment from businesses
  • Victims of a data breach can also file a class action lawsuit

Those who violate the law can be fined up to $7,500 for each intentional violation (and up to $2,500 for each unintentional one). When you consider the sheer number of records that enterprises collect today, those fines can add up to millions very quickly.
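
A quick back-of-the-envelope calculation shows how fast the exposure grows. This sketch assumes, for illustration, that every affected record is fined at the $7,500 maximum; the function name and the record count are hypothetical.

```python
def ccpa_max_exposure(records: int, fine_per_record_usd: int = 7_500) -> int:
    """Worst-case exposure, assuming each record is fined at the maximum rate."""
    return records * fine_per_record_usd


# A hypothetical breach touching 10,000 records.
print(f"${ccpa_max_exposure(10_000):,}")  # prints "$75,000,000"
```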

[Figure: Key Comparisons of GDPR and CCPA Requirements]

ETL for Enhanced Data Protection

When the stakes are this high, companies need a robust solution to protect themselves from a potential data breach. In this scenario, Extract, Transform, Load (ETL), when used appropriately, can help ensure compliance with multiple regulations.

What’s ETL?

ETL can be described as the process of extracting data from one or more sources, transforming it (cleansing and enriching it along the way), and loading it into a destination system (like a data warehouse) where the data will be represented differently from the source. It’s a process that consistently enforces data quality, security, and authorized access to enable Business Intelligence (BI).

How Does It Apply to GDPR and CCPA?

Sensitive data is usually a small subset of your data lake (or data warehouse) and can be found anywhere (including the most unlikely locations). It can also be on premises, in the cloud, or sitting in unstructured repositories like Hadoop.

When you’re dealing with large volumes of data spread across multiple locations, a manual approach is extremely time-consuming and error-prone, and those errors can lead to a data breach. Furthermore, a manual approach won’t be scalable or repeatable.

The best way forward is to leverage an automated detection and encryption solution. ETL is a continuous and ongoing process that can rapidly extract data (from homogeneous or heterogeneous data sources), cleanse it, enrich it, identify sensitive information, encrypt it, and load it into a data lake or data warehouse.
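
The flow described above (extract, cleanse, identify sensitive fields, protect them, load) can be sketched end to end in a few lines. This is a toy illustration, not Xplenty's implementation: the CSV data, the `SENSITIVE_COLUMNS` set, and the `customers` table are all hypothetical, an in-memory SQLite database stands in for the data warehouse, and a plain SHA-256 hash stands in for whatever masking or encryption function a real pipeline would apply.

```python
import csv
import hashlib
import io
import sqlite3

# Hypothetical: columns this pipeline treats as sensitive.
SENSITIVE_COLUMNS = {"email", "ssn"}

# Hypothetical source data; a real job would read from files, APIs, or databases.
RAW_CSV = """name,email,ssn,plan
 Ada ,ada@example.com,078-05-1120,pro
Bob,bob@example.com,219-09-9999,free
"""


def extract(source: str) -> list[dict]:
    """Extract: parse the raw CSV into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: list[dict]) -> list[dict]:
    """Transform: cleanse whitespace, then protect the sensitive columns."""
    protected = []
    for row in rows:
        clean = {key: value.strip() for key, value in row.items()}
        for column in SENSITIVE_COLUMNS:
            if column in clean:
                digest = hashlib.sha256(clean[column].encode("utf-8"))
                clean[column] = digest.hexdigest()
        protected.append(clean)
    return protected


def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Load: write the protected rows into the destination table."""
    conn.execute("CREATE TABLE customers (name TEXT, email TEXT, ssn TEXT, plan TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (:name, :email, :ssn, :plan)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
for record in conn.execute("SELECT name, email, plan FROM customers"):
    print(record)
```

The key property is that raw email addresses and social security numbers never reach the destination: they are replaced during the transform step, before the load.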

How Does It Work?

Data protection starts with the detection of sensitive information across the organization, both on premises and in the cloud. Once it’s identified, PII like customer names, credit card numbers, email addresses, IP addresses, social security numbers, zip codes, and more can be masked or encrypted.
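
As a rough illustration of detection followed by masking, the sketch below scans free text with regular expressions. The patterns are deliberately simplified assumptions (they cover only email addresses, US social security numbers, and 16-digit card numbers), and the function names are hypothetical; production detection needs far broader rules and validation.

```python
import re

# Simplified, illustrative patterns; real-world PII detection needs more than this.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def detect_pii(text: str) -> dict[str, list[str]]:
    """Return every match found in the text, grouped by PII type."""
    found = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


def mask_pii(text: str) -> str:
    """Replace each detected value with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


note = "Contact jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
print(detect_pii(note))
print(mask_pii(note))  # Contact [EMAIL], SSN [SSN], card [CREDIT_CARD].
```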

Encrypting all the data is a high-risk, time-consuming, and expensive endeavor. Instead, enterprises must develop and follow protocols that ensure the security and confidentiality of PII when it’s received, handled, shared, or transferred.

For example, with Xplenty’s rich set of functions, you can hash PII so that each value stays unique while its content is masked to maintain compliance. You can also mask (and soon encrypt) data fields before they’re loaded into the data warehouse.

Hashing is a one-way process that transforms meaningful information into a fixed-length, random-looking value in a reproducible but irreversible way. This is ideal for PII like names and social security numbers that are shared across various datasets or tables. You can’t reverse the hash, but you can take the original data and hash it again to confirm that the result is identical.
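
The sketch below shows why a reproducible hash keeps datasets joinable after the PII is masked. It is an illustration under stated assumptions, not Xplenty's hash function: it uses a keyed hash (HMAC-SHA256) from Python's standard library, because unkeyed hashes of low-entropy values like social security numbers can be guessed by brute force. The key, the normalization step, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; in practice, load it from a secrets manager, never from source code.
SECRET_KEY = b"replace-with-a-key-from-your-secrets-manager"


def hash_pii(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a reproducible, irreversible hex digest for a PII value."""
    # Normalizing first is an assumption: it makes "Jane@Example.com" and
    # "jane@example.com " hash to the same value.
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Two hypothetical tables that share an email column.
customers = [{"email": "Jane@Example.com", "plan": "pro"}]
orders = [{"email": "jane@example.com ", "total": 42}]

customers_hashed = [{**c, "email": hash_pii(c["email"])} for c in customers]
orders_hashed = [{**o, "email": hash_pii(o["email"])} for o in orders]

# The same input always produces the same digest, so the tables still join.
print(customers_hashed[0]["email"] == orders_hashed[0]["email"])  # True
```

Because the digest is stable, analysts can still count unique customers or join orders to accounts without ever seeing the underlying email address.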

When you want to retain the original field data, encrypting the data before it’s loaded into the data warehouse is the better approach.

As PII is rarely needed for data analytics, keeping this information encrypted consistently will reduce your exposure to risk.

Going forward, maintaining regulatory compliance will be critical to business continuity. As a result, ETL for GDPR and CCPA is becoming even more vital to business security and privacy compliance.

Want to learn more about ETL? Find out how your business can benefit from Xplenty’s simplified data integration service.