ETL vs ELT or Data Warehouse vs Data Lake
Which is best for your BI architecture?
There’s a flood of technologies and methodologies – each represented by its own buzzwords and terms - for handling the growing amount of data. One of the hottest, and still ongoing, data debates is the question of ETL vs ELT, and how they relate to Data Warehouses and Data Lakes.
To help bring order to this terminological chaos, we’ve created a new white paper, available for download here. In it, we lay out in clearly-understandable terms:
- What are the differences and similarities between ETL and ELT
- Which one is better?
- Deciphering the buzzwords: Data Lakes and Data Warehouses
ETL and ELT Defined
To make data lakes more accessible to traditional analytic tools, data can be organized by running it through an Extract, Transform, Load (ETL) process.
ETL is normally a continuous, ongoing process with a well-defined workflow. ETL first extracts data from homogeneous or heterogeneous data sources. Then, data is cleansed, enriched, transformed, and stored either back in the lake or in a data warehouse.
ELT (Extract, Load, Transform) is a variant of ETL wherein the extracted data is first loaded into the target system. Transformations are performed after the data is loaded into the data warehouse. ELT typically works well when the target system is powerful enough to handle transformations. Analytical databases like Amazon Redshift and Google BigQuery are often used in ELT pipelines because they are highly efficient in performing transformations.
The Big Questions
There are a number of other key questions that need to be asked, when considering data warehouses versus data lakes. Questions like:
- Do I first need to mask PII data (email & IP addresses) for privacy standards like GDPR, CCPA & HiPPA before loading into a lake or warehouse?
- What is the nature of my data? Do I need Real Time or is batch better? Structured or Unstructured?
- What About Data Volatility?
- Who are the people who need to query my data store, what are their skills? What are the types of queries that they will need to perform?
Finally, the bottom line: the data lake has nearly unlimited potential but requires transformations before achieving insights; a data warehouse requires significant investment in advance, yet in return delivers the ability to easily analyze everything, and the skills that are required in order to query it (usually SQL) are usually easier to find among your staff.
Which is best for you? We explain the considerations here.