Data warehousing projects are challenging. The sheer number of business requirements involved, and the data volumes attached to them, have made these projects notoriously risky and costly. Today, however, most organizations understand that deriving insights from data is essential, and the majority of them are running data warehousing projects of one kind or another.

A key challenge in such projects is integrating data from disparate sources. Collecting, transforming and cleansing the data so that it matches pre-existing internal business entities and terminology is one of the most laborious phases in a data warehousing project plan.

Organizations have been using Hadoop as a massively scalable ETL engine for quite some time now. However, the challenges of data integration remain even when Hadoop is the ETL engine of choice, as opposed to a more traditional ETL engine such as Informatica or DataStage. In fact, those challenges are likely to be bigger, especially if the ETL developer is not well-versed in Hadoop to begin with.

Xplenty can help the ETL-on-Hadoop developer a great deal. Our platform lets users access data stored on Amazon S3. It’s not necessary to upload data to a Hadoop cluster, because Xplenty reads the data directly from your S3 account. An added perk is that there’s no performance penalty in doing so. Additionally, you can have Xplenty read data from relational data stores, MySQL and PostgreSQL among others.

Once the data has been processed, the data destination component allows users to store the results in numerous data stores:

Amazon S3: Just point Xplenty to the appropriate bucket and folder, and Xplenty will write the data set to that location.

Amazon Redshift: Redshift, Amazon's data warehouse as a service, is a great place to store processed data for reporting and analytics. Xplenty can load the processed data directly into a Redshift instance, so there is no need to deal with extra scripts and batch processes to move files from S3 to Redshift.

Relational data stores: Xplenty can write the data to relational data stores such as MySQL, PostgreSQL and the like. It’s even possible to deliver the results to an on-premises data store.

SAP HANA: Yes, we even allow our users to place their data on SAP HANA instances in the cloud. SAP HANA is an in-memory analytic data store that lets users analyse data very quickly.
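To make the Redshift point above concrete, here is a minimal sketch of the kind of hand-written step that a direct load spares you: building a Redshift COPY statement that pulls CSV files from S3. This is not Xplenty's implementation. The function name, table, bucket, prefix and IAM role below are all hypothetical placeholders, and a real script would also need a database connection, scheduling and error handling.

```python
def build_copy_statement(table: str, bucket: str, prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads CSV files from an S3 prefix.

    All arguments are assumed to come from trusted configuration;
    this sketch does no quoting or escaping of identifiers.
    """
    # Normalise the prefix so the S3 URI has exactly one slash after the bucket.
    source = f"s3://{bucket}/{prefix.lstrip('/')}"
    return (
        f"COPY {table} "
        f"FROM '{source}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(
        build_copy_statement(
            table="sales_daily",
            bucket="example-bucket",
            prefix="processed/sales/",
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        )
    )
```

A batch process would typically run a statement like this against the cluster after each ETL job finishes, which is the extra moving part the Redshift destination is meant to remove.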

We are working hard to add support for more data sources and more data destinations in Xplenty.