Offload Redshift ETL to Xplenty

Offload Redshift ETL to Xplenty

Xplenty can read data from SQL Server, MongoDB, SAP HANA, and many more data stores. One of the many DBs Xplenty integrates with is Redshift.

Redshift is Amazon’s data warehouse-as-a-service, a scalable columnar DB based on PostgreSQL. Xplenty can be used to offload ETL from Redshift, saving resources and allowing each platform to do what it does best: Xplenty for batch processing and Redshift for real-time querying.

Loading Data from Redshift

Data can be loaded from Redshift and processed by Xplenty by creating a Redshift connection and using this connection in a database source component. Note that Xplenty reads data directly from Redshift just like all other DBs, so the reading part isn’t currently offloaded while the processing is offloaded.

To integrate Redshift with Xplenty, first get the relevant details about your Redshift cluster:

  1. Login to AWS and go to the Redshift page
  2. Go to the “Clusters” page and select the relevant cluster
  3. Keep a note of the endpoint, port, database name, username, and password (the password is not available on this page) Redshift settings

Back in Xplenty, create the Redshift connection:

  1. Login to Xplenty or sign up for free
  2. Go to the “Manage connections” page
  3. Click “new database connection” and select “Redshift”
    Redshift settings

  4. Enter all the relevant connection details
    amazon redshift connection

  5. Click “test connection” to make sure that it works and then click “create postgresql connection”

Now you can create a dataflow that processes the data:

  1. Go to the package editor by clicking the “my packages” menu
  2. Open an existing package or create a new one
  3. Click “new source” at the top and select “database”
  4. Choose the database connection that you have just created and fill in all the relevant fields—please see the documentation for full details
    database source

  5. Click OK and continue editing your package

Storing Results in Redshift

Xplenty can also store results back to Redshift once the batch processing is complete. The data is stored via S3, which is the fastest way to load data into Redshift. You can use Redshift’s workload management to make sure that storing the data has a minimal effect on queries.

  1. Add a new connection for the Redshift output, as described in the previous section
  2. Open the relevant package and add an “Amazon Redshift” destination by clicking the plus sign below the last component
  3. Choose the connection that you created and fill in all the relevant fields—please see the documentation for full details amazon redshift destination

Summary

You can easily offload your ETL from Redshift by integrating it with Xplenty. Xplenty can load data from Redshift, process it, and then store the results back to Redshift. This way you can keep your ETL and data warehouse at maximum warp speed and save plenty of time and money.


Integrate Your Data Today!

Try Xplenty free for 7 days. No credit card required.