Why You Should ETL Your Salesforce Data

Your organization’s Salesforce instance is built on one of the most powerful and easily configured databases in the business. So it might make sense to use your Salesforce database directly for data integration - after all, it’s a real-time copy of your data, in a powerful cloud system. In this article, we’ll explain why something that makes sense in theory might not be the practice that your organization wants to adopt.

Salesforce is a CRM, Not a Data Warehouse

Salesforce claims they are the world's #1 customer relationship management (CRM) platform. What does this mean? Let’s start by looking at what Salesforce is trying to accomplish: they want to give users an always available, high performance system that allows them to leverage data in an interactive application. In a cloud environment, where many organizations are sharing resources, ensuring good performance while analysts are spinning up resource-intensive queries during work hours is a major technical challenge. Salesforce’s CRM solution is to limit resource access via governor limits which restrict almost any type of programmatic and query activity in Salesforce. These limits can easily be reached by the kinds of queries used in data warehouse applications.

Of course, you can choose to upgrade your Salesforce instance to increase those limits, or buy Salesforce’s own analytical tools, if you’re just looking to query only your Salesforce data. Most organizations are also interested in integrating data from other systems with Salesforce, and a data warehouse is the place to do that. Data warehouses, on premises, and cloud-based etl tools, are optimized and priced for queries rather than interactive workloads, so you’re likely to want to push your data to a data warehouse to avoid blowing out your governor limits.

For more information on our native Salesforce connector, visit our Integration page.

Extracting Salesforce Data - Challenges and Tools

The good news about extracting data from Salesforce is that there are lots of options available for data warehousing. Your organization may already have a standard, but if not, cloud solutions like Amazon RedShift, Google BigQuery and Snowflake provide databases optimized for query loads. Depending on the size of your instance, tools like Amazon RDS and Microsoft Azure SQL Database that use traditional relational databases may also be an option. There are also dozens of analytical tools able to query data from either a data warehousing or relational cloud database.

The not-so-good news is that the availability of great databases and analytical tools doesn’t make Salesforce ETL challenges magically go away. When you extract data from Salesforce to another database, no matter which one, you’re maintaining two separate databases. You’ll need, at a minimum, to answer the following ETL questions:

What data do I need to extract from Salesforce?
How often do I need that data?
When a change happens in Salesforce, what’s the process that I need to use to make sure that change is evaluated and replicated to the data warehouse, if appropriate?
How do I perform ETL in a way that doesn’t use inordinate amounts of Salesforce resources?

Salesforce ETL Decisions

You need to choose what Salesforce data you want to extract, how often you need the data extracted, then how you are going to govern that data. In our opinion, questions 1 through 3 above need to be answered before you look at 4, so let’s examine some of the issues you’ll want to consider when answering those questions.

Choosing Salesforce Data

A Salesforce Sales Cloud instance has dozens of standard objects. You can take an inclusive approach and just extract everything, but this approach has a high maintenance cost, since you’ll need to perform change control on all the objects in Salesforce, even ones that no analyst may ever want to query. If you choose just to extract a few objects, then you’ve minimized your change control on those objects, but you’re taking a risk that the data warehouse won’t be useful, or that you’ll be constantly adding objects to extract. Only you know the right answer for your organization, and you may feel a bit like Goldilocks as you try different approaches until you get one that’s just right.

Frequency

Once you decide which data you want to extract, you’ll need to figure out how often you want to extract it. There may be more than one answer to this question, since different audiences might require different extraction frequencies. For example, there may be 10 objects that are needed for queries that your analysts run daily, and another 10 additional objects that are used in weekly or monthly management reporting. You may want to have two databases within your data warehouse, one for the first 10, and one that has the first 10 plus the additional 10.

In addition to the data that your analysts want to access, other departments in your organization may want to join their data with Salesforce data as part of their data analysis and reporting. Your data selection process should include working with other departments, but you may want to get your own data extract up and running first as a proof of concept of your chosen ETL tool and database.

Change Control

Change control should be part of your data governance process. Because Salesforce’s ease of adding custom fields and objects can lead to a “kid in the candy store” rush to add data elements. A well-run Salesforce org must have a review process that is followed carefully to avoid data chaos. This process should include a step that determines whether the user community requires a new field or object in the data warehouse. (We have a whole article on best practices for data migration that’s worth a read before you start any migration process.)

Once you’ve decided what you want to extract, how often you want it extracted, and how you’ll track changes between the Salesforce source database and your target data warehouse, it’s time to look for an ETL tool.

Picking the Right Salesforce ETL Tool

Choosing a good tool for your Salesforce ETL raises technical and usability issues.

From a technical perspective, you want a tool that isn’t going to blow out your governor limits, and is stable and efficient for data integration.
From a usability perspective, your tool choice must be one that works with your change control strategy. If changing your data extract is a time-consuming chore, your data warehouse schema is going to fall behind your Salesforce schema.

The key technology that your ETL(extract, transform, load) tool must use for efficient Salesforce extraction is the Salesforce Bulk API. Unlike an interactive query in the Salesforce API, a bulk API query is submitted to a queue and run as a batch. So, Salesforce is able to better optimize resource usage with Bulk API calls, and larger batches of data can be retrieved with fewer API calls, thus keeping your extracts under governor limits.

Of course, even a tool that can pull data from Salesforce using the most efficient API available still has to push data into your data warehouse. Your chosen ETL tool must support your target database, efficiently. It must also do all of the things that you expect from a good ETL tool in a non-Salesforce context, including the ability to log issues and send alerts if an extract has an issue, as well as exhibiting stability and good performance with its data pipelines.

As for usability, your ETL tool should ideally use some kind of non-code interface to extract and transform data. It’s much easier to look at a graphical representation of the data extraction pipeline and to make changes with clicks instead of code. This is especially important in the Salesforce context, since Salesforce admins are used to modifying Salesforce by selecting items from a menu, or creating a flowchart-like workflow. Tools like Amazon Glue are great in some contexts, but Salesforce admins will probably balk at modifying Python or Scala code to add another field or object to the extract.

Your ETL tool should also be easy to install and configure. There are still on-premise ETL tools, but if you’ve chosen a cloud database as a target for your Salesforce cloud application, it seems like a step backwards to undertake an on-prem ETL installation task when cloud-based tools don’t make you jump through installation hurdles.

Integrate.io for Salesforce ETL

It shouldn’t surprise you to find out that we think that Integrate.io fits the bill as a good ETL (extract, transform, load) tool for the purpose of pushing data from Salesforce into a data warehouse. But, if you accept our contention that Salesforce data is best accessed in your data warehouse, then we think Integrate.io is one of the best tools for the job:

Our solution supports the Salesforce Bulk API natively, so your extracts will use the least Salesforce resources possible.
We are one of the only ETL tools that have allows you to extract data from Salesforce and load data back into Salesforce.
We support all the major data sources, data warehouses, and relational databases.
Our drag-and-drop data pipeline builder is a clicks-not-code way to build extracts that are quick to create and easy to modify.
If you don’t have staff resources to learn our tool, our team of experts can build your integration for you. We can also maintain it if your staff is busy with other priorities.

If you are ready to ETL your Salesforce data to a data warehouse, or even if you just want to talk about your options, we’re happy to provide a demo, a seven-day free trial, and a free setup session with our implementation team. Just contact us and experience the Integrate.io platform for yourself.

ETL

Why You Should ETL Your Salesforce Data

Table of Contents

Salesforce is a CRM, Not a Data Warehouse

Extracting Salesforce Data - Challenges and Tools