In this video, Tamás Srancsik, Data Analyst at Bitrise, discusses his experience with connecting Segment with Salesforce. He describes in detail why Segment’s Salesforce destination was not suitable for Bitrise (and may not be for your business) and shares how they created a better data flow using Xplenty. 

This talk will be instructive for Salesforce developers who are responsible for integrating large volumes of customer data with Salesforce.

What You Will Learn

  • What is Bitrise? [00:00:46]
  • What was the business need? [00:02:08]
  • What is Segment? [00:03:04]
  • Mapping Data Sources and Destinations on the Segment Dashboard [00:03:37]
  • Why not Segment’s Salesforce destination? [00:05:48]
  • Current Final Data Flow using Xplenty [00:09:10]
  • The evolution of Xplenty packages [00:10:10]
  • Limitations of using Xplenty [00:13:53]
  • Streaming from Segment to GCP [00:15:55]
  • Conclusion [00:19:13]
  • Q&A [00:24:04]

Full Transcript

[00:00:00] Hello, and welcome to another Xforce Data Summit presentation. Today we have a data presentation, living up to our name, by Tamas, a data analyst at Bitrise, who is talking about combining Segment data with Salesforce. He tells me this is a case study and that he'll go through the good, the bad, and maybe the ugly of what happened when he tried to do that. So here's Tamas.

[00:00:43] Hi there, nice to meet you. As Leonard mentioned, this will be a case study about how we integrated Segment data into Salesforce: basically, how we send the event information and what we learned from this project. First, what is Bitrise? If you work in the mobile domain, Bitrise may ring a bell, but in short, it's a continuous integration and delivery solution built specifically for mobile platforms.

[00:01:30] What makes Bitrise unique is its extensive support for mobile platforms and mobile development tools. You can build your project on iOS, Android, Xamarin, Flutter, or React Native. You can add several integrations for testing, building, code signing, and deployment.

[00:01:54] Basically, you let Bitrise access your codebase on GitHub or Bitbucket, and then events on these platforms trigger builds and other specific steps on Bitrise.

[00:02:08] At the end of last year, we started to scale up our sales team, and a big project came up: salespeople should have only one platform, and it should be Salesforce, as soon as possible.

[00:02:28] So we needed basic information like email, first and last name, and company name, plus some Bitrise-specific custom attributes: number of build events, last repository added, organization owners, and some 47 others at the moment, though we are still counting.

[00:02:49] It's an ongoing project. New ideas keep coming, and we keep recognizing that other information is needed, like subscription-related information from Stripe.

[00:03:10] We already had event tracking implemented in Segment. If you are not familiar with this tool but you have a web or mobile application, you should check it out. It's a customer data infrastructure provider: you can track your events, easily integrate specific services, and enrich your data as well.

[00:03:34] So, this is what the interface looks like in Segment. It's super simple: you have your data sources, and you map them to your destinations. For example, here we have JavaScript libraries and backend Ruby libraries sending information to PostgreSQL and BigQuery, to Google Analytics or Amplitude for segmenting users, and to support systems like Zendesk or Intercom for enrichment. And what you can’t see here is Salesforce.

[00:04:16] Segment does provide a solution for this: a destination for Salesforce leads, so you can load data from your services into Salesforce.

[00:04:34] When this need from sales arose, we investigated two options. One was custom development, but that was not on the roadmap of the developers or the engineering team. In that case, you implement the integration in your code, or set up a service in your web application that sends the information. At first, the request focused only on leads: we should generate leads from user data.

[00:05:18] For example, when somebody signs up, we should have that information in Salesforce, along with additional information from the website: whether they have started to use our product, whether they entered the trial, and so on. Usually these implementations take significant engineering resources: you set up some kind of web service and stream the data to one of the Salesforce APIs.

[00:05:46] The other option was that we already had Segment, and Segment has this Salesforce integration. But as you could see on my earlier chart, this is not the solution we opted for, since it's not as straightforward as it looks. Segment has several other kinds of calls, like page views or identifying the group a user belongs to.

[00:06:20] But let's focus on two kinds of calls from the Segment SDK. One is the so-called track call: a method that records the events happening, in our case mostly on the backend.

[00:06:40] The website backend is the most important source, not only for Segment but for any other information, because Bitrise is all about integrations. You link your codebase on GitHub, and when there's a new pull request, it can trigger specific events on Bitrise. That's why we concentrate mostly on backend events.

[00:07:05] For example, here somebody finished a build on Bitrise, and we send other properties too: how many concurrencies they had, the ID of the build, the ID of the repository, who owns the repository, and whether it was a GitHub or a Bitbucket repository. In this specific example, it's Bitbucket.

[00:07:32] The other call is identify, which, as the name says, ties the user to their events. This is the call that Segment can load into Salesforce as a lead out of the box.
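To make the two call types concrete, here is a minimal sketch of the message shapes behind a track call and an identify call, following the general structure of the Segment Spec (`type`, `userId`, `event` and `properties` for track; `traits` for identify). These are plain Python dicts, not Segment's SDK, and the event name `Build Finished` plus every property and trait value are hypothetical illustrations modeled on the example above, not Bitrise's actual schema.

```python
from datetime import datetime, timezone


def track_payload(user_id: str, event: str, properties: dict) -> dict:
    """Segment-style track message: an action the user performed."""
    return {
        "type": "track",
        "userId": user_id,
        "event": event,
        "properties": properties,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def identify_payload(user_id: str, traits: dict) -> dict:
    """Segment-style identify message: who the user is (their traits)."""
    return {
        "type": "identify",
        "userId": user_id,
        "traits": traits,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical backend event modeled on the "build finished" example.
build_finished = track_payload(
    "user-123",
    "Build Finished",
    {
        "concurrency_count": 2,
        "build_id": "build-456",
        "repository_id": "repo-789",
        "repository_owner": "acme-mobile",
        "provider": "bitbucket",
    },
)

# Hypothetical identify call; per the talk, this is the call type that
# Segment's Salesforce destination can load as a lead.
jane = identify_payload(
    "user-123",
    {
        "email": "jane@example.com",
        "firstName": "Jane",
        "lastName": "Doe",
        "company": {"name": "Acme Mobile"},
    },
)
```

Both messages share the same `userId`, which is what lets a warehouse or an ETL tool join a user's traits to their events later.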

[00:07:50] Unfortunately, since we have more than a hundred kinds of backend events and several others from the frontend, it's simply too much: too many requests. No API can handle millions of concurrent calls at the same time.

[00:08:14] There could also be other data sources that we should link to, or enrich, these identify calls with. For example, subscription data: what is the customer's ID in Stripe, did the user have an invoice, and what was the total amount of that invoice?

[00:08:40] So the original idea was not to use engineering resources, but to have the data analyst (and that means me) simply link Segment to Salesforce. But as I mentioned, this was not the way forward, and that's why we added Xplenty, an ETL service, to the stack.

[00:09:07] The current final data flow looks something like what I'm presenting here. Information comes from Bitrise to Segment: the events and the user information. Segment also collects information from Stripe, our subscription manager: the invoices, customer IDs, and so on that I mentioned earlier.

[00:09:34] All of this is loaded into Google BigQuery. Xplenty combines the two data sources, the Segment data in BigQuery and other information from Bitrise, and loads the result into Salesforce.
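Xplenty does this combining step in SQL. As a rough, hypothetical stand-in, the sketch below uses Python's built-in `sqlite3` module to join a table of Segment user data with a table of Stripe invoice data and produce one row per lead. The table names (`segment_users`, `stripe_invoices`), their columns, and the sample rows are assumptions for illustration; in the pipeline described here the sources live in BigQuery and the join runs inside Xplenty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE segment_users (
        user_id TEXT PRIMARY KEY, email TEXT, first_name TEXT,
        last_name TEXT, company TEXT, build_count INTEGER
    );
    -- Amounts in the smallest currency unit (cents), as Stripe reports them.
    CREATE TABLE stripe_invoices (user_id TEXT, customer_id TEXT, amount_total INTEGER);

    INSERT INTO segment_users VALUES
        ('u1', 'jane@example.com', 'Jane', 'Doe', 'Acme', 42),
        ('u2', 'li@example.com', 'Li', 'Wei', NULL, 3);
    INSERT INTO stripe_invoices VALUES
        ('u1', 'cus_A', 4900),
        ('u1', 'cus_A', 4900);
    """
)

# LEFT JOIN keeps users without any invoice; aggregates collapse invoices per user.
LEAD_QUERY = """
SELECT u.email,
       u.first_name,
       u.last_name,
       COALESCE(u.company, 'Unknown')   AS company,
       u.build_count,
       MAX(i.customer_id)               AS stripe_customer_id,
       COALESCE(SUM(i.amount_total), 0) AS invoiced_total
FROM segment_users AS u
LEFT JOIN stripe_invoices AS i ON i.user_id = u.user_id
GROUP BY u.user_id, u.email, u.first_name, u.last_name, u.company, u.build_count
ORDER BY u.user_id
"""

leads = conn.execute(LEAD_QUERY).fetchall()
for row in leads:
    print(row)
```

Each new source (trial data, organization data, and so on) becomes another join in this one query, which is easier to debug and hand over than a growing graph of separate components.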

[00:09:55] This lets us not only load leads but also handle converted objects, accounts and contacts, when updating these leads. We implemented this as two packages in Xplenty: one inserts new leads from recent registrations, and the other updates existing leads, contacts, and accounts.
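The split between the two packages can be sketched as a simple partition: rows whose email already exists in Salesforce go to the update path, and everything else becomes a new lead. Matching on email is an assumption made for this sketch; the talk doesn't say which key Bitrise uses, and in practice an external ID field is often a more robust match key.

```python
from typing import Iterable, List, Set, Tuple


def normalize_email(email: str) -> str:
    """Compare addresses case-insensitively and ignore stray whitespace."""
    return email.strip().lower()


def split_inserts_and_updates(
    users: Iterable[dict], existing_emails: Set[str]
) -> Tuple[List[dict], List[dict]]:
    """Partition warehouse rows into new leads and records Salesforce already has."""
    known = {normalize_email(e) for e in existing_emails}
    inserts: List[dict] = []
    updates: List[dict] = []
    for user in users:
        target = updates if normalize_email(user["email"]) in known else inserts
        target.append(user)
    return inserts, updates


# Hypothetical sample rows: one known user (different capitalization), one new.
rows = [
    {"email": "Jane@Example.com", "build_count": 42},
    {"email": "new.user@example.com", "build_count": 1},
]
inserts, updates = split_inserts_and_updates(rows, {"jane@example.com"})
```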

[00:10:28] Both are scheduled daily, and what Xplenty is good at is batch processing. It uses the Salesforce Bulk API, so you can send many records to Salesforce at the same time. Again, we thought this would be the solution, and that it would be simple: we'd just write some SQL code and the data would flow.
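The point of the bulk API is that records travel in batches rather than one call per record. A minimal chunking sketch, assuming a batch size of 10,000 records, which is the per-batch cap documented for Salesforce Bulk API 1.0 (check the current limits for your org and API version); the sample rows are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Assumed batch size: Bulk API 1.0 documents a cap of 10,000 records per batch.
BATCH_SIZE = 10_000


def chunked(records: Iterable[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` records, lazily."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# The nightly volume mentioned in the talk: more than 100,000 users to update.
nightly_rows = ({"Email": f"user{i}@example.com"} for i in range(100_000))
batches = list(chunked(nightly_rows))
```

Ten batches instead of 100,000 individual calls is why a batch tool fits this nightly workload; the trade-off is that per-record logic becomes harder to apply.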

[00:11:03] But you should know that in Bitrise there are two kinds of entities. There are users registered in the application, and a user can belong to multiple organizations as well. Organizations behave similarly to users: they can own repositories and they can have builds, so this is much more complex.

[00:11:29] In the first iteration, we had different sources in Xplenty and linked them in an octopus-like way, with a monster series of joins and components. It was super hard to debug, and the complexity of the package grew exponentially as we added new data sources.

[00:12:03] It’s not the fault of Xplenty; it’s the fault of how we scoped the project and how we misunderstood the scope. At first, it was only the leads, with only a couple of pieces of information: first name, last name, email, and so on. Then we recognized that we had other data sources, new ideas came from newly hired people, and so on, and it ended up as 50 different attributes from several different sources.

[00:12:34] So we had to refactor all these packages, and now the joins, the logic of the data integration, are all written in SQL. This is what data people know: it's simple, it's easy, you can easily debug it, and you can hand it over to other analysts.

[00:12:58] Two years ago, when we started to use Xplenty, an important feature we were looking for was that it should support SQL out of the box, precisely because of cases like this: SQL is simply the language all data people speak.

[00:13:23] It's efficient and easy to iterate on, so we can easily add or remove attributes. For example, we recognized that the update package shouldn't overwrite existing names or email addresses, because other sources in Salesforce could be enriching that information.
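In an SQL package, a rule like this is usually expressed with something like `COALESCE` that prefers the existing value. The same logic in plain Python, as a hedged sketch: the set of protected fields is an assumption based on the names and email addresses mentioned above, and `Build_Count__c` is a hypothetical custom field.

```python
from typing import Tuple

# Assumed protected fields: values other sources or sales reps may have set.
PROTECTED_FIELDS: Tuple[str, ...] = ("FirstName", "LastName", "Email")


def merge_without_overwrite(
    existing: dict, incoming: dict, protected: Tuple[str, ...] = PROTECTED_FIELDS
) -> dict:
    """Return only the field updates to send for one Salesforce record.

    Protected fields are filled only when Salesforce has no value yet;
    all other fields take the fresh warehouse value when it changed.
    """
    updates = {}
    for field, value in incoming.items():
        if value is None:
            continue  # never blank out a field because the warehouse lacks a value
        if field in protected and existing.get(field):
            continue  # Salesforce already has a value: keep it
        if existing.get(field) != value:
            updates[field] = value
    return updates


existing = {"FirstName": "Janet", "LastName": "", "Email": "jane@example.com", "Build_Count__c": 40}
incoming = {"FirstName": "Jane", "LastName": "Doe", "Email": "jane@acme.com", "Build_Count__c": 42}
changes = merge_without_overwrite(existing, incoming)
```

Here the existing first name and email survive, the empty last name gets filled, and the build count is refreshed.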

[00:13:48] But there are also limitations to the solution we went for. As I mentioned, Xplenty is very good at batch processing but doesn't support data streaming, and there could be specific cases where you need real-time or near-real-time data.

[00:14:10] That is not our case right now, and it's not part of the scope of this project, but there could be specific product-qualified leads. For example, we see that an existing paying user is investigating enterprise packages, and we should immediately notify the sales representative who covers that region so they can reach out. That cannot be done with the current setup.
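Deciding which events deserve an immediate alert is itself a small rule. A hypothetical sketch of a product-qualified-lead check: the event names and the `ProductEvent` structure are invented for illustration, since the talk only describes the idea of a paying user investigating enterprise packages.

```python
from dataclasses import dataclass

# Hypothetical event names; not taken from Bitrise's tracking plan.
ENTERPRISE_INTEREST_EVENTS = frozenset({"Enterprise Pricing Viewed", "Enterprise Demo Requested"})


@dataclass(frozen=True)
class ProductEvent:
    user_id: str
    name: str
    is_paying: bool


def is_product_qualified(event: ProductEvent) -> bool:
    """True when the event should alert a sales rep now, not in the nightly batch."""
    return event.is_paying and event.name in ENTERPRISE_INTEREST_EVENTS


signals = [
    ProductEvent("u1", "Enterprise Pricing Viewed", is_paying=True),
    ProductEvent("u2", "Enterprise Pricing Viewed", is_paying=False),
    ProductEvent("u3", "Build Finished", is_paying=True),
]
urgent = [e.user_id for e in signals if is_product_qualified(e)]
```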

[00:14:52] Sales gets all the new leads, all the newly registered users, and the latest information on existing leads, contacts, and accounts each day. But we cannot trigger assignments.

[00:15:06] For example, we know from the billing address whether a specific organization is in Europe, the US, or any other part of the world, and we should assign that lead to the sales representative of that region. Unfortunately, we can't do that, because the bulk API processes many similar records at the same time and can't examine them one by one, check the billing address, and assign each record to the corresponding sales rep.
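The per-record decision the bulk load can't make is small on its own. A hedged sketch, assuming the billing country is available as an ISO 3166-1 alpha-2 code; the country sets, region names, and owner queue names are hypothetical placeholders. In Salesforce itself this would more typically be a lead assignment rule, applied when records are created one at a time.

```python
from typing import Optional

# Hypothetical territories and owner queues; a real org would use its own
# territory definitions and Salesforce user or queue IDs.
EUROPE = frozenset({"DE", "FR", "GB", "HU", "IT", "NL", "ES"})
NORTH_AMERICA = frozenset({"US", "CA"})
OWNER_BY_REGION = {
    "Europe": "emea-sales",
    "North America": "na-sales",
    "Rest of World": "row-sales",
}


def region_for(billing_country: Optional[str]) -> str:
    """Map a billing country code to a sales region."""
    code = (billing_country or "").strip().upper()
    if code in EUROPE:
        return "Europe"
    if code in NORTH_AMERICA:
        return "North America"
    return "Rest of World"


def route_lead(lead: dict) -> dict:
    """Return a copy of the lead tagged with its region and owning sales queue."""
    region = region_for(lead.get("billing_country"))
    return {**lead, "region": region, "owner": OWNER_BY_REGION[region]}
```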

[00:15:55] What you can see here now is a possible solution (I have to admit we haven't implemented it yet; it's still under investigation) for streaming specific events to Salesforce, or triggering specific tasks, based on the events that Segment collects.

[00:16:19] Segment still collects information from Stripe, specific events on the backend and frontend, and other sources. In this case we are using GCP, but I suppose you can set up similar solutions on Azure or Amazon Web Services as well.

[00:16:45] Segment can publish a message to Google Cloud Pub/Sub, which triggers a Google Cloud Function. The function processes the message, looks up additional attributes in Google BigQuery, PostgreSQL, or other sources, stores the streamed information in BigQuery, and also sends it to Salesforce immediately.

[00:17:19] This would speed up the whole data processing. On the other hand, I wouldn't recommend it for bulk loads: each night we process or update more than 100,000 users, and this is not meant for that.

[00:17:42] It's for cases like the product-qualified leads I mentioned: specific events where you need an immediate sales response. But it's up to the business to decide what can or should trigger these.

[00:18:10] Here you can see how simple it would be. It's Python code: a basic function that sends information in two ways. One creates leads one by one through the regular API; the other, when you have multiple records, calls the bulk API.
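A minimal sketch of what such a function might look like. It assumes a 1st-gen Python Cloud Function triggered by Pub/Sub, where `event["data"]` carries the base64-encoded message, and the open-source `simple_salesforce` package, which provides `sf.Lead.create(...)` for a single record and `sf.bulk.Lead.insert(...)` for a list. The message format (a JSON object with an `email` key), the environment variable names, and the `"Unknown"` placeholders are assumptions; this is not the code shown in the talk.

```python
import base64
import json
import os


def decode_pubsub_event(event: dict) -> dict:
    """Pub/Sub delivers the message body base64-encoded in event["data"]."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def to_lead(message: dict) -> dict:
    # LastName and Company are required on the standard Lead object,
    # so fall back to placeholders when the event doesn't carry them.
    return {
        "Email": message["email"],
        "LastName": message.get("last_name") or "Unknown",
        "Company": message.get("company") or "Unknown",
    }


def send_leads(sf, leads: list) -> list:
    """One record: a single create call. Several: one Bulk API insert."""
    if len(leads) == 1:
        return [sf.Lead.create(leads[0])]
    return sf.bulk.Lead.insert(leads)


def handle_pubsub(event, context, sf=None):
    """Entry point for a Pub/Sub-triggered (1st gen) Cloud Function."""
    if sf is None:
        # Imported lazily so the helpers above can be tested without it.
        from simple_salesforce import Salesforce

        sf = Salesforce(
            username=os.environ["SF_USERNAME"],  # hypothetical variable names
            password=os.environ["SF_PASSWORD"],
            security_token=os.environ["SF_TOKEN"],
        )
    message = decode_pubsub_event(event)
    return send_leads(sf, [to_lead(message)])
```

Passing `sf` in as a parameter keeps the Salesforce client swappable, so the decoding and mapping logic can be tested with a fake client instead of a live org.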

[00:18:31] So you can just add Simple Salesforce or another wrapper to your GCP Cloud Function, and the Pub/Sub message triggers it. For example, here we are sending the email: this Pub/Sub event carries the email, and the email triggers the lead creation. You can also assign tasks to specific sales representatives, or trigger assignment rules if you are not using the bulk API.

[00:19:13] As I mentioned, we made some mistakes in this process and learned some lessons the hard way. Still, it's good to say that we succeeded in building quite complex data solutions with Segment and Xplenty without significant engineering resources. So you can really easily not only prototype but also put it into practice.

[00:19:56] But just because you can do something, or see that it's feasible, doesn't mean it's good for you or that you should do it. Sometimes it simply works against your own interests. You can link Segment directly to Salesforce, but you shouldn't, or at least I wouldn’t recommend it.

[00:20:19] I can only recommend it when you have a limited number of events coming through. As I mentioned, we have hundreds of kinds of events on the backend and hundreds of thousands of users, and that is something the Salesforce API does not support.

[00:20:40] You should do your homework and your research; this is something we did really late. It started with a small scope, populating leads with a couple of attributes, and then the scope grew with each iteration: we needed to add Stripe, then trial information, and so on.

[00:21:07] That led to multiple refactorings of the code, of the solution, and of the Xplenty packages. So it would be better to do the research: understand what is feasible with specific tools and what is not.

[00:21:31] What is the business requirement? That should be very, very specific, and it shouldn't be on the technical level: it should be what salespeople need in Salesforce. Once you understand that, you can tailor your solution toward it, and not the other way around, where you have some set of tools, put something together, and hope it will serve the need. Unfortunately, that doesn't work.

[00:22:06] So Xplenty can save a lot of engineering hours, but it can’t work miracles. You still have to write code and maintain code. You should think it over and follow good engineering practices. Most of these services market themselves as drag-and-drop interfaces: you click through and magic happens.

[00:22:38] Unfortunately, that is only true in specific cases. When you have a complex business, and as your team, your organization, and your business grow more complex, that can’t be your whole solution. You have to implement custom business logic in code; in this case it's SQL, and Xplenty is good at supporting that.

[00:23:15] The same goes for Segment. It's very good at integrating specific services, it can sync back the data you’ve collected, and you can send it to data warehouses of any kind: Redshift, Google BigQuery, PostgreSQL, and so on. It can send the user data out of the box to Intercom and Zendesk, which works very well.

[00:23:42] But not to Salesforce, because Salesforce has a pretty complex API with multiple endpoints, and Segment's out-of-the-box mapping of identify calls to Salesforce leads is pretty limited.

[00:24:04] Thanks, Tamas. That was an interesting overview. You seem to have a very complex data ecosystem at Bitrise, especially capturing all those build events in your tool, pulling them into Segment, and, it looks like, basing sales decisions on that.

[00:24:28] So, were you the person who implemented all this? How big was the team that implemented your Xplenty project?

[00:24:39] I have to admit that our team is super understaffed, and not only relative to these projects: other business questions keep coming in, and you have to deliver reports in Amplitude, Metabase, and other BI solutions.

[00:25:09] Basically, I worked on this solution alone. The team is only two people; we are currently hiring, and there are very experienced people in the funnel, so hopefully this will change in the future.

[00:25:29] But that's why we opted for this solution. You can see that this ecosystem contains multiple services, and our approach to choosing any solution has always been that we shouldn't do something just because it's fancy. For example, in data engineering, at one point it was Apache Spark, and now it's Airflow.

[00:26:00] You can hire multiple data engineers and set up different workflows and pipelines, but you shouldn't. As I mentioned, just because you can do something doesn't mean you should. If there is a simpler, cheaper solution, especially cheaper in terms of time, you should go for that.

[00:26:27] But at some point, as the business gets more and more complex, you may need to consider having a dedicated person maintaining these solutions.

[00:26:40] And so you learned the Xplenty tool on your own, or did our support people help you build your pipelines? Or how did that happen?

[00:26:52] Mostly by myself. You know, it’s like learning languages: I can't speak that many, but after the third or fourth one it’s much easier to pick up a new one. I suppose it's the same with data tools: once you've seen several, you can easily pick up a similar one. Not to mention that Xplenty is super easy to use, and we had a pretty good experience with the support people as well.

[00:27:28] Whenever we got stuck, we had immediate support, too.

[00:27:36] And it sort of looks like that little piece of Python you showed is going to do a kind of data streaming, or, for want of a better word, it's like a trigger process, right? Certain business things happen. Are you putting that together too, or are your software engineers doing that? How is that being built?

[00:28:00] No, me again. But that's why it's good that these cloud functions usually support Python: it’s the other language, besides SQL, that not only data engineers but Peter and Eric speak as well. That's the technical aspect, but I think the real responsibility of data analysts and business analysts is to scope out the business question, and then the rest is up to you. There are so many tools and solutions that are equally good; choose one that fits your organization and your team’s skill set.

[00:28:47] If you know Python, you can go for a tool that speaks Python. But what's really important is understanding the business, which requires good communication between the different domains and teams to understand what sales needs, in this case in Salesforce. The same goes for marketing and the other teams as well.

[00:29:14] So I think that is the harder part, and more important than the tool itself. What I presented shows how easily it can be a one-man job to put it together, but there's actually teamwork behind the setup and the scope: having something that really supports your colleagues, not just having data because we are a data-driven company. It should be specific data, in a specific format, at the right time.

[00:30:03] Yup. That makes sense. Some people say you need not only the know-how but also the know-whether: whether to do something now, and whether it's worth doing.

[00:30:16] Thanks. Thank you so much. Thanks again for the presentation, and for sharing your experience with Xplenty, Salesforce, Segment, and Bitrise. Appreciate your time. Thank you.

About Xforce

The Xforce Data Summit is a virtual event that features companies and experts from around the world sharing their knowledge and best practices surrounding Salesforce data and integrations. Learn more at www.xforcesummit.com.