Several structured data formats are in common use, and there is no shortage of debate about which is best. Xplenty can process both JSON and XML, and this article walks through an example that shows the functions used to process XML on Xplenty.

Table of Contents:

  1. Overview and Resources
  2. Setting Up the Xplenty Data Pipeline
  3. Summary

Overview and Resources

For this demonstration, we will process the sample XML file available at https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms762271(v=vs.85)

The XML in the file is structured as shown in the image below:

XML-source.png

The Xplenty functions XPath and XPathToBag are key to processing this data. Let's examine them in a data pipeline.
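To make the difference between the two functions concrete, here is a minimal Python sketch of the same idea. It is not Xplenty code: `xpath_to_bag` and `xpath_first` are hypothetical stand-ins written for illustration, and the inline XML is an abbreviated sample modeled on the layout of the file linked above. Python's ElementTree supports only a subset of XPath and evaluates paths relative to the element it is called on, so the path '/catalog/book' is written as 'book' from the <catalog> root.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample modeled on the linked file's layout (assumption:
# only a few fields are shown, and the values are illustrative).
SAMPLE_XML = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <price>5.95</price>
  </book>
</catalog>"""


def xpath_to_bag(xml_text, path):
    """Hypothetical stand-in: return every match as a list of XML strings."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(el, encoding="unicode") for el in root.findall(path)]


def xpath_first(xml_text, path):
    """Hypothetical stand-in: return the text of the first match, or None."""
    root = ET.fromstring(xml_text)
    return root.findtext(path)


# One call yields a collection with one entry per <book> element.
books = xpath_to_bag(SAMPLE_XML, "book")

# Each entry is itself a small XML document that can be queried for one value.
for book_xml in books:
    print(xpath_first(book_xml, "title"), xpath_first(book_xml, "price"))
```

The point of the split is that one function returns a collection of matches, which can then be expanded into records, while the other pulls a single value out of each record.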

Setting Up the Xplenty Data Pipeline

XML-processing-pipeline.png

The following list explains the components of the Xplenty pipeline in order:

1. XML_Source: The XML file from the link above is copied to a cloud storage location and read using the File Storage Source component.

2. XPathToBag: This step calls the XPathToBag function with the XPath '/catalog/book', which fetches all the <book> elements under <catalog> </catalog> as a Bag datatype. For example: XPathToBag(data,'/catalog/book')

3. Flatten_Books: Uses the Flatten() function to turn the bag of books into individual records, each with the following structure:

            Part-of-XML-source.png

4. XPath: This step uses the XPath function to retrieve the individual elements of the book structure. Here is a peek into the component, with the XPath expressions set up for the <book> </book> structure above:

        XML-processing-XPath.png

For additional reference on XPath and examples, try an XPath evaluator such as the one on freeformatter.com.

5. Destination: The individual fields extracted from the XML are stored in a destination; in this example, it is a BigQuery table.

The following image depicts some example records from the output:

XML-processing-destination.png
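For readers who want to see the shape of the transformation outside Xplenty, the five steps above can be approximated in plain Python. This is a rough sketch under stated assumptions, not the pipeline itself: the inline XML is an abbreviated sample modeled on the source file, the field list in `FIELDS` is assumed from that sample, and CSV text stands in for the BigQuery destination.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Step 1 stand-in: the source file content (abbreviated, illustrative values).
CATALOG_XML = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
  </book>
</catalog>"""

# Assumed child elements to extract from each <book>.
FIELDS = ["author", "title", "genre", "price", "publish_date"]


def books_to_rows(xml_text):
    """Steps 2-4: select every <book>, then read its fields into one row."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):      # all books, one record each
        row = {"id": book.get("id")}       # the id attribute of <book>
        for field in FIELDS:
            row[field] = book.findtext(field)  # text of each child element
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Step 5 stand-in: write the rows as CSV text instead of a table load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id"] + FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = books_to_rows(CATALOG_XML)
print(rows_to_csv(rows))
```

Each dictionary in `rows` corresponds to one output record, with one column per extracted field.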

Parsing XML from a file or an API response into a tabular structure makes the data available for lookups, and blending it with other datasets can facilitate further analysis.

Summary

Several enterprise systems consume and output XML data, and because XML is a trusted format for document-based information transfer, XML-based files and APIs come up often as use cases. Stop by and explore Xplenty's functionality for processing structured data formats. For more individualized instruction and information, contact us to book a risk-free demo.