Integrate.io hosted the first Data Pipeline World Championship last month. With entrants and partners from 23 different countries, it was truly a global event.

Inspired by the Rube Goldberg Machine Contest, the championship challenged contestants to create the most interesting data pipeline using the Integrate.io platform, APIs from the championship's partners, and any publicly available databases.

The Integrate.io community then voted on all submissions received from around the world to narrow it down to the finalists. A panel of data experts from industry-leading companies such as AWS, Slack, and NBC Universal then selected the winner.

Read on to see who was crowned the data pipeline champion of 2021. The talent and creativity were top drawer! The winner of this year's competition scooped himself a $6,000 grand prize, with additional cash prizes and swag for the other finalists.

A huge, huge thanks to our partners, community, and participants - we had a blast partnering with you all. 

See you all next year for the 2022 Data Pipeline World Championship!

Just Breathe

This ETL pipeline serves a larger purpose than just competing for the grand prize. It helps people visualize air pollution around the world. 

“It is my hope that this pipeline can raise awareness of air pollution and help scientists better understand the effects of long-term air pollution,” said developer Lloyd H.

The pipeline generates dummy profiles from DummyAPI and extracts valuable location info by feeding queries into multiple APIs like Ambee, OpenTopoData, and TimeZoneDB. Then local environmental factors such as nitrogen dioxide and air quality indexes feed into the web application Pixe. Ultimately, users can receive notifications about the weather via Twilio SendGrid.

"Against the backdrop of increasing air pollution around the world, I wanted to create an application that helps people visualize air pollution and raise awareness of a seemingly invisible growing threat," said Lloyd.

List of APIs used

Video walkthrough

 

Read more: The Importance and Benefits of a Data Pipeline

Image Tagging 

What if you could organize metadata from an image repository on something like Google Photos and find your favorite images quicker? That's what one developer did by blending machine learning with SilverDiamond API, Imagga API, and Integrate.io. 

Developer Carlos Jaime found a list of random images from Picsum API, added metadata to each using the APIs Imagga and SilverDiamond, and added alternative descriptions for those images for additional context. Finally, Carlos parsed the data returned by the APIs and inserted it into the "tagged_images" table on BigQuery. 
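The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not Carlos's actual implementation: the response shape (a `tags` list with `tag` and `confidence` keys), the `tags_to_rows` helper, and the row columns are all assumptions made for the example, and the real Imagga and SilverDiamond payloads may look different.

```python
def tags_to_rows(image_url, tag_response, alt_text=None):
    """Flatten one image's tag response into one row per tag.

    The response shape is hypothetical: {"tags": [{"tag": str, "confidence": float}]}.
    """
    return [
        {
            "image_url": image_url,
            "tag": item["tag"],
            "confidence": item["confidence"],
            "alt_text": alt_text,
        }
        for item in tag_response.get("tags", [])
    ]


# Made-up payload for illustration only.
example_response = {
    "tags": [
        {"tag": "mountain", "confidence": 92.5},
        {"tag": "lake", "confidence": 71.0},
    ]
}

rows = tags_to_rows(
    "https://example.com/image-10.jpg",
    example_response,
    alt_text="A lake in front of mountains",
)
for row in rows:
    print(row)
```

Rows shaped like this, one per image and tag, are the kind of flat records that could then be loaded into a table such as "tagged_images" and searched by tag.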

"I've created a process that gives more value to image metadata by adding tags and generating a data asset that helps analysts and technicians decide which images to use based on tag searches," said Carlos. 

List of APIs used

  • Silver Diamond - Image alt detection
  • Silver Diamond - Image object recognition 
  • Picsum
  • Imagga

Video walkthrough

 Read more: Google BigQuery vs. Snowflake.

Crypto Data Warehouse 

This creative ETL pipeline aggregates cryptocurrency data to build a crypto data warehouse. Developer Sandeep M. retrieved datasets from Coinranking and ExchangeRate and ran them through Integrate.io and Google Cloud Storage.

"The purpose of the pipeline is to be run once every 24 hours," says Sandeep. "Every day, it retrieves the latest Bitcoin price and other values. I used Google Cloud for everything."

List of APIs used

  • Coinranking API and dataset retrieved from it
  • ExchangeRate API and data retrieved from it
  • Integrate.io ETL Platform
  • Google Cloud Storage

Video walkthrough

Fish News Network (FNN)

Imagine a news network for salmon. Or tuna. That's the idea behind this insane data pipeline that pulls headlines from CNN and runs a regex on specific terms for an imaginary audience of fish. You're now watching Fish News Network. Or FNN.

Mix Max, the inventive developer behind this pipeline, replaced words in CNN headlines ("animal" to "amphibian," "labrador retriever" to "rainbow trout," etc.), then ran each sentence through the fake online RESTful API Dummy API, replaced the sentences, and swapped photos with pictures of fish.
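The word-swapping step could look something like the sketch below. It is only an illustration of the regex idea, not the pipeline's real expression: the `fishify` function and the `REPLACEMENTS` table are hypothetical names, and the table holds just the two word pairs mentioned above.

```python
import re

# Hypothetical replacement table; both pairs are the ones quoted in the article.
REPLACEMENTS = {
    "labrador retriever": "rainbow trout",
    "animal": "amphibian",
}

# Longest phrases first so multi-word terms win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(term) for term in sorted(REPLACEMENTS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def fishify(headline: str) -> str:
    """Replace each known term in a headline with its fish-world counterpart."""

    def swap(match: re.Match) -> str:
        replacement = REPLACEMENTS[match.group(0).lower()]
        # Keep a leading capital so sentence starts still look right.
        return replacement.capitalize() if match.group(0)[0].isupper() else replacement

    return _PATTERN.sub(swap, headline)


print(fishify("Animal shelter finds home for labrador retriever"))
```

Because the pattern uses word boundaries, a term only matches as a whole word, so "animals" is left alone unless it is added to the table as its own entry.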

"What I ended up with was a simple pull from Google Sheets to transform and then a new table creation in Snowflake," said Mix Max.

List of APIs used

  • Dummy API

Video walkthrough

  • Video (talking starts at 3:55)

IP Geolocation/Azure Blob 

Sunny J extracted information from the IPGeolocation API, counted the countries the IPs came from, and wrote the result to Azure Blob as a Parquet file.
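The counting step can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `count_countries` helper and the `country_name` field are hypothetical, the sample records are made up (the IPs come from reserved documentation ranges), and writing the Parquet file to Azure Blob is left out.

```python
from collections import Counter


def count_countries(records):
    """Count how many lookup records fall in each country.

    `records` is an iterable of dicts; "country_name" is an assumed field name.
    Records without a country are grouped under "Unknown".
    """
    return Counter(record.get("country_name") or "Unknown" for record in records)


# Made-up sample data for illustration only.
sample_records = [
    {"ip": "192.0.2.10", "country_name": "Australia"},
    {"ip": "198.51.100.7", "country_name": "Canada"},
    {"ip": "203.0.113.42", "country_name": "Australia"},
    {"ip": "203.0.113.99"},
]

counts = count_countries(sample_records)
for country, total in counts.most_common():
    print(country, total)
```

The resulting per-country totals are the kind of small aggregate table that a later stage could serialize to a columnar format such as Parquet.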

This pipeline might look simple, but it’s more complex than you think. Watch the video walkthrough below. 

"I fetched the data from IPGeolocation and parsed it through the REST API stage in the framework," says Sunny.

List of APIs used

  • IPGeolocation API & Database

Video walkthrough

The Winners

1st place: Lloyd Hamilton, Just Breathe

2nd place: Carlos Jaime, Image Tagging

3rd place: Sandeep Mysore, Crypto Data Warehouse

Community favorite: Lloyd Hamilton, Just Breathe

 

Want to Build a Pipeline of Your Own? How Integrate.io Can Help 

Want to test out your data pipeline skills against the best in the world? Come join us for next year’s event! In the meantime, create advanced data pipelines and datasets with Integrate.io — the no-code data management solution that moves data from disparate sources, transforms it into usable formats, and loads it to a final destination for data analysis and business intelligence reporting. 

Integrate.io’s secure, point-and-click interface and easy-to-use pipeline builder allow you to build and manage data pipelines with no software development experience. So you can get creative without the complicated coding. Other Integrate.io features include a simple pricing structure, top-tier customer service, and pre-built connectors for improved data integration.   

Integrate.io has crowned the planet's most creative data engineer. Now it's your turn to create a data pipeline with little or no code. Discover how Integrate.io can help data integration in your organization with a personalized demo today.