Welcome to Xplenty's Blog.

Xplenty and Cascading

As part of our recent acquisition of Driven, Xplenty is also taking charge of Cascading, the popular open source project that is used to create and execute complex data processing...

Xplenty announces new integrations for Google Cloud Platform: Cloud Spanner and Cloud SQL for PostgreSQL

We at Xplenty are adding two new services from Google Cloud into our ever-increasing array of available integrations: Google Cloud Spanner and Google Cloud SQL for PostgreSQL. As a longtime...

Amazon Athena is making data lakes cool again (and Xplenty can help)

One of the most long-overdue AWS features is finally available. During the most recent Amazon Web Services re:Invent conference in Las Vegas, Amazon announced Athena: a service that lets you...

Using An ETL Platform VS Writing Your Own Code

Writing your own ETL code is not trivial. What starts out as a simple ETL process gets more complex over time. So does the coding, which becomes less manageable. A...

5 Platforms for Collecting Big Data

Everything comes as a service these days, and so does collecting Big Data. Various platforms on the web are happy to take data collection off your coding hands, making it...

5 Real-time Streaming Platforms for Big Data

There are quite a few real-time platforms out there. A lot of them are newcomers, and the differences between them aren’t clear at all. The least we can do is...

Amazon Redshift Review 2015

Happy birthday to Redshift! Amazon’s data warehouse-as-a-service has just celebrated two years of data querying. Several reviews were written about Redshift at the time, but as far as we know,...

Spark, Impala, Tez and Hive: Interview with David Gruzman

Big Data consultant David Gruzman answered some of our burning questions about which Big Data platform to use, whether streaming is a must or not, and what are the biggest...

4 Ways to Process Small Data with Hadoop

One of the greatest Big Data myths is that you need terabytes or even petabytes of data before you can use Hadoop. However, there are plenty of advantages to using...

Spark vs. Tez: What's the Difference?

On paper, Spark and Tez have a lot in common: both possess in-memory capabilities, can run on top of Hadoop YARN and support all data types from any data sources....

Top 7 Hadoop Blogs for 2014

People talk a lot about Hadoop, and we like to keep up to date with the latest gossip by reading Hadoop blogs. If you'd also like to jump into the...

Spark vs. Hadoop MapReduce

Apache Spark is setting the world of Big Data on fire. With a promise of speeds up to 100 times faster than Hadoop MapReduce and comfortable APIs, some think this...

5 Hadoop Security Projects

Following our post about Hadoop security for the enterprise, or the lack thereof, one of the ways to make Hadoop more secure is by installing an additional platform. Five major...

Become a Twitter Data Analyst with Xplenty

Let’s say that you’re doing some marketing for a Big Data startup. As part of your campaign, you want to find the most influential tweeters who talk about Hadoop and...

Process Data with Xplenty and Visualize it with Chart.io

We concentrate on making data processing as fast and easy as possible. To complete the dataflow, Xplenty integrates with a plethora of services that can store, analyze, or visualize data....

GitHub, You Got Issues: An Analysis of Issues on GitHub in 2013

Everybody has issues, and so do users and repositories on GitHub. That's why we decided to answer this year’s GitHub Data Challenge by heading where developers fear to tread and...

How to Integrate MongoDB with Relational Databases

Integrating data from MongoDB and a relational database sounds like a major headache. On one hand you have a schemaless NoSQL database containing JSON objects, and on the other, an...

How to get Website Visitor Geolocations from IPs

Although the Internet made the world flat, geography still matters. Knowing which countries your users live in could provide business opportunities to localize your services and increase profits. The only...

8 Data Integration Best Practices

You’ve spent hours tinkering and preparing the perfect dataflow to batch process zillions of web logs. Feeling satisfied, you run the job on one of the clusters and leave your...

Using Regular Expressions in Big Data

A regular expression, AKA regex, is a powerful yet really confusing tool. Although regular expressions are the technology behind text replacement and natural language processing, they are hard to read,...

Hive vs. HBase

Comparing Hive with HBase is like comparing Google with Facebook - although they compete over the same turf (our private information), they don’t provide the same functionality. But things can...

Data Cleansing Big Data: Scrubbing the Elephant

According to the Elephant Care Manual for Mahouts and Camp Managers: "It is essential to cleanse the elephant's body carefully every day by using half of a coconut shell to...

Hadoop Data Integration 101

Last year Cloudera published a blog post on Big Data’s new use cases: transformation, active archive, and exploration. There’s one more use case that isn’t explicitly mentioned - data integration....

12 SQL-on-Hadoop Tools

An overview of 12 open source and commercial SQL-on-Hadoop tools: Apache Hive, Apache Sqoop, Apache Phoenix, Impala, Presto, BigSQL, CitusDB, Hadapt, Jethro, Lingual, and HAWQ.

Hadoop-as-a-Service vs. On-Premise...FINISH HIM

Mortal Kombat’s master of ice, Sub-Zero, and the undead, fire-breathing Scorpion are archenemies. As the story goes, Sub-Zero and his clan of assassin ninjas slaughtered their rival clan,...

Storing Apache Hadoop Data on the Cloud - HDFS vs. S3

Ken and Ryu are both the best of friends and the greatest of rivals in the Street Fighter game series. When it comes to Hadoop data storage on the cloud...

Hadoop vs. Redshift

Childhood dreams do come true - in 2015 "Batman vs. Superman" will bring the world’s biggest superheroes to battle on-screen, finally settling the eternal debate over who will prevail (I put...

Data Sources and Destinations with Xplenty's Hadoop Platform

Data warehousing projects are challenging. Quite often, the sheer number of business requirements involved, and the data volumes attached to them, have made these types of projects notoriously...