Why you need a data integration layer (ETL), and the reasons to use a SaaS-based ETL tool rather than coding it yourself with scripts.
Welcome to Xplenty's Blog
All things data
In an interview with Datanami, Xplenty CEO Yaniv Mor explains the pitfalls of data hoarding and how to overcome them.
This post takes a look at how you can use Xplenty, a data integration tool, to create personalized data-driven marketing campaigns to increase your ROI.
So you’re on the cloud or plan to move there very soon.
If you want to know what the big players are doing with Big Data, then the Data Summit is the place to be. This Big Data conference, which will take place at the New York Hilton Midtown on May 11-13, will be attended by giants like Amazon, eBay, and even Pfizer and Mastercard. Dozens of engaging workshops will take place during the conference, so here are our top picks: the 10 sessions you can’t miss at Data Summit 2015.
What do U2’s lead singer Bono, pro skater Tony Hawk, and Netflix founder Reed Hastings have in common? Collision. Not a literal crash, but the huge tech conference that will take place in downtown Las Vegas May 5-6, 2015. Collision is related to Europe’s Web Summit and aims to bring all kinds of technology professionals together. Since there will be over 500 speakers at this year’s conference, you can’t go to everything. To help you get the most out of the conference, here are our favorite picks.
Last year, we raised $3 million, got featured on TechCrunch, found new customers, hired more employees, attended conferences around the globe, spent thousands of hours on R&D, and invested a lot more effort in sales and marketing. Not everything was perfect, though. Now that 2014 is over and we have gained some perspective, here are six lessons that we learned as a Big Data startup.
Every year, dozens of Big Data conferences take place all over the world, from San Francisco to Shanghai. Now that 2015 is finally here, it’s time to open up your smartphone calendar and mark this year’s Big Data conferences. Here are our five favorite events.
We want to know more about the Big Data community: what causes them headaches? What makes them happy? Which tools and technologies do they use? When we had our booth at AWS re:Invent 2014, we met as many people as possible and talked about their data needs. To get an even better picture, we conducted a little survey on the side.
Big Data is mostly famous, well, for being big. So if you have anything under a petabyte, why would you even think about using Apache Hadoop? Here’s why you should.
Google BigQuery is a great Big Data warehouse on the cloud for the SQL-savvy. But it’s not right for everything. Google itself recommends using Hadoop’s MapReduce rather than BigQuery for certain cases.
A screencast showing how to connect Amazon S3 with Xplenty.
We’ve uploaded a brand new screencast which shows how to process web server logs with Xplenty.
What if you have a killer idea for GitHub’s Data Challenge but no money, servers, or coders at your disposal? We have the solution for you. You can sign up to Xplenty for free, process the data via our visual editor, and run it on a cluster, all without any installations or code. Let’s look at an example project to show you how it’s done.
Hadoop is definitely a great solution for processing dark data, but what if you don’t know how to use it? Hadoop requires you to buy new hardware, provide expert maintenance, and hire developers to program MapReduce jobs. Luckily, there is an alternative — Xplenty.
The 2014 World Cup is the hottest World Cup ever. Not just because of the soaring temperatures in Brazil that send players begging for water breaks, but also because of the high activity on social networks. Curious to take an in-depth look at what happens on Twitter during a game, we collected World Cup tweets during the Australia-Netherlands match.
In our previous post, we discussed Mad Men and how to design a data warehouse in the space age of Big Data. This post will take another step forward, or rather up, and examine how to design a data warehouse on the cloud.
In "The Monolith", the fourth episode in Mad Men’s final season, a huge computer is installed in the center of the floor. The computer was brought in because a competing ad agency had one; being an innovation at the time, it was a competitive advantage that clients were looking for, and something the agency’s talented creative team could never match. Just as advertising went through major changes in Don Draper’s time, data warehousing is going through changes in our time.
Unstructured data is big - according to IDC, about 90 percent of the storage in the world is used for unstructured data. It comes as no surprise considering the amount of photos, videos, documents, and emails being generated on the web by the minute.
Amazon launched Elastic MapReduce (EMR) to make Hadoop easier, but there were still too many Hadoop hoops to jump through before processing Big Data. That’s why we founded Xplenty. Since we both claim to make working with Big Data easier, we decided to run a quick comparison of Xplenty vs. EMR.
There are three ways to collect data on the cloud: storing it directly in the database, uploading log files, or logging via S3/CloudFront. Although we reviewed the pros and cons of each method, there was one aspect we didn't mention: price. Let's try to estimate how much collecting data on the cloud actually costs.
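As a back-of-the-envelope illustration, the estimate boils down to multiplying volumes by unit prices. The rates below are hypothetical placeholders, not Amazon's actual price list, so plug in your provider's current numbers:

```python
def monthly_collection_cost(gb_stored, put_requests,
                            storage_rate_per_gb=0.03,
                            rate_per_1k_puts=0.005):
    """Rough monthly cloud data-collection bill.

    The default rates are hypothetical placeholders; check your
    provider's current price list before relying on the result.
    """
    storage = gb_stored * storage_rate_per_gb          # storage charge
    requests = (put_requests / 1000) * rate_per_1k_puts  # per-request charge
    return round(storage + requests, 2)

# Example: 500 GB of logs and 2 million uploads in a month.
print(monthly_collection_cost(500, 2_000_000))  # prints 25.0
```

Even this toy model makes the trade-off visible: many small log uploads shift cost toward request charges, while batching them shifts it toward storage.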
What does it mean to be a pig? Well, according to the philosophers behind the Apache Pig project pigs eat anything, live anywhere, and are domestic animals. They even claim that pigs can fly!
Calculating key performance indicators (KPIs) requires huge amounts of data, a luxury that only large enterprises could once afford. This post series discusses how companies of all sizes can measure KPIs by collecting and processing Big Data on the cloud.
In his recent article "Turbocharge Your Porsche - Buy An Elephant", Bill Inmon, "the father of data warehousing", criticizes Cloudera for associating Big Data with the data warehouse, two totally unrelated terms according to him. This marks a new round in the fight between two academic geezers, a decades-long argument over what a data warehouse is and how it should be implemented.
Last week I packed my suitcase and got on a plane to London. The agenda: presenting at the February Hadoop Users Group UK meetup. The meetup was supposed to take place two weeks ago, but it was delayed due to a Tube strike. Fortunately, the strike was suspended after unions reached a deal with the London Underground, and the rescheduled event took place on time.
Almost a hundred Hadoop enthusiasts turned up. From genius techies to data newbies, everyone came to network over pizza and beer. I was really excited to see such a vibrant Hadoop community in London; that's no small thing.
Readers of our blog should know by now that Apache Hadoop is great for offline batch processing of Big Data. But what about online streaming data? What if you’re running a ticker for the stock exchange or a real-time analytics dashboard? You might think that collecting streaming data is only relevant for big enterprises, but you don’t have to be The New York Stock Exchange to collect real-time data. Before you jump into the stream, here are 4 tips to get you started.
Consider for a moment, if you will, plastic patio furniture. Plastic Fantastic is a global manufacturer with several factories, warehouses, and plenty of stores. One can only imagine the sheer amount of data resulting from sales, production, suppliers, and finances. Everything that happens, from purchase and onward, to these chairs, tables, and cupboards in all corners of the world is measured.
Now, Hadoop! Now, SQL! Now, NoSQL, and Open Source! On, Cloud technology! On, Apps! On, SaaS, and Infrastructure!
First day at Vegas!
Peanut butter and jelly. Peas and carrots. Forrest and Jenny. Bert and Ernie. Abbott and Costello. Sports and data? Of course.
Buzz about Big Data has been at fever pitch for over a year now. We hear a lot about how the insights we glean will propel businesses, about emerging technologies, and companies merging. But how often do we hear about the guts behind Big Data, what makes it actually work? Maybe I’m wrong, but from what I read, not often enough. So to buck that trend, let’s dive into one of the main building blocks of traditional data warehousing, ETL, and see how it fits in with current Big Data architecture.
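To make the term concrete before diving in: ETL is just extract, transform, load. Here is a minimal Python sketch of the pattern; the field names, the in-memory CSV source, and the SQLite "warehouse" are illustrative stand-ins, not a real pipeline:

```python
import csv
import io
import sqlite3

# Extract: read raw rows (an in-memory CSV stands in for a source file here).
raw = io.StringIO("user,amount\nalice,10.5\nbob,3\nalice,2.5\n")
rows = list(csv.DictReader(raw))

# Transform: normalize types and aggregate total spend per user.
totals = {}
for r in rows:
    totals[r["user"]] = totals.get(r["user"], 0.0) + float(r["amount"])

# Load: write the aggregates into a warehouse table (SQLite stands in here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE spend (user TEXT PRIMARY KEY, total REAL)")
db.executemany("INSERT INTO spend VALUES (?, ?)", totals.items())
result = dict(db.execute("SELECT user, total FROM spend"))
print(result)  # prints {'alice': 13.0, 'bob': 3.0}
```

The interesting question for Big Data architecture is what happens when the "transform" step no longer fits on one machine, which is where Hadoop and friends come in.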