The three V’s (+1). If you're reading this post, chances are good you know what they stand for. On the off chance you don’t, they represent volume, variety, velocity, and value. The first three are relatively quantifiable. The fourth, however, is less tangible, and as a result it is hard to attach a solid ROI to Big Data usage. This in turn can make it an uphill battle to assure those who guard the coffers that a Big Data initiative is a sound investment. After all, we're in business for the money, not the data. So how do we figure out its value in order to decide what kind of system to implement?
Regardless of the initiative's purpose, pinning down its exact value can be an arduous task involving a nebulous set of factors.
With that in mind, here are five of the more important details to consider, which should form the foundation of a Big Data initiative. With this information, enterprises and SMBs alike will be able to decide whether to build their own data center or use the cloud, hire programmers or use off-the-shelf software or Hadoop as a Service, and go it alone or enlist the services of a consultant.
- What is the goal of the initiative? Is it to streamline existing internal processes, gain customer insights, project quarterly earnings, assist with inventory, drive suggestive selling for ecommerce, or figure out how to pitch to one of the premier sluggers in the league?
- Where is the data going to come from? Setting the aforementioned goal makes this somewhat of a no-brainer. Using the data for suggestive selling? You're probably going to draw on user purchases, products viewed, and even click-through URLs and site referrals. Looking to streamline your supply chain? You almost certainly have data pertaining to raw materials, supplier KPIs, bills of lading, warehousing, even driver performance.
- Once you know where the data is coming from, as well as how much data there is or will be, the picture of how to store it will come into focus. Maybe the data isn't expected to grow all that much, so you don't need something scalable. Or perhaps you collect massive amounts of data on a daily basis, so going with something cloud-based for maximum scalability is the way to go.
- Tied into the data source(s) is the type of data being collected, which will dictate how it is processed. What’s being analyzed? Structured data such as log files? Semi-structured XML files or emails? Unstructured data like tweets and satellite feeds? Or all of the above? If you're going with the first option, good ol’ SQL Server might be what the doctor ordered. However, if you need to process at least one of the other varieties of data, a Hadoop processing layer might be the most cost-effective solution.
- The final component, at least for this exercise, is what the data will be used for. Are you an ecommerce company that will use certain user events and behaviors to trigger other events, like suggested products or automatic emails for lead nurturing? Or will the data be used to analyze market trends to better advise clients?
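
The storage and processing rules of thumb from the list above can be written down as a tiny decision helper. This is only a sketch of the heuristics in this post, not a sizing tool: the names `DataProfile` and `suggest_stack` are made up for illustration, and real decisions would weigh cost, skills, and compliance as well.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    # Hypothetical description of the data behind an initiative.
    kinds: set            # any of "structured", "semi-structured", "unstructured"
    grows_quickly: bool   # True if large amounts of data arrive daily


def suggest_stack(profile: DataProfile) -> dict:
    """Apply the post's rules of thumb for storage and processing."""
    # Fast-growing data points toward something scalable, such as the cloud.
    if profile.grows_quickly:
        storage = "cloud-based, scalable storage"
    else:
        storage = "fixed-capacity storage"

    # Purely structured data suits a relational database; anything else
    # in the mix points toward a Hadoop processing layer.
    if profile.kinds == {"structured"}:
        processing = "relational database (e.g. SQL Server)"
    else:
        processing = "Hadoop processing layer"

    return {"storage": storage, "processing": processing}
```

For example, `suggest_stack(DataProfile({"structured", "unstructured"}, True))` suggests cloud-based storage with a Hadoop processing layer.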
This is where everything starts to get murky. Will those stock suggestions pay off enough that clients are convinced your company knows what it’s talking about, and are willing to invest more and pay you more commission? Will the suggestive selling cause people to make impulse buys before checkout? Will streamlining the supply chain cut enough waste, and lower costs enough, that you can afford to add another shipping route, thus decreasing the time it takes products to hit the shelves and increasing the number of smiles from those who sign the paychecks?
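
One way to put numbers on questions like these is a back-of-the-envelope ROI and payback calculation. The sketch below assumes a flat annual benefit and flat annual running cost, which real projects rarely have, and every figure in it is invented for illustration. The function names `simple_roi` and `payback_years` are hypothetical too.

```python
def simple_roi(annual_benefit, annual_cost, upfront_cost, years):
    """Return ROI as a fraction: (total benefit - total cost) / total cost."""
    total_benefit = annual_benefit * years
    total_cost = upfront_cost + annual_cost * years
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (total_benefit - total_cost) / total_cost


def payback_years(annual_benefit, annual_cost, upfront_cost):
    """Years until net annual gains cover the upfront cost, or None if never."""
    net_per_year = annual_benefit - annual_cost
    if net_per_year <= 0:
        return None
    return upfront_cost / net_per_year


# Made-up figures: $200k to build, $50k a year to run,
# $150k a year in extra margin or saved waste, over three years.
roi = simple_roi(annual_benefit=150_000, annual_cost=50_000,
                 upfront_cost=200_000, years=3)
print(f"ROI over 3 years: {roi:.1%}")   # ROI over 3 years: 28.6%
print(payback_years(150_000, 50_000, 200_000))  # 2.0
```

The hard part is not the arithmetic but defending the `annual_benefit` figure, which is exactly the "value" V discussed above.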
If you can answer these questions and put some real numbers with them, the chances greatly increase that the guys worried about money will nod their heads approvingly when it’s time to pull the trigger. Or you could pull a Deputy Ops Bill Rawls and cook the books. Whatever works for you.