Everything comes as a service these days, and collecting Big Data is no exception. A number of web platforms are happy to take data collection off your coding hands, making it easy to gather data from many sources in one location. Some call this a data hub. The following five platforms can help meet your ever-increasing data collection needs.

S3/CloudFront Logging

This isn’t a data hub per se, but it can act like one. Amazon S3 and CloudFront can automatically log requests for objects in the object store, so you can track events by sending an HTTP request for a 1x1 pixel image stored in a relevant S3 directory. Each request produces a log entry (CloudFront uses the W3C extended log format) with the HTTP request details: IP address, browser, date/time, etc. Extra session-level data, such as username or mouse position, can be passed via the query string. Placing the images in directories with relevant names, e.g., /click/, differentiates between the various events, and voila: you have a data collection service. For more info, please see our blog post on how to implement S3/CloudFront logging.
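To make the pixel trick concrete, here is a minimal sketch in Python. The distribution domain, the pixel.gif file name, and both helper functions are made up for illustration; only the idea of an event directory plus a query string comes from the description above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical CloudFront distribution in front of the S3 bucket holding the pixel.
BASE_URL = "https://d1234example.cloudfront.net"


def build_pixel_url(event, **params):
    """Return the pixel URL for `event`, with extra data in the query string."""
    query = urlencode(params)
    return f"{BASE_URL}/{event}/pixel.gif?{query}"


def parse_pixel_url(url):
    """Recover the event name and parameters, as a log-processing job might."""
    parts = urlsplit(url)
    # The first path segment is the event directory, e.g. "click".
    event = parts.path.strip("/").split("/")[0]
    # parse_qs returns lists of values; keep the first value for each key.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return event, params


url = build_pixel_url("click", username="alice", x=120, y=48)
print(url)
print(parse_pixel_url(url))
```

On the page itself, the same URL would typically be requested by an image tag or a small script. The parsing half mirrors what you would do later with the request path and query string recorded in each log line.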


Segment

Segment is a customer data hub that collects user behavior data from web and mobile apps. Segment can then send the data onwards to services such as Woopra or Google Analytics, or even store it on Amazon Redshift for custom analysis. All you have to do is add some logging lines to your code, and Segment will take care of the rest. Segment integrates with Xplenty so you can process the data even further.
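As a rough idea of what those logging lines boil down to, the sketch below builds a single "track" call using only the Python standard library. It assumes Segment's HTTP Tracking API accepts a JSON POST at the URL shown, with the write key sent as the Basic auth username; check the current Segment documentation before relying on it. The write key, user ID, and event are placeholders, and build_track_request is a hypothetical helper, not part of any Segment library.

```python
import base64
import json
from urllib.request import Request

# Assumed endpoint of Segment's HTTP Tracking API; verify against the current docs.
SEGMENT_TRACK_URL = "https://api.segment.io/v1/track"


def build_track_request(write_key, user_id, event, properties=None):
    """Build (but do not send) a POST request describing one user event."""
    payload = {"userId": user_id, "event": event, "properties": properties or {}}
    # The write key goes in as the Basic auth username, with an empty password.
    token = base64.b64encode(f"{write_key}:".encode()).decode()
    return Request(
        SEGMENT_TRACK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_track_request("YOUR_WRITE_KEY", "user-42", "Signed Up", {"plan": "trial"})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

In practice you would more likely use one of Segment's client libraries, which batch and retry events for you; the raw request is shown only to make the data flow visible.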


Loggly

Loggly is an enterprise-class log management solution in the cloud. It provides centralized logging, as well as log searches and log graphs, and doesn’t require any agent installation. Loggly uses open-source technologies such as Elasticsearch, Apache Lucene, and Apache Kafka. You can load data from Loggly into Xplenty via S3, join it with other sources, and export it to your data warehouse.
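Agentless logging usually means shipping events over a standard protocol such as syslog or HTTP. The sketch below prepares one event for an HTTP endpoint of the form used by Loggly's HTTP event input. The URL pattern is an assumption to confirm in Loggly's documentation, and the token, tag, and build_loggly_request helper are placeholders for illustration.

```python
import json
from urllib.request import Request


def build_loggly_request(token, event, tag="http"):
    """Build (but do not send) a POST request carrying one log event."""
    # Assumed URL pattern for the HTTP event input; the token identifies the account.
    url = f"https://logs-01.loggly.com/inputs/{token}/tag/{tag}/"
    return Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


req = build_loggly_request("YOUR_TOKEN", {"level": "error", "message": "disk full"}, tag="app")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```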


Logentries

Similar to Loggly, Logentries provides central, cloud-based log management, including monitoring, tailing, and querying.


Papertrail

Papertrail is another cloud-hosted log management service, although it is more of a no-frills service with less flashy dashboards. We use it for our own debugging purposes and integrate Papertrail with Xplenty to process the logs.

Interested in reading more about Big Data? Read our guide to processing unstructured data here.