In the first two posts in this series, we learned how to scale data collection on the cloud and how to collect Big Data with S3/CloudFront logging. Some of our customers started collecting CloudFront logs and wondered what to do with them, so we decided to write one more post showing how to analyze AWS CloudFront logs with Xplenty's Hadoop-as-a-Service. You can try this demo for free, even if you don't have an Xplenty account or any logs yet.
1. Log In to Your Xplenty Account
If you don't have one, sign up for free.
2. Create a Cluster
If you don't have any clusters yet, you can create a Sandbox cluster for free. Click my clusters at the top left, then click + new cluster on the right. You'll receive an email when the cluster is ready.
3. Create a Package Using the Template
Click my packages at the top, open the + new package dropdown on the right, and select From template. Choose AWS CloudFront Log Analysis and click Create Package.
4. Optional - Edit the Package
The template creates a demo package that reads sample data from a public Xplenty bucket. Leave it as is if you just want to run the demo; otherwise, you can make any of the following modifications:
Use your own data - edit the cloudfront_logs component and create a new cloud storage connection with the relevant bucket and path. Don't forget to change the destination components (the ones at the end of each chain) so that the results are saved to your own account.
Filter files - edit cloudfront_logs and use pattern matching in the path field to select only certain files. For example, to analyze only the logs from 2012 and 2013, enter cloudfront/*.201[23]-* (cloudfront is the demo path).
Add more reports - the demo generates four reports: traffic by date, by URL, by edge location, and by geographic location. You can add clone components to split the data flow and attach your own component chain to process the data. It's easy and requires no coding; see our documentation for further instructions.
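If you want to try a path pattern or a new report idea on a few files before running a cluster job, you can approximate the package's filtering and aggregation locally. The Python sketch below is not how the Xplenty template works internally; it is a plain-Python illustration that assumes file names follow CloudFront's `<distribution-ID>.YYYY-MM-DD-HH.<unique-ID>.gz` naming scheme and that log lines are tab-separated, with column names listed in the `#Fields:` header. The functions `select_files`, `parse_log`, and `traffic_report`, as well as all keys and rows, are hypothetical examples. It covers traffic by date, URL, and edge location; the geographic report would also need an IP-to-location database, which is left out here.

```python
import fnmatch
from collections import Counter


def select_files(keys, pattern):
    """Return the keys that match a shell-style glob such as '*.201[23]-*'.

    Note: fnmatch's '*' also matches '/', so keep patterns to a single
    directory level if you want them to behave like path globs.
    """
    return [k for k in keys if fnmatch.fnmatchcase(k, pattern)]


def parse_log(lines):
    """Yield one dict per request from CloudFront access-log lines.

    Column names are taken from the '#Fields:' header, so the parser
    does not hard-code the number or order of columns.
    """
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))


def traffic_report(records, field):
    """Group records by one column; return {value: (requests, bytes served)}."""
    requests, bytes_served = Counter(), Counter()
    for rec in records:
        requests[rec[field]] += 1
        bytes_served[rec[field]] += int(rec["sc-bytes"])
    return {value: (requests[value], bytes_served[value]) for value in requests}


# Hypothetical log file names under the cloudfront/ prefix.
keys = [
    "cloudfront/E1EXAMPLE.2011-12-31-23.a1b2c3d4.gz",
    "cloudfront/E1EXAMPLE.2012-05-01-10.e5f6a7b8.gz",
    "cloudfront/E1EXAMPLE.2013-07-15-08.c9d0e1f2.gz",
    "cloudfront/E1EXAMPLE.2014-01-02-00.a3b4c5d6.gz",
]
print(select_files(keys, "cloudfront/*.201[23]-*"))
# ['cloudfront/E1EXAMPLE.2012-05-01-10.e5f6a7b8.gz', 'cloudfront/E1EXAMPLE.2013-07-15-08.c9d0e1f2.gz']

# Made-up log lines that use only the first nine CloudFront columns.
header = "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status"
rows = [
    ("2012-05-01", "10:00:01", "IAD2", "2390", "192.0.2.10", "GET", "d111111abcdef8.cloudfront.net", "/index.html", "200"),
    ("2012-05-01", "10:00:05", "IAD2", "1045", "192.0.2.11", "GET", "d111111abcdef8.cloudfront.net", "/logo.png", "200"),
    ("2012-05-02", "11:30:00", "FRA6", "2390", "198.51.100.7", "GET", "d111111abcdef8.cloudfront.net", "/index.html", "200"),
]
lines = ["#Version: 1.0", header] + ["\t".join(row) for row in rows]
records = list(parse_log(lines))

print(traffic_report(records, "date"))             # {'2012-05-01': (2, 3435), '2012-05-02': (1, 2390)}
print(traffic_report(records, "cs-uri-stem"))      # {'/index.html': (2, 4780), '/logo.png': (1, 1045)}
print(traffic_report(records, "x-edge-location"))  # {'IAD2': (2, 3435), 'FRA6': (1, 2390)}
```

Because `parse_log` accepts any iterable of text lines, you could feed it a real gzip-compressed log file opened with `gzip.open(path, "rt")`. Reading the column names from the header rather than hard-coding positions keeps the sketch working if a log file carries more or fewer columns than the sample.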
5. Run the Job
Back in my clusters, click the run now button. Choose the AWS CloudFront Log Analysis package that you've just created and click run now.
The job will start running as soon as the cluster is ready. Feel free to monitor the job status on-screen.
6. View the Results
You'll get an automated email when the job is done. The reports are written to the public bucket, but you can also view a sample of each one in the Xplenty web app.
Back on the my clusters dashboard, click View outputs for the job that just finished. A dialog opens listing the reports that were generated; click the one you'd like to view.
If you have any other questions, please feel free to contact us and we'll happily explain this, or anything else for that matter, in greater detail.