One of the most long-overdue AWS features is finally available. At the most recent AWS re:Invent conference in Las Vegas, Amazon announced Athena, a service that lets you run SQL queries directly against data stored in Amazon S3. It’s serverless, so there’s nothing to install or deploy, and users pay only for the data they query.
Amazon Athena is powered by Presto, a distributed SQL query engine for big data, and it’s fast and scalable. We tested the service on a pile of data we had sitting on S3, and the performance was amazing.
Amazon Athena is comparable to Google BigQuery in that you basically pay for storage (S3) and for the data you query. With Athena, however, you also manage the underlying data yourself (file formats, directory structure), which makes it more complex to work with but also more flexible. Compared with Amazon Redshift, Redshift’s performance is superior, but you’re bound to the limitations (and cost) of Redshift clusters, whereas Athena is serverless.
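To make "handling the underlying data" a little more concrete, here is a minimal Python sketch of one common way to lay files out in S3 for Athena: Hive-style partition directories (`year=.../month=.../day=...`) plus a matching `CREATE EXTERNAL TABLE` statement. The bucket name, table name, and columns are hypothetical and only there for illustration; the sketch builds strings and does not call AWS.

```python
from datetime import date


def partition_prefix(bucket: str, table: str, day: date) -> str:
    """Return a Hive-style partitioned S3 prefix for one day of data."""
    return (
        f"s3://{bucket}/{table}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def create_table_ddl(bucket: str, table: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for CSV files under the table prefix.

    The columns (user_id, event_name) are assumptions made for this example.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  user_id string,\n"
        "  event_name string\n"
        ")\n"
        "PARTITIONED BY (year string, month string, day string)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION 's3://{bucket}/{table}/'"
    )
```

For example, `partition_prefix("example-bucket", "events", date(2016, 12, 1))` yields `s3://example-bucket/events/year=2016/month=12/day=01/`. Files written under that prefix would still need their partitions registered in Athena (for instance with `MSCK REPAIR TABLE`) before queries can see them.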
This is a big step forward in making Amazon S3 an ideal environment for an organizational data lake. Data from all sources can now be pushed into S3 and queried on an ad-hoc basis in a performant, scalable, accessible manner. It’s a data professional’s dream come true.
However, you still need to get that data into Amazon S3. It may come from other AWS services you use, such as RDS or EMR, or from outside AWS: applications like Salesforce, Mixpanel, Facebook, or Google Analytics, or data stores running on other platforms.
Xplenty can write data to Amazon S3 in file formats that Amazon Athena supports (CSV, TSV, JSON, and Parquet), and it can connect to more than 100 data repositories and services, so you can pump data into your data lake from external data stores on a regular basis. Xplenty is also a hosted service, so companies using it don’t have to worry about maintenance or administration.
Ready to take Xplenty for a test drive? Click here for a free seven-day trial.