How To Offload Data Processing from Google BigQuery

The sassy software engineer with the South American accent convinced us: Google BigQuery is cool.

BigQuery is to Google as Redshift is to Amazon: a Big Data warehouse in the cloud for the SQL-savvy. It’s based on Dremel, a query technology that Google built in-house to analyze data from massive services such as Search, YouTube, Gmail and Docs.

BigQuery is a columnar DB with append-only tables, which is why it can do interactive querying really fast: 35 billion rows in tens of seconds, according to Google.
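
To get a feel for that kind of interactive querying, here is a minimal, hypothetical example using the google-cloud-bigquery Python client against one of Google's public sample tables. The table choice and the query are just illustrations, not Google's benchmark.

```python
# Hypothetical interactive query against a public sample table.
# Requires: pip install google-cloud-bigquery, plus default GCP credentials.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

query = """
    SELECT language, COUNT(*) AS revisions
    FROM `bigquery-public-data.samples.wikipedia`
    GROUP BY language
    ORDER BY revisions DESC
    LIMIT 10
"""

# BigQuery scans the columns it needs and returns results interactively.
for row in client.query(query).result():
    print(row.language, row.revisions)
```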

Now, do we need to throw away the yellow elephant called Hadoop in favor of BigQuery?

Although BigQuery sounds like the end-all solution for processing Big Data, it’s not right for everything. Yes, it’s good for real-time querying, but Google itself recommends using Hadoop’s MapReduce rather than BigQuery for three main cases:

  1. Executing complex data mining on Big Data that requires multiple iterations and paths of data processing with programmed algorithms.

  2. Executing large join operations across huge datasets.

  3. Exporting large amounts of data after processing.

So, if you need to run any of the above, you’ll be better off offloading the work to Hadoop. Better yet, you can offload it to Xplenty.

Xplenty is powered by Apache Hadoop with added BigQuery integration. You can take data from Google Cloud Storage and many other sources, mine it, join it, and transform it with Xplenty. Then you can store the results on Google BigQuery for your data warehouse needs. Here’s how.
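
Before diving into the setup, it helps to see what that final write boils down to: a plain BigQuery load job from Google Cloud Storage. Below is a minimal, hypothetical sketch with the google-cloud-bigquery Python client; the bucket, dataset, and table names are placeholders, and in practice Xplenty performs this step for you.

```python
from google.cloud import bigquery

# Placeholder names; Xplenty handles this load as part of the package run.
PROJECT_ID = "your-project-id"
TABLE_ID = f"{PROJECT_ID}.your_dataset.your_table"

client = bigquery.Client(project=PROJECT_ID)

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # let BigQuery infer the schema from the files
)

# Load the processed output files from Google Cloud Storage into the table.
load_job = client.load_table_from_uri(
    "gs://your-bucket/processed/results-*.csv",
    TABLE_ID,
    job_config=job_config,
)
load_job.result()  # block until the load job completes

print(f"Loaded {client.get_table(TABLE_ID).num_rows} rows into {TABLE_ID}")
```

Schema autodetection keeps the sketch short; a production load would normally pin down the schema and write disposition explicitly.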

Google BigQuery Settings

Project Access

To run load jobs, you need to grant Xplenty read permissions on the project that contains the relevant BigQuery dataset (a scripted alternative is sketched after these steps):

  1. Access your Google Cloud Console.

  2. Click the project name for which you want to grant Xplenty access.

  3. In the navigation menu, click Permissions.

  4. Click Add Member.

  5. Paste Xplenty's Google Cloud email:

  6. Click Can View.

  7. Click Add.
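
If you prefer scripting over clicking through the console, the same project-level viewer role can be granted with the Cloud Resource Manager API. Here's a rough Python sketch (google-api-python-client); the email is a placeholder standing in for the address Xplenty gives you.

```python
# Hypothetical scripted alternative to the console steps above.
# Requires: pip install google-api-python-client, plus default GCP credentials.
from googleapiclient import discovery

PROJECT_ID = "your-project-id"
XPLENTY_EMAIL = "xplenty@example.com"  # placeholder, not the real address

crm = discovery.build("cloudresourcemanager", "v1")

# Read the current IAM policy, append a viewer binding, and write it back.
policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()
policy.setdefault("bindings", []).append({
    "role": "roles/viewer",               # equivalent to "Can View"
    "members": [f"user:{XPLENTY_EMAIL}"],
})
crm.projects().setIamPolicy(
    resource=PROJECT_ID, body={"policy": policy}
).execute()
```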

Dataset Access

To enable Xplenty to write to a specific dataset, grant it write permissions on the relevant BigQuery dataset (a scripted equivalent follows these steps):

  1. Access your Google Cloud Console.

  2. Click the project name for which you want to grant Xplenty access.

  3. In the navigation menu, expand Big Data and click BigQuery.

  4. In the new window, verify the project name (to switch projects, click the down arrow next to the project name, then hover on Switch to project and select a different project).

  5. Hover on the dataset, click the down arrow when it appears, then click Share dataset.

  6. In the Add people textbox, paste Xplenty's Google Cloud user name:

  7. In the drop-down menu to the right of the textbox, click Can Edit.

  8. Click Add.

  9. Click Save Changes.
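
The same share can be scripted: a dataset's access list is just a set of entries, and the google-cloud-bigquery client lets you append a WRITER entry for Xplenty's user. A hedged sketch with placeholder names:

```python
# Hypothetical scripted equivalent of sharing the dataset with Xplenty.
from google.cloud import bigquery

XPLENTY_EMAIL = "xplenty@example.com"  # placeholder, not the real address

client = bigquery.Client(project="your-project-id")
dataset = client.get_dataset("your_dataset")

# Copy the existing access entries and add a WRITER entry for Xplenty.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",                 # equivalent to "Can Edit"
        entity_type="userByEmail",
        entity_id=XPLENTY_EMAIL,
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```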

Getting the Project ID:

  1. Log in to your BigQuery account.

  2. Go to Projects - Manage Projects.

  3. Look for your project name and copy its project ID.
    Google BigQuery project ID

For more information, see Google BigQuery Access Control.
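
As an optional sanity check before moving over to Xplenty, you can confirm that the project ID you copied resolves and that its datasets are visible to your own credentials:

```python
# Optional check, not part of the official setup.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # the ID copied above
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```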

Xplenty Settings

Configuring the Google BigQuery Connection

  1. Log in to your Xplenty account or sign up for free.

  2. Click the user image at the top right and select Manage connections.

  3. Click the new connection button and select Google BigQuery under Analytical Databases.
    Google BigQuery connection in Xplenty

  4. Enter a name, the project ID that you copied from BigQuery, and the relevant dataset ID.
    Google BigQuery connection details

  5. Click the test connection button to make sure that the connection works. If it fails, either one of the details was entered incorrectly or Xplenty hasn’t been given permission to connect to your BigQuery dataset (see the troubleshooting sketch after these steps).

  6. Click Create Google BigQuery connection.
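
If the connection test keeps failing, one quick way to rule out a typo is to resolve the same project ID / dataset ID pair with the Python client from your own machine. The names below are placeholders, and this checks your credentials rather than Xplenty's, so a success here still leaves the permission steps above to verify.

```python
# Hypothetical troubleshooting check for the connection details.
from google.cloud import bigquery
from google.api_core.exceptions import NotFound

PROJECT_ID = "your-project-id"
DATASET_ID = "your_dataset"

client = bigquery.Client(project=PROJECT_ID)
try:
    dataset = client.get_dataset(f"{PROJECT_ID}.{DATASET_ID}")
    print("Dataset found:", dataset.full_dataset_id)
except NotFound:
    print("Check the project ID / dataset ID pair; BigQuery can't find it.")
```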

Output Processed Data to Google BigQuery

  1. Open an existing Xplenty package or create a new one under My Packages.

  2. Add the Google BigQuery destination component by clicking the plus button under the relevant component.
    Google BigQuery destination component

  3. Click the component to edit it.

  4. Select the relevant Google BigQuery connection and enter the target table. You can also auto-fill the fields with the green button on the right, or check the connection with the green button on the left.
    Google BigQuery destination component editor

  5. Click OK and don’t forget to save the package before running it (a quick way to verify the results follows these steps).
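
Once the package has run, a short, hypothetical check with the Python client confirms that rows actually landed in the target table (the table name is a placeholder):

```python
# Confirm the destination table was populated after the package run.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")
table = client.get_table("your-project-id.your_dataset.your_table")
print(f"{table.num_rows} rows, last modified {table.modified}")
```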


Google BigQuery looks great for real-time querying of huge datasets. For optimal performance, though, you should offload batch processing to Apache Hadoop, especially for complex data mining, large joins, and exporting large amounts of processed data. Xplenty, which is powered by Hadoop, can easily take on those offloaded processes and write the results back to Google BigQuery for big querying.

Integrate Your Data Today!

Get a 7-day free trial. No credit card necessary.