The VACUUM command is an important tool in the DBA tool belt. Supporting an active, live database can be a tricky task, especially when long-running queries and database updates become a bottleneck in your software. Often, incorporating a VACUUM in your workflow can be a pivotal and sustainable solution to your database woes.

Table of Contents

  1. Redshift Data Storage
  2. What Does The VACUUM Command Do?
  3. When To Use the VACUUM Command
  4. How to Use VACUUM

Customer Story
Customer Story
Keith connected multiple data sources with Amazon Redshift to transform, organize and analyze their customer data.
Amazon Redshift Amazon Redshift
David Schuman
Keith Slater
Senior Developer at Creative Anvil
Before we started with Xplenty, we were trying to move data from many different data sources into Redshift. Xplenty has helped us do that quickly and easily. The best feature of the platform is having the ability to manipulate data as needed without the process being overly complex. Also, the support is great - they’re always responsive and willing to help.
TRUSTED BY COMPANIES WORLDWIDE

Enjoying This Article?

Receive great content weekly with the Xplenty Newsletter!

Redshift Data Storage

At the core of a Redshift cluster is the disc on which it stores data. In fact, it stores data on several discs. If you're not familiar with how bits get stored on a disc, here's a quick (very high level) summary:

A computer disc consists of thousands of microscopic magnets looping around the center. Think of it as a bunch of domino tiles laying down next to each other in a bunch of concentric circles. A tile can either be face up or face down: 1 or 0. Each tile is a magnet that can either be on or off. When data gets stored/updated/deleted in a database, a bolt of electricity will beam its way through the magnets and flip them on or off.

When you delete a record from a database, the magnets representing that data will be "switched off". The kicker, and why the VACUUM command is so vital, is that those magnets that were just switched off aren't automatically going to be used the next time you write something to the database. This is because the database keeps a pointer of its "next magnet to flip", and that pointer is always positioned at the last, fresh magnet to get flipped over.

What VACUUM does, at a microscopic level, is:

  1. Identify all of those tiles which were "switched off" due to UPDATE or DELETE commands
  2. Reclaim that space by moving other data into it, thus consolidating all of the data. 
  3. Move the pointer back to the next available magnet to flip.

For deeper dive, see our comprehensive guide on Amazon Redshift.

What Does The VACUUM Command Do?

Redshift will automatically sort rows and run VACUUM DELETE in the background. This condenses the data footprint, making data more easily accessible and cheaper to store.

There are a few different types of VACUUM commands:

  • VACUUM FULL: This is the default VACUUM command. It sorts the specified table and reclaims any disc space cleared out by DELETE or UPDATE commands. One can specify a specific table to VACUUM, or it can do an entire database (this operation can take a long time).
  • VACUUM SORT ONLY: This type of VACUUM command only sorts the data on disc. This is a useful distinction if you do not have space constraints but want your VACUUM command to be as quick as possible.
  • VACUUM DELETE ONLY: This VACUUM command only reclaims space that changed as a result of a DELETE or UPDATE command. It is quicker than a VACUUM FULL command, but there is little benefit of this command other than that.
  • VACUUM REINDEX table_name: This is a more complex VACUUM SORT ONLY. Not only does it sort the records, but it also takes into consideration sort key columns. In other words, if a table has many columns with indexes, this type of VACUUM would take those indexes into account vs the VACUUM SORT ONLY which only sorts on a primary key.

When to Use The Vacuum Command

Since Redshift runs a VACUUM in the background, usage of VACUUM becomes quite nuanced. In other words, it becomes difficult to identify when this command will be useful and how to incorporate it into your workflow. Fear not, Xplenty is here to help.

The biggest reason to incorporate VACUUM into your workflow is if your application has a lot of UPDATE and DELETE statements. Even though Redshift will run the VACUUM in the background periodically, it may not be frequent enough.

More importantly, if the frequent UPDATE and DELETE commands cause your data to by out of order, enforcing a VACUUM SORT will make data more accessible.

Customer Story
Customer Story
Keith connected multiple data sources with Amazon Redshift to transform, organize and analyze their customer data.
MongoDB MongoDB
Amazon Redshift Amazon Redshift
David Schuman
Dave Schuman
CTO and Co-Founder at Raise.me
They really have provided an interface to this world of data transformation that works. It’s intuitive, it’s easy to deal with [...] and when it gets a little too confusing for us, [Xplenty’s customer support team] will work for an entire day sometimes on just trying to help us solve our problem, and they never give up until it’s solved.
TRUSTED BY COMPANIES WORLDWIDE

Enjoying This Article?

Receive great content weekly with the Xplenty Newsletter!

How to Use VACUUM

You can treat VACUUM like any other SQL command you run on your Redshift cluster.

If you wish to run VACUUM on every table in your database:

VACUUM;

If you wish to run VACUUM on a specific table:

VACUUM table_name;

If you want to run VACUUM DELETE ONLY on a specific table:

VACUUM DELETE ONLY table_name;

Similarly for SORT ONLY:

VACUUM SORT ONLY table_name;

And lastly, for VACUUM REINDEX:

VACUUM REINDEX table_name;

Redshift Vacuum and Xplenty

Managing large software applications can be a daunting task, but you don't have to go it alone. Schedule a call with Xplenty to learn more about our offerings and how managing your data can be one less thing to worry about.