Data visualization is the use of graphics to represent data. The purpose of these graphics is to quickly and concisely communicate the most important insights produced by data analytics.
In its simplest form, visualization is the use of graphs to represent trends: a bar chart in Excel, for example. Visualization can also be extremely complex, with graphic designers and analytics experts working together to find ways to represent complex, non-homogeneous data.
Why is Data Visualization Used?
Visualization is a communication method, a quick and effective way to share meaning with other people. A list of numbers, for example, might seem to have no meaning at first glance. When these numbers are plotted as a line graph, however, people can immediately identify trends and relationships.
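As a rough illustration of that point, the sketch below renders a short list of numbers as a text chart using only the Python standard library. The figures and the `text_chart` helper are invented for this example; a real project would normally use a charting tool instead.

```python
# Hypothetical monthly figures; the values are invented for illustration.
values = [12, 15, 14, 19, 23, 22, 28, 31]


def text_chart(series, width=30):
    """Render each value as a row of '#' characters scaled to the largest value."""
    largest = max(series)
    rows = []
    for value in series:
        bar = "#" * round(value / largest * width)
        rows.append(f"{value:>4} {bar}")
    return "\n".join(rows)


print(text_chart(values))
```

Even in this crude form, the upward trend is easier to see in the bars than in the raw list.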
Visualization can also help to identify insights that may not be obvious from data. Analytics experts can often spot an important correlation when they compare visualizations or combine trends from multiple data sources.
Visualization can surface several kinds of insight:
- Trends: Variables don’t always progress in a linear way, and outliers can obscure the overall direction. A trendline helps to identify the overall trend of a single variable. It also highlights points that deviate from the trend, which is an essential part of regression analysis.
- Clusters: When multiple variables are plotted together, groups of similar data points may begin to emerge. These clusters can indicate a relationship between the variables and may represent an important subgroup. Cluster analysis is an important step when performing segmentation.
- Proportions: The best way to show the proportional relationship between values is to represent them in relation to each other. Proportions can be shown as parts of a whole, with a visual such as a pie chart, or as separate values sized relative to each other, with a bubble chart.
- Range: Visualization can show the ranges of certain variables, from minimum to maximum, as well as the mean. This is often plotted with a gauge chart, or with a box plot when several ranges are compared side by side, which allows a quick understanding of the relative size of each range.
- Geography: Location-specific data can be plotted on a map. This can show important geographic clusters that are not visible in data.
Visualizations should provide an additional layer of meaning to data, which can then be used to empower future decision-making.
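The trend item in the list above can be sketched numerically. The example below is a minimal illustration, not a prescribed method: it fits a straight trendline with ordinary least squares and flags points whose distance from the line is unusually large. The function names, the sample data, and the threshold of two standard deviations are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def fit_trendline(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def find_outliers(xs, ys, threshold=2.0):
    """Return indices of points whose residual exceeds threshold * std dev of residuals."""
    slope, intercept = fit_trendline(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Invented data: a clean linear series with one anomaly injected at index 6.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[6] = 40.0

print(fit_trendline(xs, ys))
print(find_outliers(xs, ys))
```

A charting tool would draw the fitted line over the scatter of points; the flagged indices are the points a viewer would see sitting far from that line.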
How is Data Visualization Performed?
Visualization is a business intelligence process. It is driven by high-level business needs, and the outputs are intended for a general business audience.
Most of the work of visualization is done by analytics experts. They follow a process like this:
1. Agree on Business Objectives
Before anyone begins working with data, the team will clarify the objectives of the project. These may be department-specific, such as producing sales reports or customer insights. The project stakeholders will explain what they need, and the analytics team will set reasonable expectations for what’s possible with data.
2. Identify Relevant Sources
The analytics team can work with any source that contains relevant data. This might include production systems, data warehouses, or data lakes. The team will look at the data lineage, assess data quality, and decide on any transformations that may need to be performed.
3. Extract Relevant Data
The team now starts to pull data from the relevant sources. Usually, this is done with SQL queries if they have direct access, or API calls if they don’t. In some instances, it may be necessary to perform a file export, which will then have to be uploaded elsewhere.
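A SQL extraction of this kind might look like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database standing in for a production system. The `orders` table, its columns, and the `sales_by_region` helper are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory stand-in for a production database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 45.5), ("east", 60.0)],
)


def sales_by_region(connection):
    """Pull only the aggregate the chart needs, rather than every row."""
    query = """
        SELECT region, SUM(amount) AS total
        FROM orders
        GROUP BY region
        ORDER BY region
    """
    return connection.execute(query).fetchall()


rows = sales_by_region(conn)
print(rows)  # [('east', 60.0), ('north', 165.5), ('south', 80.0)]
```

Aggregating in the query keeps the extract small, which matters when the source is a busy production system.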
4. Apply Transformations to Data
If the data isn’t in a suitable format, or the data is not compatible with other sources, it may need to pass through a transformation layer. This applies a universal schema to all incoming data, resulting in consistent tables that are ready for intensive analysis. This stage also involves data cleansing, which will speed up the analysis stage.
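The sketch below shows one way such a transformation layer could work, under stated assumptions. Two invented sources (a CRM export and a web shop) use different field names, date formats, and units, and each is mapped onto one shared schema before a simple cleansing pass. All field names and sample records are hypothetical.

```python
from datetime import datetime


def from_crm(record):
    """Map a hypothetical CRM export row onto the shared schema."""
    return {
        "customer_id": int(record["CustomerID"]),
        "order_date": datetime.strptime(record["Date"], "%d/%m/%Y").date(),
        "amount": float(record["Total"].replace(",", "")),
    }


def from_webshop(record):
    """Map a hypothetical web shop row (amounts in cents) onto the shared schema."""
    return {
        "customer_id": int(record["cust"]),
        "order_date": datetime.strptime(record["ordered_at"], "%Y-%m-%d").date(),
        "amount": record["amount_cents"] / 100,
    }


def cleanse(rows):
    """Drop rows with non-positive amounts and exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        key = (row["customer_id"], row["order_date"], row["amount"])
        if row["amount"] <= 0 or key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean


crm_rows = [{"CustomerID": "17", "Date": "03/02/2021", "Total": "1,250.00"}]
shop_rows = [
    {"cust": 17, "ordered_at": "2021-02-03", "amount_cents": 125000},  # same order as the CRM row
    {"cust": 42, "ordered_at": "2021-02-04", "amount_cents": 1999},
    {"cust": 42, "ordered_at": "2021-02-05", "amount_cents": 0},  # invalid amount
]

unified = cleanse([from_crm(r) for r in crm_rows] + [from_webshop(r) for r in shop_rows])
for row in unified:
    print(row)
```

Once every source passes through a mapper like these, downstream charts can rely on one set of column names and types.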
5. Create a Data Repository
If required, the analysts may create a repository to store transformed data. This will be a data warehouse for large data sets or a data mart for department-specific sets. If this type of repository is used, then the transformed data will be immediately available for any future visualization projects.
6. Connect the Repository to a Business Intelligence Platform
The previous steps ensure that data is clean, complete, and in a format that is suitable for the relevant business intelligence platform. At this point, the analysts will begin to identify the main trends and insights and shape these into visualizations.
7. Share Visualizations with Business Stakeholders
If the visualization is successful, it will support the relevant business need. For example, if the project began with the goal of understanding customers, then the visualization should offer some useful information about spending patterns, engagement, lifetime value, and other details.
Data visualization won’t always follow the steps above. The Extract, Transform, Load (ETL) process in steps 3 to 5 is mainly needed when combining multiple sources into a single repository. Some organizations may already have a functioning repository, or they may use data virtualization to present a quick overview of the data.
Other organizations may store their data in a data lake. This repository structure does not apply a schema to incoming data; a structure is imposed only when the data is read. Analysts must therefore use business intelligence tools and technologies such as MapReduce to navigate the data lake and find useful insights.
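To make the idea concrete, the sketch below imitates the map and reduce phases in a single Python process. It is not the Hadoop API, only an illustration of the pattern: raw records are interpreted at read time, irrelevant or malformed ones are skipped, and the rest are aggregated by key. The event format and field names are invented for this example.

```python
import json
from collections import defaultdict

# Raw, schema-less records as they might sit in a data lake (invented examples).
raw_lines = [
    '{"type": "purchase", "country": "DE", "amount": 30}',
    '{"type": "page_view", "url": "/pricing"}',
    '{"type": "purchase", "country": "FR", "amount": 12.5}',
    '{"type": "purchase", "country": "DE", "amount": 20}',
    "not valid json",
]


def map_phase(line):
    """Emit (country, amount) pairs for purchase events; skip everything else."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return []
    if not isinstance(event, dict):
        return []
    if event.get("type") == "purchase" and "country" in event and "amount" in event:
        return [(event["country"], event["amount"])]
    return []


def reduce_phase(pairs):
    """Sum the emitted amounts for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


pairs = [pair for line in raw_lines for pair in map_phase(line)]
totals = reduce_phase(pairs)
print(totals)  # {'DE': 50.0, 'FR': 12.5}
```

In a real cluster the map step would run in parallel across many files, and the framework would group pairs by key before reducing. The resulting totals are what a chart would then display.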
What are Some Common Data Visualization Tools?
There are hundreds of commercial and open-source visualization tools on the market, each with a different library of charts and graphs.
Some of the more common tools include:
- Google Data Studio (renamed Looker Studio in 2022): A free service from Google to create visual dashboards and reports
- Google Charts: A free service from Google with a good library of visualizations
- Chartio: Highly configurable professional BI platform with excellent visualization tools
- Tableau: Beautiful visual representations of data, although the price tag is quite high
- Looker: Flexible visualization with sophisticated data exploration tools
- Microsoft Excel: Perhaps the world’s most popular visualization tool, often used for off-the-cuff graphs and charts
These are a few of the popular visualization options. There are many alternatives, including those that work with certain data types, focus on communication with a particular audience, or include stronger business intelligence tools to support the quality of insight.