
A focus on visualizations: Scatter plot


What is a scatter plot used for? Can correlation be used to predict? With our recently updated article about line graphs, we brought you back to one of our most visited blog post series, the one focusing on visualization. There are numerous other chart types that, if used appropriately, can add extra value to a dashboard. Since scatter plots are less commonly used, let us walk you through the main aspects to consider when choosing this chart type.

The original article was published back in 2014. In the years since, Sweetspot has been revamped with improved user experience and a new interface, including a fresh version of the original visuals and chart types.

It’s clear that the objectives and best practices of scatter charts detailed by Holly McKendry haven’t been replaced. Still, there are new designs of the same visualization formats that are worth a look. This is why we updated the Sweetspot images below.

Do you remember back in algebra class, trying to find a pattern in the weight and height of your classmates? Or maybe in how many ice creams are sold on the hottest days of summer? As long as you were paying attention, you probably remember that this was done in order to find a correlation.


Up until this point in our visual series, we’ve only briefly touched on correlations, but today we’d like to take a look at the graph best known for representing these: the scatter plot.

The scatter plot should not be confused with the line graph, which connects each data point. The scatter plot helps us find patterns by using Cartesian coordinates to show the relationship between two metrics, revealing how they move in relation to one another.

When to use a scatter plot

Scatter plots are best used to:

  • Unveil patterns
  • Show the relationship between two sets of data

When NOT to use a scatter plot

Scatter plots are generally not our best option for representing:

  • Only a few data points. With so little data, the chart won’t reveal any pattern. A column or bar chart may be adequate instead, as the original purpose of the scatter chart does not apply.

Scatter plots are powerful visualizations for testing suspected cause-and-effect relationships, or even for setting up a control system to monitor how one variable moves with another. For example, digital marketers who would like to see how an increase in content affects engagement can monitor the two metrics to see whether a correlation occurs. If one does, they can then explore it and take action, keeping in mind that a correlation on its own does not prove that one variable causes the other.

Things to look for:

Correlation:

Correlations can be positive or negative. Basically, this is how we determine how two variables relate to one another. A positive correlation means that as one variable increases, so does the other. For example: the more followers I have, the higher my engagement will be. Therefore, the smart marketer will try to increase the number of followers they have to improve brand engagement! Negative correlations show just the opposite: as my followers go up, my engagement goes down. In this case I should definitely not use strategic marketing to increase my followers.

Correlation is positive if there is an upward slope moving from lower left to upper right, and negative if there is a downward slope moving from upper left to lower right.


The strength of a correlation is just as important as its direction. A correlation is strong if the data values are tightly grouped along a line, and weak, or even nonexistent, if there is no such clustering.

Scatter graphs don’t always show a correlation between data; they can also show that there is no recognizable pattern. In the accompanying graph, the two variables show no correlation at all. Following the same example, increasing the number of followers would most likely have no effect on engagement.
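The direction and strength described above can also be put into a single number. The sketch below computes the Pearson correlation coefficient, a standard measure that the original article does not name: its sign gives the direction and its distance from zero gives the strength. The follower and engagement figures are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns a value between -1 and 1. Assumes both sequences vary
    (a constant sequence would cause a division by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: follower counts and two possible engagement series.
followers = [100, 200, 300, 400, 500, 600]
engagement_up = [12, 19, 33, 41, 48, 62]    # rises with followers
engagement_down = [60, 51, 39, 32, 18, 11]  # falls as followers rise

r_up = pearson_r(followers, engagement_up)      # close to +1
r_down = pearson_r(followers, engagement_down)  # close to -1
print(f"positive example: r = {r_up:.2f}")
print(f"negative example: r = {r_down:.2f}")
```

A value near zero would correspond to the "no recognizable pattern" case.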

 

Line of best fit:

In order to make the correlation of a scatter plot stand out more than individual data points, use a straight trend line. Trend lines not only save viewers time, but they also make it easier to spot the direction and strength of a correlation.
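As a rough illustration of where a straight trend line comes from, here is a minimal sketch of an ordinary least-squares fit. Least squares is one common way to draw a line of best fit; the article does not say which method Sweetspot uses, and the data below is hypothetical.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: posts published per week vs. engagement.
posts = [1, 2, 3, 4, 5]
engagement = [3, 5, 7, 9, 11]

slope, intercept = best_fit_line(posts, engagement)
print(f"trend line: y = {slope:.2f} * x + {intercept:.2f}")
```

A positive slope matches an upward trend from lower left to upper right, a negative slope a downward one.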


Outliers:

Outliers are known as the “odd man out” in scatter plots. Any data points that are relatively far from the rest are considered outliers, and they can skew your data. However, outliers reflect real-life scenarios: they show that not everything fits into a pattern, and they let us identify exactly where the data strayed from the expected pattern. Sometimes it is useful to analyze what caused these outliers in order to attempt to correct their behavior to suit your aims.

A trick to spot outliers quickly is to add a line of best fit, as the outliers will stand out as the data points farthest away from the line.
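The same trick can be sketched in code: fit a line, then rank the points by how far they sit from it. This is only an illustration under assumptions. It uses a least-squares line and vertical distance, and the data is invented; the function names are hypothetical.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rank_by_distance(xs, ys):
    """Return the points sorted by vertical distance from the line, farthest first."""
    slope, intercept = best_fit_line(xs, ys)
    return sorted(
        zip(xs, ys),
        key=lambda point: abs(point[1] - (slope * point[0] + intercept)),
        reverse=True,
    )

# Hypothetical data with one point that breaks the pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 20, 10, 12]

farthest = rank_by_distance(xs, ys)
print("candidate outlier:", farthest[0])
```

Note that a strong outlier also pulls the fitted line toward itself, so the ranking is a starting point for investigation rather than a verdict.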

Things to consider:

3D Scatterplots:

It is possible to display three or more data sets in a 3D scatter plot. However, I strongly discourage dashboard users from using this option. Even when designed well, this graph requires too much study time to understand, which is time dashboard users don’t have to waste.

Overplotting:

If your scatter plot appears to be littered with smudges rather than actual dots, the solution is simple: adjust your scale so that all dots can be interpreted clearly. If that doesn’t work, the problem may be the size of your dots; adjust it so that your reader can view the chart clearly. If overplotting continues to be a problem, translucency is a powerful tool to make your data points readable. Making your dots semi-transparent de-emphasizes outliers and lets readers focus on the direction and strength of the boldest areas, where many dots overlap, in order to recognize the trend.
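To see why translucency helps, consider how overlapping dots add up. The sketch below assumes standard alpha blending, where each dot of opacity `alpha` covers that fraction of whatever is still visible beneath it; whether a given charting tool blends exactly this way is an assumption.

```python
def combined_opacity(alpha, n_dots):
    """Opacity of n same-colored dots stacked on one spot.

    Assumes standard alpha blending: each dot lets (1 - alpha) of the
    background show through, so n dots let (1 - alpha) ** n through.
    """
    return 1 - (1 - alpha) ** n_dots

# With 20% opaque dots, a lone point stays faint while a dense
# cluster approaches a solid color.
for n in (1, 3, 10):
    print(f"{n:>2} overlapping dots -> {combined_opacity(0.2, n):.0%} opaque")
```

This is why isolated outliers fade into the background while the dense core of the data stays bold.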


Predictions:

An added benefit of the line of best fit is that you can make predictions by extending the trend line. For a prediction to be reasonable, the data points should show a strong correlation.
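Extending the trend line amounts to plugging a new x value into the line's equation. The sketch below does that, and refuses to predict when the correlation is weak. The `min_abs_r` cutoff of 0.8 is an arbitrary illustrative choice, not a rule from the article, and the data is hypothetical.

```python
import math

def fit_and_correlate(xs, ys):
    """Return (slope, intercept, r) for a least-squares line and Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def predict(xs, ys, new_x, min_abs_r=0.8):
    """Extend the trend line to new_x, or return None if the correlation is weak."""
    slope, intercept, r = fit_and_correlate(xs, ys)
    if abs(r) < min_abs_r:
        return None  # too weak a correlation to trust the trend line
    return slope * new_x + intercept

# Hypothetical data: posts per month vs. engagement.
posts = [2, 4, 6, 8, 10]
engagement = [11, 19, 32, 38, 51]

forecast = predict(posts, engagement, new_x=12)
print("predicted engagement at 12 posts:", forecast)
```

The further `new_x` lies beyond the observed data, the less weight the prediction deserves.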


Mean lines:

Mean lines provide the reader with extra information to help classify the dots and interpret the chart accurately. This extra information allows analysts to see how different data points compare against the mean of the dataset, which in turn helps them identify how their indicators are performing. For example, if you were to use mean lines on a chart of keywords, you would be able to identify which keywords are getting more impressions than average, which in turn will guide your marketing efforts: choose the highest-performing keywords, and rewrite or discard the lower-performing ones.
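In effect, two mean lines split the chart into four quadrants. Here is a minimal sketch of that classification, assuming each keyword is plotted by impressions (x) and clicks (y). The metrics, keyword names, and numbers are all invented for illustration.

```python
def classify_by_means(points):
    """Label each point as above or below the mean on each axis.

    points maps a name to an (x, y) pair. Returns a dict mapping each
    name to a pair of labels, one for the x axis and one for the y axis.
    """
    mean_x = sum(x for x, _ in points.values()) / len(points)
    mean_y = sum(y for _, y in points.values()) / len(points)
    return {
        name: (
            "above" if x > mean_x else "below",
            "above" if y > mean_y else "below",
        )
        for name, (x, y) in points.items()
    }

# Hypothetical keywords: (impressions, clicks).
keywords = {
    "dashboard": (1200, 90),
    "kpi": (300, 10),
    "scatter plot": (900, 20),
    "reporting": (400, 60),
}

quadrants = classify_by_means(keywords)
for name, (impressions, clicks) in quadrants.items():
    print(f"{name}: impressions {impressions} mean, clicks {clicks} mean")
```

Keywords that land above both means are the strongest performers; those below both are candidates to rewrite or discard.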

In conclusion:

The scatter plot is a visualization that serves one main purpose, but it does it well: it reveals the direction and degree to which two quantitative values are correlated. That makes the scatter plot a strong visualization for exploring cause-and-effect relationships, and it can be used to show your stakeholders just how well your indicators are performing. So move past the scatter plots you created back in high school, and try using them to report the correlations that provide you and your business with the most valuable insights!

How do you use scatter plots to optimize your reporting or insight delivery needs?

Lenke Harmath

Product Marketing. Graduated with a degree in Marketing and Advertising from Budapest Business School, Hungary. Interested in product adoption, social media, UX design and interpersonal communication.

