How to Create 2D and 3D Scatter Plots with Python and Matplotlib

2021-02-26 11:20:43 | #programming #python #dataviz

Scatter plots are used to find the relationship between two variables, using Cartesian coordinates (coordinates along the x and y axes). Unlike bar graphs, which simply plot values for comparison, scatter plots are meant to reveal whether the y values follow a real or implied trend as the x variable changes.

So for example, to better understand the correlation between people's heights and weights, every data point should contain a numeric value for the person's height, which we'll plot along the x-axis, and a numeric value for the person's weight, which we'll plot along the y-axis.

Prerequisites

It helps to be familiar with Python fundamentals, like data types, loops, functions, conditionals, modules, packages, virtual environments, etc. If you need a crash course, navigate to the Python Developer section of our tutorials.

If you're not familiar with Matplotlib, we recommend that you complete the prerequisites above. They introduce you to important data science and data visualization concepts, various graph types, and scenarios where certain graphs are more appropriate than others. For example, a line graph connects all of the points in a dataset with a line, indicating a slope. A scatter plot is more concerned with expressing a trend, often with the help of a regression line. Rather than connecting all of the points, a regression line passes through the high-concentration clusters in the direction they lean.

How to Set Up a Project Skeleton

How to Create Python Project Files with Windows 10 PowerShell 2.0+

cd ~
New-Item -ItemType "directory" -Path ".\matplotlib-bar-project"
cd matplotlib-bar-project
virtualenv venv
.\venv\Scripts\activate

To verify that the virtual environment is active, make sure (venv) is in the PowerShell command prompt. For example, (venv) PS C:\Users\username\matplotlib-bar-project>

How to Create Python Project Files with Linux Ubuntu 14.04+ or macOS

cd ~
mkdir matplotlib-bar-project
cd matplotlib-bar-project
virtualenv -p python3 venv
source venv/bin/activate

To verify that the virtual environment is active, make sure (venv) is in the terminal command prompt.

This will create the following files and folders, and activate the virtual environment.

▾ matplotlib-bar-project/
  ▸ venv/

Installing Matplotlib with Pip

This tutorial requires you to install a specific version of Matplotlib with pip3 install matplotlib==3.3.3. To get the plot to display in a window, you can install PyQt5 with pip3 install PyQt5==5.15.2.

We pin these libraries to specific versions because their APIs may have changed between the time this article was written and the time you read it.
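To confirm that pip installed the pinned release, a quick check along these lines may help. It is only a sketch: EXPECTED_VERSION is a name introduced here for illustration, and the comparison is a plain string match against matplotlib.__version__.

```python
import matplotlib

# Version pinned by this tutorial; EXPECTED_VERSION is an illustrative name.
EXPECTED_VERSION = "3.3.3"

installed = matplotlib.__version__
if installed == EXPECTED_VERSION:
    print(f"Matplotlib {installed} matches the version used in this tutorial.")
else:
    print(f"Matplotlib {installed} is installed; this tutorial was written for {EXPECTED_VERSION}.")
```

If the versions differ, the examples will probably still run, but some defaults or keyword arguments may behave differently.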

Creating a Matplotlib Scatter Plot to Find the Correlation Between Height vs. Weight

Here's an example where we use Python and Matplotlib to generate a scatter plot to understand people's weight in relation to their height. Although we have some outliers, the trend is that taller people generally weigh more than shorter people. This example uses lbs and inches. To convert lbs to kg, divide by 2.205, and to convert inches to cm, multiply by 2.54.

import matplotlib.pyplot as plt

heights = [62, 62.5, 62.5, 62.5, 63, 63, 63.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64.5, 64.5, 64.5, 65, 65, 65, 65, 65, 65, 65.6, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 68, 68]
weights = [120, 120, 122, 123, 130, 140, 145, 140, 142, 143, 115, 120, 124, 135, 136, 135, 137, 130, 132, 135, 128, 139, 134, 140, 142, 130, 180, 145, 142, 143, 141, 149, 150, 145, 142, 145, 159, 155, 158, 166, 170, 165, 160, 163]

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(heights, weights)

ax.set_title('Height vs. Weight')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')

plt.savefig("plot.png")
plt.show()

Scatter plot showing the correlation between height and weight in lbs and inches
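To put a number on the trend visible in the plot, one option is the Pearson correlation coefficient. The sketch below uses NumPy (installed as a Matplotlib dependency) and np.corrcoef, which the tutorial itself does not use. It also converts the data to metric units (1 in = 2.54 cm, 1 kg ≈ 2.205 lbs) to show that the coefficient does not depend on the units. The names r_imperial, r_metric, heights_cm, and weights_kg are introduced here for illustration.

```python
import numpy as np

heights = [62, 62.5, 62.5, 62.5, 63, 63, 63.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64.5, 64.5, 64.5, 65, 65, 65, 65, 65, 65, 65.6, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 68, 68]
weights = [120, 120, 122, 123, 130, 140, 145, 140, 142, 143, 115, 120, 124, 135, 136, 135, 137, 130, 132, 135, 128, 139, 134, 140, 142, 130, 180, 145, 142, 143, 141, 149, 150, 145, 142, 145, 159, 155, 158, 166, 170, 165, 160, 163]

# Pearson correlation between height (in) and weight (lbs)
r_imperial = np.corrcoef(heights, weights)[0, 1]

# Convert to metric: inches -> cm (multiply), lbs -> kg (divide)
heights_cm = [h * 2.54 for h in heights]
weights_kg = [w / 2.205 for w in weights]
r_metric = np.corrcoef(heights_cm, weights_kg)[0, 1]

print(f"r (in, lbs): {r_imperial:.3f}")
print(f"r (cm, kg):  {r_metric:.3f}")
```

Both lines print the same value, because rescaling either variable by a positive constant leaves the correlation unchanged. A value close to 1 indicates a strong positive linear relationship.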

Explanation of Some Common plt.scatter() Parameters

Parameter Name | Description | Data Type | Default Value
--- | --- | --- | ---
x | The data positions along the x-axis | float or array-like, shape (n,) | Required
y | The data positions along the y-axis | float or array-like, shape (n,) | Required
s | Marker size in points ** 2 | float or array-like, shape (n,) | rcParams['lines.markersize'] ** 2
c | Color of the markers | array-like or list of colors or color | None
marker | The marker style. See Matplotlib v3.3.3 Markers documentation. | str or MarkerStyle | rcParams["scatter.marker"] (default: 'o')
cmap | A Colormap instance or colormap name. Only used if c is an array of floats. | str or Colormap | rcParams["image.cmap"] (default: 'viridis')
norm | If c is an array of floats, norm scales the color data, c, to the range 0 to 1. If None, defaults to colors.Normalize. | Normalize | None
alpha | Opacity of the marker between 0 (transparent) and 1 (opaque) | float | None
linewidths | The line widths of the marker edges | float or array-like | rcParams["lines.linewidth"] (default: 1.5)
edgecolors | The edge color of the marker. Possible values: 'face', 'none', None, a Matplotlib color, or a sequence of colors | {'face', 'none', None} or color or sequence of color | rcParams["scatter.edgecolors"] (default: 'face')
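As a rough illustration of how several of these parameters combine, the sketch below plots a seven-point sample taken from the tutorial's height, weight, and age lists. The specific values chosen for s, marker, alpha, and the output filename are arbitrary choices made for this example, and the colorbar is an optional extra.

```python
import matplotlib.pyplot as plt

# Small sample of (height, weight, age) values taken from the tutorial data
heights = [62, 63, 64, 65, 66, 67, 68]
weights = [120, 130, 115, 130, 142, 145, 170]
ages = [20, 32, 33, 39, 25, 20, 30]

fig, ax = plt.subplots(figsize=(6, 4))
scatter = ax.scatter(
    heights, weights,
    s=120,               # marker area in points ** 2
    c=ages,              # numeric values, mapped to colors through cmap
    cmap='viridis',      # takes effect because c is numeric
    marker='s',          # square markers instead of the default 'o'
    alpha=0.8,           # slightly transparent
    linewidths=1.5,      # width of the marker edges
    edgecolors='black',  # color of the marker edges
)

# Show how ages map to colors
fig.colorbar(scatter, ax=ax, label='Age')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')

plt.savefig("scatter-parameters.png")
```

Because norm is left as None, the ages are scaled linearly between the smallest and largest value before the colormap is applied.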

How to Customize a Scatter Plot with Python and Matplotlib

In the following example, we remove the top and right borders, add major gridlines, and add some transparency to the markers to make it easier to see when they overlap.

import matplotlib.pyplot as plt

heights = [62, 62.5, 62.5, 62.5, 63, 63, 63.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64.5, 64.5, 64.5, 65, 65, 65, 65, 65, 65, 65.6, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 68, 68]
weights = [120, 120, 122, 123, 130, 140, 145, 140, 142, 143, 115, 120, 124, 135, 136, 135, 137, 130, 132, 135, 128, 139, 134, 140, 142, 130, 180, 145, 142, 143, 141, 149, 150, 145, 142, 145, 159, 155, 158, 166, 170, 165, 160, 163]

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(heights, weights, alpha=0.75)

ax.set_title('Height vs. Weight')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')

# Remove top and right borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Adds major gridlines
ax.grid(color='grey', linestyle='-', linewidth=0.25, alpha=0.4)

plt.savefig("plot.png")
plt.show()

Scatter plot showing the correlation between height and weight in lbs and inches
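The prerequisites mention that a scatter plot often expresses a trend with a regression line. One way to overlay such a line is a least-squares fit with NumPy's np.polyfit, sketched below. This is an addition to the tutorial's own examples: the names slope, intercept, and x_line, the red line color, and the output filename are choices made here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

heights = [62, 62.5, 62.5, 62.5, 63, 63, 63.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64.5, 64.5, 64.5, 65, 65, 65, 65, 65, 65, 65.6, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 68, 68]
weights = [120, 120, 122, 123, 130, 140, 145, 140, 142, 143, 115, 120, 124, 135, 136, 135, 137, 130, 132, 135, 128, 139, 134, 140, 142, 130, 180, 145, 142, 143, 141, 149, 150, 145, 142, 145, 159, 155, 158, 166, 170, 165, 160, 163]

# Fit a first-degree polynomial (a straight line) by least squares
slope, intercept = np.polyfit(heights, weights, 1)

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(heights, weights, alpha=0.75)

# Draw the fitted line across the observed height range
x_line = np.array([min(heights), max(heights)])
ax.plot(x_line, slope * x_line + intercept, color='red', label='Least-squares fit')

ax.set_title('Height vs. Weight')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')
ax.legend()

plt.savefig("plot-regression.png")
```

The slope is the estimated change in weight (lbs) per additional inch of height. A single outlier, such as the 180 lbs point, can pull the line noticeably in a dataset this small.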

How to Plot 3 Variables on a Scatter Plot with Python and Matplotlib

We can plot a third variable by mapping it to the size of the markers. In the following example, we add age data and multiply each value by a constant to make the variations easier to see.

import matplotlib.pyplot as plt

# Data
heights = [62, 62.5, 62.5, 62.5, 63, 63, 63.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64.5, 64.5, 64.5, 65, 65, 65, 65, 65, 65, 65.6, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 68, 68]
weights = [120, 120, 122, 123, 130, 140, 145, 140, 142, 143, 115, 120, 124, 135, 136, 135, 137, 130, 132, 135, 128, 139, 134, 140, 142, 130, 180, 145, 142, 143, 141, 149, 150, 145, 142, 145, 159, 155, 158, 166, 170, 165, 160, 163]
ages = [20, 34, 24, 26, 32, 23, 27, 28, 40, 32, 33, 30, 31, 29, 28, 26, 25, 39, 37, 28, 38, 40, 25, 35, 25, 26, 28, 29, 30, 31, 25, 34, 38, 20, 21, 23, 29, 27, 27, 35, 30, 25, 28, 29]

fig, ax = plt.subplots(figsize=(6, 6))

# Titles
ax.set_title('Height vs. Weight')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')

# Remove top and right borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Adds major gridlines
ax.grid(color='grey', linestyle='-', linewidth=0.25, alpha=0.4)

ax.scatter(heights, weights,
           linewidths=1, alpha=0.75,
           edgecolor='k',
           s=[age * 7.5 for age in ages],
           c='palegreen')


plt.savefig("plot.png")
plt.show()

Scatter plot
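A fixed multiplier works when the values already span a convenient range. As an alternative, the sizes can be rescaled to a chosen minimum and maximum area. The helper below, scale_sizes, is hypothetical and not part of Matplotlib, and its default limits of 20 and 400 points ** 2 are arbitrary.

```python
def scale_sizes(values, min_size=20, max_size=400):
    """Linearly rescale values to marker areas between min_size and max_size."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # All values are equal, so use the midpoint size for every marker
        return [(min_size + max_size) / 2 for _ in values]
    span = largest - smallest
    return [min_size + (v - smallest) / span * (max_size - min_size) for v in values]


ages = [20, 34, 24, 26, 32, 23, 27, 28, 40, 32, 33, 30, 31, 29, 28, 26, 25, 39, 37, 28, 38, 40, 25, 35, 25, 26, 28, 29, 30, 31, 25, 34, 38, 20, 21, 23, 29, 27, 27, 35, 30, 25, 28, 29]
sizes = scale_sizes(ages)

print(min(sizes), max(sizes))
```

The resulting list can be passed to ax.scatter as s=sizes. Keep in mind that s is an area, so a marker with four times the value of s appears only twice as wide.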

How to Plot 4 Variables and Add Legends

We can also map a fourth variable to color. When mapping a variable to the colors of the markers, it's best to use a categorical variable with few categories, because markers become hard to tell apart when too many colors are rendered. In the following example, we limit the colors to two, one for each gender.

import matplotlib.pyplot as plt
import numpy as np

# Data
heights = [62, 62.5, 62.5, 62.5, 63, 63, 63.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64.5, 64.5, 64.5, 65, 65, 65, 65, 65, 65, 65.6, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 68, 68]
weights = [120, 120, 122, 123, 130, 140, 145, 140, 142, 143, 115, 120, 124, 135, 136, 135, 137, 130, 132, 135, 128, 139, 134, 140, 142, 130, 180, 145, 142, 143, 141, 149, 150, 145, 142, 145, 159, 155, 158, 166, 170, 165, 160, 163]
ages = [20, 34, 24, 26, 32, 23, 27, 28, 40, 32, 33, 30, 31, 29, 28, 26, 25, 39, 37, 28, 38, 40, 25, 35, 25, 26, 28, 29, 30, 31, 25, 34, 38, 20, 21, 23, 29, 27, 27, 35, 30, 25, 28, 29]
genders = [0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

fig, ax = plt.subplots(figsize=(10, 7))

# Titles
ax.set_title('Height vs. Weight')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')

# Remove top and right borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Adds major gridlines
ax.grid(color='grey', linestyle='-', linewidth=0.25, alpha=0.4)

# Plot the points, mapping age to marker size and gender to color
scatter = ax.scatter(heights, weights,
                     linewidths=1, alpha=0.75,
                     edgecolor='k',
                     s=[age * age for age in ages],
                     c=genders)

# Adds legend
kw = dict(prop="sizes",
          func=lambda s: np.sqrt(s),
          alpha=0.6)
legend1 = ax.legend(*scatter.legend_elements(**kw),
                    loc="upper left", title="Ages",
                    labelspacing=2)
ax.add_artist(legend1)

handles, labels = scatter.legend_elements(prop="colors", alpha=0.6)
ax.legend(handles, labels, loc="upper right", title="Genders")

plt.tight_layout()
plt.savefig("plot.png")
plt.show()

Scatter plot
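The color legend above shows the raw codes 0 and 1. If readable names are preferred, one common alternative is to call ax.scatter once per category and pass a label each time. The sketch below uses a seven-point sample from the tutorial data. The mapping 0 = Female and 1 = Male is an assumption taken from the labels list in the 3D example later in this tutorial, and the categories dictionary is a name introduced here.

```python
import matplotlib.pyplot as plt

# Small sample of (height, weight, gender) values taken from the tutorial data
heights = [62, 63, 64, 65, 66, 67, 68]
weights = [120, 130, 115, 130, 142, 145, 170]
genders = [0, 1, 1, 0, 1, 1, 1]

# Assumed coding: 0 = Female, 1 = Male
categories = {0: ('Female', 'lightcoral'), 1: ('Male', 'b')}

fig, ax = plt.subplots(figsize=(6, 4))
for code, (name, color) in categories.items():
    # Select only the points that belong to this category
    xs = [h for h, g in zip(heights, genders) if g == code]
    ys = [w for w, g in zip(weights, genders) if g == code]
    ax.scatter(xs, ys, c=color, label=name, alpha=0.75, edgecolors='k')

ax.set_xlabel('Height')
ax.set_ylabel('Weight')
legend = ax.legend(title='Genders')

plt.savefig("plot-categories.png")
```

This approach trades one scatter call for several, but each category gets its own legend entry without any post-processing of legend_elements().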

How to Plot Multiple Variables On a 3D Scatter Plot

Plotting a Third Variable Along the Z-Axis

Here's the same Height vs. Weight chart, but with a third (z) dimension along which we plot the age data.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Data
heights = [62, 62.5, 62.5, 62.5, 63, 63, 63.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64.5, 64.5, 64.5, 65, 65, 65, 65, 65, 65, 65.6, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 68, 68]
weights = [120, 120, 122, 123, 130, 140, 145, 140, 142, 143, 115, 120, 124, 135, 136, 135, 137, 130, 132, 135, 128, 139, 134, 140, 142, 130, 180, 145, 142, 143, 141, 149, 150, 145, 142, 145, 159, 155, 158, 166, 170, 165, 160, 163]
ages = [20, 34, 24, 26, 32, 23, 27, 28, 40, 32, 33, 30, 31, 29, 28, 26, 25, 39, 37, 28, 38, 40, 25, 35, 25, 26, 28, 29, 30, 31, 25, 34, 38, 20, 21, 23, 29, 27, 27, 35, 30, 25, 28, 29]
genders = [0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

fig = plt.figure(figsize=(6, 6))
ax = plt.axes(projection="3d")

# Titles
ax.set_title('Height vs. Weight vs. Age')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')
ax.set_zlabel('Age')

# Remove top and right borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Adds major gridlines
ax.grid(color='grey', linestyle='-', linewidth=0.25, alpha=0.4)

ax.scatter(heights, weights, ages,
           linewidths=1, alpha=0.75,
           edgecolor='k',
           s=200,
           c='palegreen')


plt.savefig("plot.png")
plt.show()

Scatter plot
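In a saved image, a 3D scatter plot is seen from a single camera position, and some points may hide others. If that happens, the viewing angle can be changed with ax.view_init. The sketch below uses a seven-point sample from the tutorial data, and the elevation and azimuth values are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the "3d" projection on older Matplotlib versions

# Small sample of (height, weight, age) values taken from the tutorial data
heights = [62, 63, 64, 65, 66, 67, 68]
weights = [120, 130, 115, 130, 142, 145, 170]
ages = [20, 32, 33, 39, 25, 20, 30]

fig = plt.figure(figsize=(6, 6))
ax = plt.axes(projection="3d")

ax.scatter(heights, weights, ages, s=200, c='palegreen', edgecolor='k')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')
ax.set_zlabel('Age')

# Camera position in degrees: elev is the angle above the x-y plane,
# azim is the rotation around the z-axis
ax.view_init(elev=20, azim=-60)

plt.savefig("plot-3d-view.png")
```

When the plot is shown in an interactive window with plt.show(), the same rotation can be done by dragging with the mouse.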

Plotting a Fourth Variable with Color Variations

In this example, we add a separate color for each gender and a legend.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import ListedColormap

# Data
heights = [62, 62.5, 62.5, 62.5, 63, 63, 63.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64.5, 64.5, 64.5, 65, 65, 65, 65, 65, 65, 65.6, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 68, 68]
weights = [120, 120, 122, 123, 130, 140, 145, 140, 142, 143, 115, 120, 124, 135, 136, 135, 137, 130, 132, 135, 128, 139, 134, 140, 142, 130, 180, 145, 142, 143, 141, 149, 150, 145, 142, 145, 159, 155, 158, 166, 170, 165, 160, 163]
ages = [20, 34, 24, 26, 32, 23, 27, 28, 40, 32, 33, 30, 31, 29, 28, 26, 25, 39, 37, 28, 38, 40, 25, 35, 25, 26, 28, 29, 30, 31, 25, 34, 38, 20, 21, 23, 29, 27, 27, 35, 30, 25, 28, 29]
genders = [0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
labels = ['Female', 'Male']
colors = ListedColormap(['lightcoral', 'b'])

fig = plt.figure(figsize=(6, 6))
ax = plt.axes(projection="3d")

# Titles
ax.set_title('Height vs. Weight vs. Age')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')
ax.set_zlabel('Age')

# Remove top and right borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Adds major gridlines
ax.grid(color='grey', linestyle='-', linewidth=0.25, alpha=0.4)

scatter = ax.scatter(heights, weights, ages, c=genders, cmap=colors)
ax.legend(handles=scatter.legend_elements()[0], labels=labels)

plt.savefig("plot.png")
plt.show()

Scatter plot

Conclusion

In this tutorial, we covered how to use Python and Matplotlib to generate 2D and 3D scatter plots, and how to plot additional variables by mapping them to marker size, marker color, and the z-axis.
