How to Compare Values with a Bar Chart, Python, and Matplotlib

2021-01-18 15:02:34 | #programming #python #dataviz

Bar graphs use rectangular bars to represent categorical data, which makes comparisons across categories easy to see. Bars can be drawn either vertically or horizontally. With Python and Matplotlib, generating a bar graph takes only a few lines of code.

If you're not familiar with Matplotlib, we recommend that you read the previous chapter in the series, which introduces you to Matplotlib, and teaches you how to generate line charts, and multiline charts where you can overlay multiple lines for comparison.

Prerequisites

It helps to be familiar with Python fundamentals, like data types, loops, functions, conditionals, modules, packages, virtual environments, etc. If you need a crash course, navigate to the Python Developer section of our tutorials.

How to Set Up a Project Skeleton

How to Create Python Project Files with Windows 10 PowerShell 2.0+

cd ~
New-Item -ItemType "directory" -Path ".\matplotlib-bar-project"
cd matplotlib-bar-project
virtualenv venv
.\venv\Scripts\activate

To verify that the virtual environment is active, make sure (venv) is in the PowerShell command prompt. For example, (venv) PS C:\Users\username\matplotlib-bar-project>

How to Create Python Project Files with Linux Ubuntu 14.04+ or macOS

cd ~
mkdir matplotlib-bar-project
cd matplotlib-bar-project
virtualenv -p python3 venv
source venv/bin/activate

To verify that the virtual environment is active, make sure (venv) is in the terminal command prompt.

This will create the following files and folders, and activate the virtual environment.

▾ matplotlib-bar-project/
  ▸ venv/
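
If the prompt prefix is hard to spot, the check can also be made from inside Python. The following is a minimal sketch; the function name in_virtual_environment is made up for this illustration. It relies on sys.prefix differing from sys.base_prefix inside an environment, with a fallback for older virtualenv releases that set sys.real_prefix instead.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and recent virtualenv releases point sys.base_prefix at the base install
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("virtual environment active:", in_virtual_environment())
```

Run it with the same python command you will use for the examples; if it reports False, activate the environment again before installing packages.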

Installing Matplotlib with Pip

This tutorial requires you to install a specific version of Matplotlib with pip3 install matplotlib==3.3.3. To get the plot to display in a window, you can install PyQt5 with pip3 install PyQt5==5.15.2.

We pin specific versions of these libraries because their APIs may change between the time this article was written and the time you read it.
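
To confirm that the pinned versions are the ones actually installed, a small check like the one below may help. It is only a sketch: the PINNED dictionary and the check_pins helper are names invented here, and it assumes Python 3.8 or newer, where importlib.metadata is part of the standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in this tutorial
PINNED = {"matplotlib": "3.3.3", "PyQt5": "5.15.2"}


def check_pins(pins):
    """Return {package: (pinned, installed)} for every package that does not match.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in check_pins(PINNED).items():
    print(f"{name}: pinned {wanted}, found {found}")
```

An empty result means both packages match the pins; any printed line names a package to reinstall with the pip3 commands above.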

pyplot.bar() Function Parameters Explained

In the simplified function signature below, bar() requires two parameters (x and height); the rest are optional. Styling options such as color, edgecolor, and linewidth are passed as keyword arguments.

pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', **kwargs)

Parameter Name | Description | Data Type | Default Value
x | The x coordinates of the bars | Scalar or sequence of scalars | (required)
height | The height(s) of the bars | Scalar or sequence of scalars | (required)
width | The width(s) of the bars (optional) | Scalar or sequence of scalars | 0.8
bottom | The y coordinate(s) of the bar bases (optional) | Scalar or sequence of scalars | None (bars start at 0)
align | Alignment of the bars to the x coordinates (optional; 'center' or 'edge') | String | 'center'
color | The color(s) of the bar faces (optional) | Color or list of colors | None
edgecolor | The color(s) of the bar edges (optional) | Color or list of colors | None
linewidth | Width of the bar edges (optional; if 0, don't draw edges) | Scalar or sequence of scalars | None
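
To see how bottom and align interact, here is a small sketch that stacks one set of bars on top of another. The labels and numbers are made up for illustration, and the Agg backend is selected only so that the sketch runs without a display; neither comes from the examples in this tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
from matplotlib import pyplot as plt

# Made-up sample data, for illustration only
labels = ['A', 'B', 'C']
first_half = [10, 20, 30]
second_half = [5, 15, 25]

# Lower bars: align='edge' puts each bar's left edge on its x position
lower = plt.bar(labels, first_half, width=0.5, align='edge',
                color='c', edgecolor='k', linewidth=1)

# Upper bars: bottom= starts each bar where the matching lower bar ends
upper = plt.bar(labels, second_half, width=0.5, bottom=first_half,
                align='edge', color='red', edgecolor='k', linewidth=1)

# Each call returns a container whose patches are the drawn rectangles
for rect in upper.patches:
    print(rect.get_x(), rect.get_y(), rect.get_height())
```

Remove the matplotlib.use("Agg") line and finish with plt.show() to view the result in a window, as the other examples in this tutorial do.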

How to Create a Bar Chart with Python and Matplotlib

Example: Plotting a Student's Grades

This example plots a student's grades across 5 different subjects. By analyzing the bar graph, we can conclude that Harry's highest grades were in his Computer Science class and his lowest were in Spanish.

Filename: bar_graph.py

from matplotlib import pyplot as plt

plt.title("Harry's Grades")

# Data
subject = ['English', 'Math', 'Physics', 'Computer Science', 'Spanish']
marks   = [75, 80, 65, 100, 54]

# Plot
plt.bar(subject, marks, width=0.50, edgecolor='k', linewidth=2)

plt.xlabel("subjects")
plt.ylabel("grades")

# Create the graph ticks with a list comprehension
plt.yticks(ticks=[x * 10 for x in range(11)])

# Render
plt.savefig("plot.png")
plt.show()

Bar chart comparing a student's grades across five subjects

Explanation of the Code

First, we import pyplot from Matplotlib, the only library this example needs.

Lines 6, 7: We define the data (subjects and marks) that represents Harry's marksheet.

Line 10: We pass this data to plt.bar() to plot it.

Lines 12, 13: We assign names to the axes using plt.xlabel() and plt.ylabel().

Line 16: Using plt.yticks(), we set where the tick marks appear along the y-axis. In this example, we use a list comprehension to generate the range of possible grades ([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]).
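
The tick list can be checked on its own, without plotting anything. The sketch below builds it with the same list comprehension and, as an alternative spelling that is not used in the example, with range() and a step of 10.

```python
# The list comprehension from the example: multiples of 10 from 0 to 100
ticks = [x * 10 for x in range(11)]

# An equivalent way to build the same list, using range() with a step
same_ticks = list(range(0, 101, 10))

print(ticks)                # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(ticks == same_ticks)  # True
```

Either list can be passed to plt.yticks(ticks=...).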

Limitations of Vertical Bar Graphs

The vertical bar graph from the previous section works well for a small number of categories. However, because space along the x-axis is limited, too many categories crowd the bars together and make the labels hard to read. When there are many categories to compare, a horizontal bar graph is better suited because each bar gets its own row, with its label written beside it.

The following example demonstrates the overcrowding that results from packing too many categories into a vertical bar graph:

Filename: crowded_bar_graph.py

import numpy
from matplotlib import pyplot as plt

plt.title("Harry's Grades")

# Data
subjects = ['eng', 'phy', 'chem', 'maths', 'comp',
            'eng1', 'phy1', 'chem1', 'maths1', 'comp1',
            'eng2', 'phy2', 'chem2', 'maths2', 'comp2',
            ]

marks = numpy.random.randint(20, 100, 15)

# Plotting
plt.bar(subjects, marks, width=0.80, edgecolor='k', linewidth=2)

plt.xlabel("subjects")
plt.ylabel("grades")

# Create the y-axis ticks (0, 10, ..., 100) with a list comprehension
plt.yticks(ticks=[x * 10 for x in range(11)])

plt.savefig("plot.png")
plt.show()

An overcrowded bar chart comparing a student's grades across fifteen subjects

Solving Overcrowding with a Horizontal Bar Graph

Filename: horizontal_bar_graph.py

import numpy
from matplotlib import pyplot as plt

plt.title("Harry's Grades")

# Data
subjects = ['eng', 'phy', 'chem', 'maths', 'comp',
            'eng1', 'phy1', 'chem1', 'maths1', 'comp1',
            'eng2', 'phy2', 'chem2', 'maths2', 'comp2',
            ]

# Generating random marks
marks = numpy.random.randint(20, 100, 15)

# Color marks below 40 red and the rest cyan
my_colors = ['red' if (x < 40) else 'c' for x in marks]

# Plotting
plt.barh(subjects, marks, color=my_colors)  # barh takes (y, width)

plt.xlabel("grades")
plt.ylabel("subjects")

plt.xticks(ticks=[x * 10 for x in range(11)])

plt.savefig("plot.png")
plt.show()

A horizontal bar chart comparing a student's grades across fifteen subjects

Explanation of the Code

To create a horizontal bar graph, we invoked plt.barh().

plt.barh(
     y,
     width,
     height=0.8,
     left=None,
     *,
     align='center',
     **kwargs  # styling options such as color are passed here
   )

Unlike plt.bar(), the first argument in plt.barh() is y, the position of each bar along the y-axis. The second argument, width, is the length of each bar, which is where the marks go. The height parameter, which controls the thickness of the bars, defaults to 0.8. We used a list comprehension to color the bars red for marks below 40 and cyan for the rest. This extra visual indicator makes the low marks much easier to pick out.
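
Because the marks in the example are random, the colors change on every run. The sketch below isolates the color rule with a fixed, made-up list of marks so the result is predictable; the helper name bar_colors and the PASS_MARK constant are invented for this illustration.

```python
PASS_MARK = 40  # same threshold as in the example above


def bar_colors(marks, pass_mark=PASS_MARK):
    """Return 'red' for each mark below pass_mark and 'c' (cyan) otherwise."""
    return ['red' if mark < pass_mark else 'c' for mark in marks]


# Made-up marks, for illustration only
sample_marks = [35, 40, 72, 20, 99]

colors = bar_colors(sample_marks)
print(colors)  # ['red', 'c', 'c', 'red', 'c']
```

Note that a mark of exactly 40 is not below the threshold, so it is drawn in cyan. The resulting list can be passed to plt.barh() through the color argument.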

How to Create a Multiple Bar Chart

A multiple bar graph draws a cluster of bars, one per data set, at each position on the axis so that the data sets can be compared side by side. The following example compares the average stock prices of Tesla and Ford from 2015 to 2019.

Filename: multiple_bar_graph.py

import matplotlib.pyplot as plt
import numpy as np

# Average stock price over 5 years
tesla_stock = [46, 41, 62, 63, 54]
ford_stock = [15, 12, 11, 10, 9]

year = [2015, 2016, 2017, 2018, 2019]

# A plain list cannot be offset by width, so we create a NumPy array of bar positions
indices = np.arange(len(year))

width = 0.20

# Plotting
plt.bar(indices, tesla_stock, width=width)

# Offsetting by width to shift the bars to the right
plt.bar(indices + width, ford_stock, width=width)


# Displaying year on top of indices
plt.xticks(ticks=indices, labels=year)


plt.xlabel("year")
plt.ylabel("average stock price")
plt.title("Tesla vs. Ford")

plt.savefig("plot.png")
plt.show()

A multiple bar chart comparing the average stock price of Tesla vs. Ford between 2015-2019

Similarly, if there are more data sets, you can add multiples of width to the x positions to shift the additional bars to the right, or subtract them to shift bars to the left.
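
For more than two data sets, the offsets can be computed rather than typed by hand. The sketch below is one possible approach, not the method used in the example above: it centers each cluster on its tick instead of shifting every bar to the right. The helper name group_offsets is made up for this illustration.

```python
def group_offsets(n_series, width):
    """Return one x offset per data set so the cluster is centered on its tick."""
    middle = (n_series - 1) / 2
    return [(i - middle) * width for i in range(n_series)]


width = 0.20
n_groups = 5  # one cluster per year, as in the example above

# Offsets for three data sets: one bar left of the tick, one on it, one right of it
offsets = group_offsets(3, width)

# x positions for every bar: one list per data set
positions = [[group + offset for group in range(n_groups)] for offset in offsets]

for series_positions in positions:
    print(series_positions)
```

Each inner list can then be passed as the first argument of a plt.bar() call, together with width=width, so that the data sets sit side by side without overlapping.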

Conclusion

In this tutorial, we covered how to utilize Matplotlib and Python to generate a bar chart, a horizontal bar chart, and a multiple bar chart, to make comparisons between various data sets.
