Useful Machine Learning Functions with Python, Numpy, and Scipy

2020-12-26 16:23:20 | #programming #python #ml

Tested On

  • Linux Ubuntu 20.04
  • Windows 10
  • macOS Catalina

Numpy (Numerical Python) is a Python library for storing and computing on multi-dimensional numeric data. The data is held in arrays and matrices. The library grew out of two earlier array packages, Numeric and numarray.

Numpy arrays are better suited to machine learning than Python lists. They are homogeneous, faster, and come with extensive functions for linear algebra and matrix operations.

Because Numpy arrays are homogeneous, they are stored in one contiguous block of memory, which makes their elements faster to retrieve. By homogeneous, we mean that all the values in a Numpy array have to be of one data type. This enables Numpy to perform calculations more efficiently.

This doesn't mean that Numpy works with only one data type. Different arrays can hold different data types, such as strings, integers, floats and booleans. Arbitrary Python objects such as dictionaries can be stored too, but only in arrays of the generic object type, which lose most of the speed advantage.

For example:

import numpy as np

height = np.array([165.3, 156.3, 178.4, 154.8])  # an array of floats
names = np.array(["Ada", "Lewis", "Reynolds"])  # an array of strings
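As a rough illustration of the one-data-type rule, the sketch below builds two arrays and inspects their dtype. The variable names floats and mixed are only for illustration.

```python
import numpy as np

floats = np.array([165.3, 156.3, 178.4, 154.8])
mixed = np.array([1, 2.5, 3])  # integers mixed with a float

print(floats.dtype)  # float64
print(mixed.dtype)   # float64: the integers were upcast to floats

# The values sit in one contiguous block of memory
print(floats.flags["C_CONTIGUOUS"])  # True
```

When the values passed in have different types, Numpy picks a single type that can represent all of them.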

Why is Numpy Good for Machine Learning?

Machine learning is deeply rooted in mathematics: statistics, probability and linear algebra. Numpy provides functions that transform whole arrays and perform these calculations much faster than plain Python loops.
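To make the difference concrete, here is a minimal sketch that computes a body mass index for several people, first with a Python list comprehension and then with Numpy array arithmetic. The height and weight values are made up for the example.

```python
import numpy as np

# Made-up sample data, for illustration only
height_cm = [165.3, 156.3, 178.4, 154.8]
weight_kg = [65.4, 58.7, 83.3, 52.2]

# Plain Python: walk both lists element by element
bmi_list = [w / (h / 100) ** 2 for h, w in zip(height_cm, weight_kg)]

# Numpy: the same formula applied to whole arrays at once
h = np.array(height_cm)
w = np.array(weight_kg)
bmi_array = w / (h / 100) ** 2

print(np.allclose(bmi_list, bmi_array))  # True: both give the same values
```

The array version has no explicit loop. Numpy runs the loop in compiled code, which is where the speed-up on large datasets comes from.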

Setting Up a Numpy Project

How to Create Python Project Files with Windows 10 PowerShell 2.0+

cd ~
New-Item -ItemType "directory" -Path ".\numpy-project"
cd numpy-project
virtualenv venv
.\venv\Scripts\activate

To verify that the virtual environment is active, make sure (venv) is in the PowerShell command prompt. For example, (venv) PS C:\Users\username\numpy-project>

How to Create Python Project Files with Linux Ubuntu 14.04+ or macOS

cd ~
mkdir numpy-project
cd numpy-project
virtualenv -p python3 venv
source venv/bin/activate

To verify that the virtual environment is active, make sure (venv) is in the terminal command prompt.

This will create the following files and folders, and activate the virtual environment.

▾ numpy-project/
  ▸ venv/

Installing Numpy with Pip

Numpy is a Python library, so Python has to be installed on the machine first. Numpy is installed using pip:

pip install numpy

The Anaconda bundle has Numpy pre-installed, so it doesn't need to be installed again. Either way, all we have to do to use it is:

import numpy

Or import using the shorthand np:

import numpy as np

The array object in Numpy is called ndarray, which is short for n-dimensional array. The n stands for the number of dimensions. A Numpy array can have multiple dimensions: 1D with just columns, 2D with rows and columns, 3D, etc.

An ndarray is created using the numpy.array() function.

import numpy as np

height = [145.3, 176.5, 185.3, 164.9, 150.3]
weight = [65.4, 88.7, 33.3, 98.2, 16.5]

# To convert this to a numpy array
num_height = np.array(height)
num_weight = np.array(weight)
# Or create the array directly from a list literal
age = np.array([12, 44, 55, 76, 25])

Multidimensional Arrays with Numpy

Arrays with more than one level of nesting are called multidimensional ndarrays in Numpy. The number of levels determines how many indices are needed to address a single element. A 2D array has two levels, a 3D array has three levels, and so on.

Multidimensional arrays represent matrices or n-order tensors.

num_arr = np.array([[[1, 2, 3], [4, 5, 6]],
                    [["Linda", "Reina", "Louis"], ["Tina", "Rihanna", "Kela"]]])

# Numpy converts the numbers to strings so the array keeps a single data type
print(num_arr[0:1])  # Slicing to print the first 2D block
print(num_arr.shape)  # To determine the structure of the array: (2, 2, 3)
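For a purely numeric case, the following sketch shows how the number of dimensions relates to indexing. The name matrix is just an illustrative choice.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)   # 2: two levels of nesting
print(matrix.shape)  # (2, 3): two rows, three columns
print(matrix[0])     # first row: [1 2 3]
print(matrix[:, 1])  # second column: [2 5]
print(matrix[1, 2])  # one index per dimension selects a single element: 6
```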

Performing Calculations on an Array

Mean, Standard Deviation and Correlation Coefficient

Numpy provides functions to get the mean, standard deviation, and correlation coefficient of an ndarray.

arr = np.array([[1.73, 2.33, 5.43, 8.55],
                [54.3, 53.4, 73.4, 22.6]])

# To get the mean of all the values
print(np.mean(arr))

# To get the standard deviation of all the values
print(np.std(arr))

# To get the correlation coefficient
# corrcoef compares the values of the two rows
print(np.corrcoef(arr[0], arr[1]))
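These statistics can also be computed per row or per column by passing the axis argument. The sketch below reuses the same sample values. The names row_means, col_means, row_stds and corr are illustrative.

```python
import numpy as np

arr = np.array([[1.73, 2.33, 5.43, 8.55],
                [54.3, 53.4, 73.4, 22.6]])

row_means = np.mean(arr, axis=1)  # one mean per row
col_means = np.mean(arr, axis=0)  # one mean per column
row_stds = np.std(arr, axis=1)    # one standard deviation per row

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between the first row and the second row
corr = np.corrcoef(arr[0], arr[1])

print(row_means)
print(col_means)
print(row_stds)
print(corr[0, 1])
```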

Populating an Array with Random Numbers

import numpy as np

# 7 random integers from 10 up to, but not including, 100
numbers = np.random.randint(10, 100, 7)
print(numbers)
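In machine learning experiments it often helps to get the same random numbers on every run. One way to do that, sketched below, is Numpy's Generator interface with a fixed seed. The seed value 42 is an arbitrary choice.

```python
import numpy as np

# A generator with a fixed seed produces the same sequence every run
rng = np.random.default_rng(seed=42)
numbers = rng.integers(10, 100, size=7)  # 7 integers in the range [10, 100)

# A second generator with the same seed repeats the sequence
repeat = np.random.default_rng(seed=42).integers(10, 100, size=7)

print(numbers)
print(np.array_equal(numbers, repeat))  # True
```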

Numpy with Pandas

Numpy works well with Pandas, making it easy to perform calculations on or manipulate data inside DataFrames. We can convert an entire DataFrame to a Numpy array using the to_numpy() method, or work on columns individually.

import numpy as np
import pandas as pd

dff = pd.DataFrame(
    [[300, 10000, "Rivers"],
     [1100, 300000, "Lagos"],
     [550, 140000, "Abuja"]],
    columns=["Votes", "Population", "State"])

arr = dff.to_numpy()  # converting to numpy
print('\nNumpy Array\n----------\n', arr)

print(np.mean(arr[:3, 1]))  # To calculate the mean value of the population
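Working on a single column keeps its numeric data type, whereas converting a whole DataFrame that mixes numbers and text produces a generic object array. A minimal sketch of the column-wise approach, with illustrative variable names:

```python
import numpy as np
import pandas as pd

dff = pd.DataFrame(
    [[300, 10000, "Rivers"],
     [1100, 300000, "Lagos"],
     [550, 140000, "Abuja"]],
    columns=["Votes", "Population", "State"])

whole = dff.to_numpy()                     # mixed columns give an object array
population = dff["Population"].to_numpy()  # a single numeric column
votes = dff["Votes"].to_numpy()

print(whole.dtype)       # object
print(population.dtype)  # an integer dtype
print(np.mean(population))  # 150000.0
print(votes / population)   # votes as a share of population, per state
```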


Scipy (Scientific Python) is a collection of tools that support mathematical and scientific operations. It is built on the Numpy library and extends much of the basic Numpy functionality.

Installing Scipy with Pip

Scipy can be installed using pip:

pip install scipy

Or with Anaconda:

conda install -c anaconda scipy

Numpy vs. Scipy

Both Numpy and Scipy are libraries for performing operations on numeric and scientific data. How do they differ and how are they the same? Which one is the best?

  • Scipy has more functions for high-level scientific operations than Numpy.
  • Numpy's core is written in C, which is what makes its array operations fast. Scipy adds compiled C, C++ and Fortran routines on top of Numpy.
  • Scipy works on Numpy arrays, so it handles the same homogeneous data rather than replacing Numpy's data model.
  • Some Numpy modules cover only the basics. Scipy has fuller versions of several of them, for example scipy.linalg alongside numpy.linalg.
  • Numpy and Scipy complement each other. We need to work with both of them for maximum results.
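As a small sketch of how the two libraries overlap and differ, the example below solves the same linear system with numpy.linalg and scipy.linalg, then uses an LU decomposition, which scipy.linalg offers and numpy.linalg does not. The matrix values are arbitrary.

```python
import numpy as np
from scipy import linalg

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
b = np.array([10.0, 12.0])

# Both libraries can solve A @ x = b
x_np = np.linalg.solve(A, b)
x_sp = linalg.solve(A, b)
print(np.allclose(x_np, x_sp))  # True

# LU decomposition is available in scipy.linalg only
P, L, U = linalg.lu(A)
print(np.allclose(P @ L @ U, A))  # True
```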

Working with Scipy

Scipy is a scientific library that's widely used in machine learning and data science. It comes with sub-packages that support functions for:

  • Integration
  • Multidimensional image processing
  • File IO
  • Interpolation
  • Clustering
  • Optimization, etc.
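To give a feel for two of these sub-packages, here is a minimal sketch using scipy.integrate and scipy.interpolate. The function being integrated and the sample points are made up for the example.

```python
import numpy as np
from scipy import integrate, interpolate

# Integration: area under x**2 between 0 and 3 (the exact answer is 9)
area, error_estimate = integrate.quad(lambda x: x ** 2, 0, 3)
print(area)

# Interpolation: estimate a value between known sample points
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 10.0, 20.0])
line = interpolate.interp1d(x, y)  # linear interpolation by default
midpoint = float(line(1.5))
print(midpoint)  # 15.0
```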

Scipy Constants

The scipy.constants package contains a vast number of built-in constants used in scientific calculations. These range from physical constants to units of time, angle and temperature.

from scipy import constants

print(constants.mph)  # For speed: one mile per hour in meters per second
print(constants.atmosphere)  # For pressure: one standard atmosphere in pascals
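Because each unit constant is expressed in SI units, unit conversion becomes a multiplication. A short sketch, with made-up input values:

```python
from scipy import constants

# Convert 60 miles per hour to meters per second
speed_mph = 60
speed_ms = speed_mph * constants.mph
print(speed_ms)

# Convert 2 atmospheres to pascals
pressure_pa = 2 * constants.atmosphere
print(pressure_pa)

print(constants.minute)  # seconds in one minute
print(constants.degree)  # radians in one degree
```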

Scipy ndimage Processing

The scipy ndimage sub-package deals with image processing, image filtering, image manipulation, and classification. Here's some simple code to save and display the face of a raccoon. The imageio package allows us to open and write image files.

import matplotlib.pyplot as plt
from scipy import misc  # in SciPy 1.10 and later, face() is also available from scipy.datasets
import imageio

f = misc.face()  # sample raccoon image as a Numpy array
imageio.imsave('face.png', f)  # uses the Image module (PIL)

plt.imshow(f)  # display the image
plt.show()


Scipy Optimizers

Many machine learning algorithms are trained by minimizing a cost function. The Scipy optimize package provides routines for this kind of problem. It can perform unconstrained and constrained minimization, univariate and multivariate minimization, least-squares minimization and global optimization.

For example, we can get the root of an equation using optimizers.

import numpy as np
from scipy.optimize import root

def root_func(a):
    return a*2 + 2 * np.cos(a)

result = root(root_func, 0.5)
print(result.x)  # the value of a where root_func(a) is zero
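Minimization works in a similar way. The sketch below uses scipy.optimize.minimize on a simple made-up cost function whose minimum is known to be at x = 3. The function name cost and the starting point are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def cost(params):
    # A simple bowl-shaped function with its minimum value 1 at x = 3
    x = params[0]
    return (x - 3) ** 2 + 1

# Start the search from x = 0
result = minimize(cost, x0=np.array([0.0]))

print(result.success)  # whether the optimizer reports convergence
print(result.x)        # location of the minimum, close to 3
print(result.fun)      # value of the cost at the minimum, close to 1
```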


Numpy and Scipy are both important libraries in Python and machine learning. They belong to a similar family, but their functionality differs. Scipy does not replace Numpy. There is much more functionality in Numpy and Scipy that will help you in your machine learning journey. As we go deeper, we will uncover them.
