Useful Machine Learning Functions with Python, Numpy, and Scipy
2020-12-26 16:23:20
- Linux Ubuntu 20.04
- Windows 10
- macOS Catalina
Numpy (Numerical Python) is a library for the manipulation and calculation of multi-dimensional numeric data. The data is represented as arrays and matrices. Numarray, a name sometimes associated with it, was an earlier array package that Numpy superseded.
Numpy arrays are more appropriate for machine learning than Python lists. They are homogeneous, faster, and provide extensive functions for working with linear algebra and matrices.
Because Numpy arrays are homogeneous, they can be stored in one contiguous block of memory, which makes retrieving their values faster. By homogeneous, we mean that all the values in a Numpy array have to be of one data type. This enables Numpy to perform calculations more efficiently.
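This doesn't mean that Numpy works with only one data type. It supports multiple data types, such as strings, integers, floats and booleans. Arbitrary Python objects such as dictionaries can only be stored with the generic object data type, which gives up most of the speed benefits.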
height = [165.3, 156.3, 178.4, 154.8]
names = ["Ada", "Lewis", "Reynolds"]
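As a rough illustration of what homogeneous means in practice, the sketch below converts a few lists to arrays and inspects the resulting data type. The variable names are made up for this example.

```python
import numpy as np

# A list of floats becomes an array of 64-bit floats.
heights = np.array([165.3, 156.3, 178.4, 154.8])
print(heights.dtype)  # float64

# Mixing integers and floats upcasts every element to float64.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Mixing numbers and strings converts every element to a string.
labels = np.array([1, "Ada"])
print(labels.dtype.kind)  # U (a Unicode string type)
```

In each case Numpy picks a single data type that can hold every value, rather than keeping the original mix of types.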
Why is Numpy Good for Machine Learning?
Machine learning is deeply rooted in mathematics, statistics, probability and algebra. Numpy has functions to transform arrays and perform these calculations faster than equivalent pure-Python loops.
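A minimal sketch of the difference, assuming a small list of heights in centimetres (the names here are illustrative only): the list version needs an explicit loop, while the Numpy version applies one expression to the whole array.

```python
import numpy as np

heights_cm = [165.3, 156.3, 178.4, 154.8]

# Pure-Python approach: visit the list element by element.
heights_m_list = [h / 100 for h in heights_cm]

# Numpy approach: one vectorized expression over the whole array.
heights_m = np.array(heights_cm) / 100

print(heights_m_list)
print(heights_m)
```

Both give the same values. On large arrays the vectorized form is usually much faster because the loop runs in compiled code.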
Setting Up a Numpy Project
How to Create Python Project Files with Windows 10 PowerShell 2.0+
cd ~
New-Item -ItemType "directory" -Path ".\numpy-project"
cd numpy-project
virtualenv venv
.\venv\Scripts\activate
To verify that the virtual environment is active, make sure (venv) is in the PowerShell command prompt. For example, (venv) PS C:\Users\username\numpy-project>
How to Create Python Project Files with Linux Ubuntu 14.04+ or macOS
cd ~
mkdir numpy-project
cd numpy-project
virtualenv -p python3 venv
source venv/bin/activate
To verify that the virtual environment is active, make sure (venv) is in the terminal command prompt.
This will create the following files and folders, and activate the virtual environment.
▾ numpy-project/
  ▸ venv/
Installing Numpy with Pip
Numpy is a Python library, so Python has to be installed on the machine before Numpy can be used. Numpy is installed using pip:
pip install numpy
The Anaconda bundle has Numpy pre-installed, so it doesn't need to be installed again. All we have to do is import it:
import numpy
Or import it using the shorthand np:
import numpy as np
The array object in Numpy is called ndarray, which is short for n-dimensional array. The n stands for the number of dimensions. A Numpy array can have multiple dimensions: 1D with a single sequence of values, 2D with rows and columns, 3D, and so on.
An ndarray is created using the numpy.array() function.
import numpy as np

height = [145.3, 176.5, 185.3, 164.9, 150.3]
weight = [65.4, 88.7, 33.3, 98.2, 16.5]

# To convert this to a numpy array
num_height = np.array(height)
num_weight = np.array(weight)

# OR
age = np.array([12, 44, 55, 76, 25])
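To check what kind of array was created, an ndarray exposes a few attributes. The sketch below inspects a 1D and a 2D array; the table array is an invented example.

```python
import numpy as np

age = np.array([12, 44, 55, 76, 25])
print(age.ndim)   # 1
print(age.shape)  # (5,)
print(age.dtype)  # an integer type; the exact width depends on the platform

# A list of lists becomes a 2D array with rows and columns.
table = np.array([[145.3, 65.4], [176.5, 88.7], [185.3, 33.3]])
print(table.ndim)   # 2
print(table.shape)  # (3, 2)
```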
Multidimensional Arrays with Numpy
Arrays with more than one level are called multidimensional ndarrays in Numpy. The number of nesting levels determines the number of dimensions of the array. A 2D array has two levels, a 3D array has three levels, and so on.
Multidimensional arrays represent matrices or n-order tensors.
import numpy as np

# Mixing numbers and strings converts every element to a string
num_arr = np.array([[[1, 2, 3], [4, 5, 6]],
                    [["Linda", "Reina", "Louis"], ["Tina", "Rihannna", "Kela"]]])
print(num_arr[0:1])   # Slicing to print the first block (the numbers)
print(num_arr.shape)  # To determine the structure of the array: (2, 2, 3)
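Indexing is easiest to follow on a small 2D array. This sketch uses an invented grid array to pick out a row, a column and a single element.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

first_row = grid[0]         # the whole first row
second_column = grid[:, 1]  # every row, second column
element = grid[1, 2]        # second row, third column

print(first_row)      # [1 2 3]
print(second_column)  # [2 5]
print(element)        # 6
```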
Performing Calculations on an Array
Mean, Standard Deviation and Correlation Coefficient
Numpy provides functions to get the mean, standard deviation, and correlation coefficient of an ndarray.
import numpy as np

arr = np.array([[1.73, 2.33, 5.43, 8.55], [54.3, 53.4, 73.4, 22.6]])

# To get the mean of all the values
print(np.mean(arr))

# To get the standard deviation of all the values
print(np.std(arr))

# To get the correlation coefficient
# corrcoef compares the values of the two rows and returns a 2x2 correlation matrix
print(np.corrcoef(arr[0], arr[1]))
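By default these functions reduce over every value in the array. The axis argument restricts the calculation to columns or rows, as this small sketch with made-up numbers shows.

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

overall_mean = np.mean(scores)          # all six values
column_means = np.mean(scores, axis=0)  # one mean per column
row_means = np.mean(scores, axis=1)     # one mean per row

print(overall_mean)  # 3.5
print(column_means)  # [2.5 3.5 4.5]
print(row_means)     # [2. 5.]

# The second row is the first row plus 3, so the two rows are perfectly correlated.
correlation = np.corrcoef(scores[0], scores[1])[0, 1]
print(correlation)   # very close to 1.0
```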
Populating an Array with Random Numbers
import numpy as np

# Draw 7 random integers from 10 (inclusive) to 100 (exclusive)
numbers = np.random.randint(10, 100, 7)
print(numbers)
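For experiments that need to be repeatable, a seeded generator is a common alternative. This is a sketch using np.random.default_rng; the seed value 42 is an arbitrary choice.

```python
import numpy as np

# A generator created with a fixed seed produces the same numbers on every run.
rng = np.random.default_rng(seed=42)
numbers = rng.integers(10, 100, size=7)  # 10 inclusive, 100 exclusive
print(numbers)

# A second generator with the same seed reproduces the same sequence.
rng_again = np.random.default_rng(seed=42)
numbers_again = rng_again.integers(10, 100, size=7)
print(np.array_equal(numbers, numbers_again))  # True
```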
Numpy with Pandas
Numpy works well with Pandas, making it easy to perform calculations or manipulate data inside DataFrames. We can convert an entire DataFrame to a Numpy array using the to_numpy() method, or work on columns individually.
import numpy as np
import pandas as pd

dff = pd.DataFrame(
    [[300, 10000, "Rivers"], [1100, 300000, "Lagos"], [550, 140000, "Abuja"]],
    columns=["Votes", "Population", "State"])
print(dff)

arr = dff.to_numpy()  # converting to numpy (mixed columns give an object array)
print('\nNumpy Array\n----------\n', arr)
print(np.mean(arr[:3, 1]))  # To calculate the mean of the Population column
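Working on a single column keeps its numeric data type, which a whole-DataFrame conversion with mixed columns does not. A minimal sketch, reusing the same example data:

```python
import pandas as pd

dff = pd.DataFrame(
    [[300, 10000, "Rivers"], [1100, 300000, "Lagos"], [550, 140000, "Abuja"]],
    columns=["Votes", "Population", "State"])

# Converting one numeric column gives an integer array, not an object array.
population = dff["Population"].to_numpy()
print(population.dtype)

population_mean = population.mean()
print(population_mean)  # 150000.0
```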
Scipy (Scientific Python) is a collection of tools that support mathematical and scientific operations. It is built on the Numpy library and works directly with Numpy arrays.
Installing Scipy with Pip
Scipy can be installed using pip:
pip install scipy
Or, with Anaconda, using conda:
conda install -c anaconda scipy
Numpy vs. Scipy
Both Numpy and Scipy are libraries for performing operations on numeric and scientific data. How are they alike, how do they differ, and which one is better?
- Scipy has more functions for high level scientific operations than Numpy
- Numpy's core is written in C, which makes its basic array operations fast. Scipy builds on top of those operations.
- Scipy operates on Numpy arrays, so it works with the same homogeneous data, and it adds further structures such as sparse matrices.
- Some Numpy modules cover only the basics. For example, scipy.linalg offers a fuller set of linear algebra routines than numpy.linalg.
- Numpy and Scipy complement each other, and they are typically used together.
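As one illustration of how the two libraries overlap and differ, the sketch below computes a determinant with both, then runs an LU decomposition, which scipy.linalg provides and numpy.linalg, as far as we know, does not. The matrix is an arbitrary example.

```python
import numpy as np
from scipy import linalg

A = np.array([[4.0, 3.0], [6.0, 3.0]])

# Both libraries can compute the determinant (4*3 - 3*6 = -6).
det_numpy = np.linalg.det(A)
det_scipy = linalg.det(A)
print(det_numpy, det_scipy)

# LU decomposition from scipy.linalg: A = P @ L @ U
P, L, U = linalg.lu(A)
print(np.allclose(P @ L @ U, A))  # True
```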
Working with Scipy
Scipy is a scientific library that's widely used in machine learning and data science. It comes with sub-packages that support functions for:
- Multidimensional image processing
- File IO
- Optimization etc.
The scipy.constants package contains a large number of built-in constants used in scientific calculations. These constants cover mathematical and physical values as well as units of time, angle, length, speed and pressure.
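from scipy import constants

print(constants.pi)          # the mathematical constant pi
print(constants.degree)      # one degree in radians
print(constants.minute)      # one minute in seconds
print(constants.inch)        # one inch in meters
print(constants.mph)         # For speed: one mile per hour in meters per second
print(constants.atmosphere)  # For pressure: one standard atmosphere in pascals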
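Because the unit constants are expressed in SI units, multiplying by them converts a value to SI. The sketch below shows a few such conversions, along with the convert_temperature helper from the same package. The input values are arbitrary.

```python
from scipy import constants

# 60 miles per hour in meters per second
speed_ms = 60 * constants.mph
print(speed_ms)

# 12 inches in meters
length_m = 12 * constants.inch
print(length_m)

# 90 degrees in radians
angle_rad = 90 * constants.degree
print(angle_rad)

# 100 degrees Celsius in Fahrenheit (about 212)
temp_f = constants.convert_temperature(100, "Celsius", "Fahrenheit")
print(temp_f)
```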
Scipy ndimage Processing
The scipy ndimage sub-package deals with image processing, image filtering, image manipulation, and classification. Here's a simple example that displays the face of a raccoon. The imageio package allows us to read and write image files.
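import matplotlib.pyplot as plt
import imageio

try:
    # Newer SciPy releases ship the sample image in scipy.datasets (needs the pooch package)
    from scipy import datasets
    f = datasets.face()
except ImportError:
    # Older SciPy releases expose it through scipy.misc
    from scipy import misc
    f = misc.face()

imageio.imsave('face.png', f)  # write the image array to a PNG file
plt.imshow(f)
plt.show()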
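The example above only loads and displays an image. As a small sketch of ndimage itself, the code below applies a Gaussian blur to a tiny synthetic image. The 7x7 array with a single bright pixel is made up for the example so that it runs without downloading anything.

```python
import numpy as np
from scipy import ndimage

# A 7x7 black image with one bright pixel in the center.
image = np.zeros((7, 7))
image[3, 3] = 1.0

# Blurring spreads the bright pixel over its neighbors.
blurred = ndimage.gaussian_filter(image, sigma=1.0)

print(blurred.shape)  # (7, 7)
print(blurred.max())  # smaller than 1.0, because the brightness was spread out
```

The same call works on a real image array, such as the raccoon face, although a color image would typically be filtered one channel at a time.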
Many machine learning algorithms are trained by minimizing a cost function, so optimization routines are central to their performance. The Scipy optimize package provides these routines. It can perform unconstrained and constrained minimization, univariate and multivariate minimization, least-squares minimization and global optimization.
For example, we can find the root of an equation with the root function from scipy.optimize.
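import numpy as np
from scipy.optimize import root

def root_func(a):
    return a*2 + 2 * np.cos(a)

# Search for a value of a where root_func(a) is 0, starting from the guess 0.5
result = root(root_func, 0.5)
print(result)    # full solver report
print(result.x)  # the root that was found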
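Minimization follows the same pattern. The sketch below uses scipy.optimize.minimize on a simple invented cost function whose minimum is known in advance, so the result is easy to check.

```python
import numpy as np
from scipy.optimize import minimize

def cost(x):
    # A simple bowl-shaped function with its minimum value 1.0 at x = 3
    return (x[0] - 3.0) ** 2 + 1.0

# Start the search from the guess x = 0
result = minimize(cost, x0=np.array([0.0]))

print(result.x)    # close to [3.]
print(result.fun)  # close to 1.0
```

Real cost functions in machine learning have many more parameters, but they are passed to minimize in the same way: as a function of an array of values.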
Numpy and Scipy are both important libraries in Python and machine learning. They belong to a similar family, but their functionality differs. Scipy does not replace Numpy. There is much more functionality in Numpy and Scipy that will help you in your machine learning journey. As we go deeper, we will uncover them.
If you're interested in learning about data visualization, take our Real World Data Science with Python course. This course teaches you how to programmatically generate graphs and charts from existing datasets. You'll also learn Python fundamentals, and how to utilize frameworks like Matplotlib, Pandas, Numpy, and Seaborn.