Introduction to Machine Learning with Python
2020-12-18 19:08:45
- Linux Ubuntu 20.04
- Windows 10
- macOS Catalina
Machine Learning is a branch of Artificial Intelligence (AI) that employs computer algorithms to find patterns in massive datasets. These algorithms automatically improve their accuracy over time, without being explicitly programmed to do so.
Rather than programming all possible scenarios and outcomes, machine learning algorithms are given large amounts of sample data and instructed to reach a target output or prediction. The more consistently an algorithm produces the specified output, the higher its accuracy.
Unlike programs that are explicitly instructed to carry out a specific task, machine learning programs can learn, predict, and even be creative. Machine learning can be applied to make product recommendations, predict user behavior, automate processes, detect fraud, price dynamically, and more.
What Should I Learn Before Creating a Machine Learning Algorithm?
From a mathematics standpoint, a solid understanding of algebra, statistics, and probability will get you very far.
From a Python programming perspective, you should be familiar with data structures (lists, tuples, sets, etc.), how to read and write data, and how to work with files and datasets. Data science is the foundation of machine learning, and Pandas, Matplotlib, and Seaborn are essential Python libraries for statistical analysis and data visualization.
You should be comfortable with Pandas DataFrames—creating, manipulating, and slicing data out of them. Then you should learn about NumPy data arrays, which are covered in this tutorial.
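As a quick warm-up, here is a minimal sketch of the NumPy array operations that come up constantly in machine learning code (the small array below is invented for illustration):

```python
import numpy as np

# A tiny 2-D array: rows as samples, columns as features
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(data.shape)         # (2, 3): two samples, three features
print(data[:, 0])         # first column (one feature): [1. 4.]
print(data.mean(axis=0))  # per-feature means: [2.5 3.5 4.5]
```

Slicing columns out of an array and aggregating along an axis are the same moves you make when selecting features and normalizing a dataset.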
What Are Some Practical Use Cases for Machine Learning?
Using linear regression gives you the power to, for example, predict a student's grades by reading sample data such as:
- Time spent watching TV
- Hours studied
- Number of siblings
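A minimal sketch of that idea, using plain NumPy least squares rather than a full ML library (every number below is invented for illustration):

```python
import numpy as np

# Hypothetical training data (invented): each row is one student,
# columns are [hours of TV per week, hours studied, number of siblings]
X = np.array([
    [10.0, 2.0, 1.0],
    [4.0, 8.0, 0.0],
    [6.0, 5.0, 2.0],
    [12.0, 1.0, 3.0],
    [2.0, 10.0, 1.0],
])
y = np.array([55.0, 88.0, 72.0, 48.0, 95.0])  # final grades (made up)

# Append an intercept column and solve the least-squares problem,
# which is exactly what fitting a linear regression model does
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict a grade for a new, unseen student
new_student = np.array([5.0, 6.0, 1.0, 1.0])  # TV, study, siblings, intercept
predicted = new_student @ coef
```

In practice you would usually reach for scikit-learn's LinearRegression, which wraps this same least-squares computation behind a friendlier fit/predict interface.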
You could train a linear regression model to make this prediction down to an exact decimal value. You could also use K-means clustering to group cars into clusters, such as safe and not safe, based on vehicle data:
- Safety rating
- Number of doors
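Here is a from-scratch sketch of K-means on a toy version of that car dataset (the numbers are invented, and in practice you would use scikit-learn's KMeans rather than hand-rolling the loop):

```python
import numpy as np

# Hypothetical car data (invented): [safety rating 1-10, number of doors]
cars = np.array([
    [9.0, 4], [8.5, 4], [9.2, 5],   # plausibly the "safe" group
    [2.0, 2], [3.1, 2], [2.5, 3],   # plausibly the "not safe" group
], dtype=float)

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points as the initial centroids
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        for i in range(k):
            if (labels == i).any():
                centroids[i] = points[labels == i].mean(axis=0)
    return labels, centroids

labels, centroids = kmeans(cars, k=2)
```

Note that K-means only discovers groups; the "safe"/"not safe" names are something a human attaches to the clusters afterward.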
The "Big 3" Machine Learning Tasks
Regression Analysis with Machine Learning
Regression analysis comprises machine learning methods that enable us to predict a continuous outcome variable (y) based on one or more predictor variables (x). These continuous/real values could help us understand and predict, for example, a person's height at a projected age, or an employee's salary based on education, skill level, number of siblings, household income, etc.
Regression analysis can be performed using linear regression, regression trees, nearest neighbors, or deep learning, depending on the complexity of the dataset and the relationships between its values. Linear regression, for example, fits a straight line (or, with multiple predictors, a hyperplane) to the dataset; a straight line is the simplest way to model the relationship between two variables.
For non-linear relationships, a regression tree is better suited.
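To make that concrete, here is a from-scratch sketch of the simplest possible regression tree, a depth-1 "stump" that picks one split point and predicts the mean on each side (the data is invented to show a sharp non-linear jump a straight line could not fit):

```python
import numpy as np

def fit_stump(x, y):
    # Try every candidate split and keep the one with the lowest
    # sum of squared errors around each side's mean
    best = None
    for threshold in np.unique(x)[1:]:
        left, right = y[x < threshold], y[x >= threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left_prediction, right_prediction)

# Invented non-linear relationship: y jumps sharply between x = 4 and x = 6
x = np.array([1.0, 2, 3, 4, 6, 7, 8, 9])
y = np.array([1.0, 1.2, 0.9, 1.1, 9.8, 10.1, 10.0, 9.9])

threshold, left_pred, right_pred = fit_stump(x, y)
```

A real regression tree (such as scikit-learn's DecisionTreeRegressor) applies this same split search recursively to each side, across all features.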
Deep Learning through Regression Analysis
And for training a program to reach a target outcome based on training data, deep learning is best. For example, to train a machine learning program to learn how to identify red stop signs in a series of images, you would provide the program with thousands of images where stop signs have already been identified. Then you would feed the program a new set of "unseen" images to see how accurate it was.
Classification with Machine Learning
Classification is a method in which a machine learning program is trained to accurately predict the class or category of an object (y) based on input variables (x). For example, identifying spam emails based on a combination of subject line, sender email address, body content, IP address, etc.
Just like regression analysis, classification is a supervised learning approach, in which we use a "ground truth" or prior knowledge of what the output would be, like we demonstrated with the previously identified, red stop sign images.
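The spam example can be sketched as a tiny word-count Naive Bayes classifier, built from scratch on invented toy messages (a real filter would train on thousands of labeled messages and use more features than words alone):

```python
from collections import Counter
import math

# Toy labeled training data (invented): the "ground truth"
train = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon", "ham"),
    ("project status update", "ham"),
    ("lunch with the team", "ham"),
]

# Count messages per class and word occurrences per class
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: how common the class is overall
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            # Laplace smoothing so unseen words don't zero out a class
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("free money"))  # prints "spam" on this toy data
```

The "naive" part is the assumption that words occur independently of each other, which is wrong for real language but works surprisingly well in practice.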
Learner Types and Algorithms
Classification involves two types of learners: lazy learners and eager learners.
Classification also encompasses a large number of algorithms, including Decision Tree, Naive Bayes, Artificial Neural Networks, and K-Nearest Neighbor, as well as evaluation methods such as cross-validation, holdout, precision and recall, and the Receiver Operating Characteristic (ROC) curve.
| Lazy Learner | Eager Learner |
| --- | --- |
| Stores the data set without learning from it | Starts classifying (learning) as soon as it receives the data set |
| Waits for test data to begin learning | Does not wait for test data to begin learning |
| Spends less time learning and more time classifying data | Spends more time learning and less time classifying data |
| Higher prediction accuracy | Suboptimal prediction accuracy from using a single model to minimize average error over the entire dataset |
| K-Nearest Neighbor, Case-based Reasoning, Locally Weighted Regression | Naive Bayes, Decision Tree, Artificial Neural Networks |
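To illustrate the lazy-learner column, here is a minimal from-scratch K-Nearest Neighbor classifier: "training" just keeps the data around, and all of the distance work happens at prediction time (the points and labels below are invented):

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    # Lazy learning: no model is fit; we compute distances at query time.
    # Sort training points by distance to the query point.
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    # Majority vote among the k nearest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points in two well-separated classes
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2)))  # prints "a": query sits in the first cluster
```

This is why lazy learners spend little time "learning" but more time classifying: every prediction re-scans the stored data.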
Clustering and Unsupervised Learning
Clustering, density estimation, and representation learning are examples of unsupervised learning approaches, where we aim to identify the structure of our data without explicit or previously established labels. Because there are no predefined labels in the training data, most unsupervised models don't come with a specific method for comparing model performance. However, unsupervised learning is an excellent approach for automatically identifying structure in data where it would be impractical or impossible for a human to do so.
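One common workaround for the missing labels is to score a clustering with a label-free quantity such as inertia, the within-cluster sum of squared distances; lower inertia means tighter clusters. A small sketch with invented 1-D data:

```python
import numpy as np

def inertia(points, labels, centroids):
    # Total squared distance from each point to its assigned centroid
    return sum(
        np.sum((points[labels == i] - c) ** 2)
        for i, c in enumerate(centroids)
    )

# Hypothetical 1-D data with an obvious two-group structure
points = np.array([[1.0], [1.5], [2.0], [10.0], [10.5], [11.0]])

# A sensible clustering vs. a scrambled one (each centroid is its group's mean)
good = inertia(points, np.array([0, 0, 0, 1, 1, 1]), np.array([[1.5], [10.5]]))
bad = inertia(points, np.array([0, 1, 0, 1, 0, 1]), np.array([[4.5], [7.5]]))

print(good < bad)  # prints True: the sensible clustering has far lower inertia
```

Inertia always shrinks as you add clusters, so it is used for comparing clusterings at a similar number of clusters rather than as an absolute score.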
We hope you found this introduction to the fundamental components and applications of machine learning useful. Future articles will cover each of these topics in more detail. If you'd like to get started with some popular machine learning frameworks, check out our Machine Learning with Python, Numpy, and Scipy and Machine Learning with Python and Scikit-Learn tutorials.