Types of Machine Learning Algorithms

Near Learn
3 min read · Jan 12, 2022

Now that you understand the basics of machine learning, here is a closer look at the various algorithms in use.

supervised learning algorithms

linear regression

logistic regression

decision tree

random forest

Naive Bayes Classifier

k-nearest neighbors (KNN)

support vector machines

unsupervised learning algorithms

K-Means Clustering

reinforcement learning algorithms

Q-learning

linear regression

Linear regression is a method of predicting the dependent variable (y) based on the values of the independent variable (x). It can be used in cases where we want to predict continuous quantities.

The goal in linear regression is to use our independent variable x to predict the dependent variable y. In linear regression the relationship between x and y is assumed to be linear, a straight line, not quadratic or higher order.

The formula for linear regression is $y = B_0 + B_1x + e$, which has the same form as the equation of a line, $y = mx + b$. $B_0$ is the y-intercept, $B_1$ is the slope, x is the independent variable, and e is the error term, the distance between each data point and the line of best fit.

Essentially, linear regression is the process of determining the line of best fit for a set of data points plotted on a graph. Its applications are broad, from forecasting sales to estimating house prices.
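As a minimal sketch of the idea, the slope and intercept of the line of best fit can be computed in closed form with ordinary least squares. The data points below are made up for illustration:

```python
# Toy data: y is roughly 2x + 1 with a little noise (made-up values)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates for y = B0 + B1*x
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

print(b0, b1)  # intercept near 1, slope near 2
```

In practice a library such as scikit-learn would do this fit for you, but the arithmetic above is all that is happening under the hood for one feature.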

logistic regression

Next is logistic regression, which is similar to linear regression.

Logistic regression is a method used to predict a dependent variable (y), given a set of independent variables (x), such that the dependent variable is categorical.

Essentially speaking, logistic regression can be used where the output falls into exactly two categories, encoded as 0 or 1 (depending on your goal or desired outcome, 0 and 1 can mean different things).

Compared to linear regression, logistic regression differs in that its output is one of only two categories rather than a continuous value. The sigmoid function maps the model's raw output to a value between 0 and 1, telling us how likely an input is to belong to class 0 or class 1.

Logistic regression does not calculate the output to be exactly 0 or 1; it simply estimates the probability that something falls into class 0 or class 1. Its applications include binary classification tasks such as telling dogs from cats, apples from oranges, or determining whether a tumor is malignant.
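The sigmoid function mentioned above is easy to see in code. The weights b0 and b1 below are assumed values for illustration, not a trained model:

```python
from math import exp

def sigmoid(z):
    """Squash any real number into the (0, 1) probability range."""
    return 1.0 / (1.0 + exp(-z))

# Hypothetical weights, assumed for illustration, not learned from data
b0, b1 = -4.0, 2.0

for x in [0.0, 2.0, 4.0]:
    p = sigmoid(b0 + b1 * x)      # probability of class 1
    label = 1 if p >= 0.5 else 0  # threshold at 0.5
    print(x, round(p, 3), label)
```

Note that the model's answer is a probability; the hard 0/1 label only appears once we pick a threshold (0.5 here).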

decision tree

A decision tree is a supervised machine learning algorithm that looks like an inverted tree, in which each node represents a predictor variable (feature), each link between nodes represents a decision, and each leaf node represents an outcome (response variable).

Decision trees can be thought of as a flow chart that breaks down into two sections for each step. In each step, there is an option of yes or no which can help us to understand the output/prediction to a greater extent. To do this we need to follow the information gain and entropy process.
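The flow-chart idea above can be sketched as nested if/else branches. The "play outside" decision and its threshold below are hypothetical, chosen only to show the node/leaf structure:

```python
# A hypothetical two-question decision tree written as nested if/else:
# each condition is a node, each return value is a leaf (the outcome).
def play_outside(weather, temperature):
    if weather == 'sunny':       # root node: predictor "weather"
        if temperature > 15:     # internal node: predictor "temperature"
            return 'yes'         # leaf: response variable
        return 'no'
    return 'no'

print(play_outside('sunny', 20))
print(play_outside('sunny', 10))
print(play_outside('rainy', 20))
```

A real decision tree learner chooses which question to ask at each node automatically, which is exactly what the information gain and entropy process below is for.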

Information gain and entropy process

Step 1: Select the best attribute (A).

Step 2: Assign A as the decision variable for the root node.

Step 3: For each value of A, construct a descendant node.

Step 4: Assign classification labels to the leaf nodes.

Step 5: If the data is classified correctly, stop.

Step 6: Otherwise, iterate over the tree.

You may well be thinking, “How do you know which variable best separates the data?” The answer is quite simple: the variable with the highest information gain, that is, the one that best divides the data into the desired output classes, is the most important variable.

Here are some quick definitions:

Entropy → Measures the impurity or uncertainty present in the data

Information Gain (IG) → IG indicates how much “information” a particular attribute/variable gives us about the final result

Calculating entropy and information gain tells us which node/variable has the highest predictability and the greatest impact on the final result.
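These two definitions can be sketched directly in code. The toy labels below are made up for illustration; a perfectly separating attribute earns the maximum gain, a useless one earns zero:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the weighted entropy of the child groups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Toy example (made-up labels): splitting 4 samples two ways
parent = ['yes', 'yes', 'no', 'no']
perfect = [['yes', 'yes'], ['no', 'no']]  # attribute separates classes fully
useless = [['yes', 'no'], ['yes', 'no']]  # attribute tells us nothing

print(information_gain(parent, perfect))  # 1.0 bit
print(information_gain(parent, useless))  # 0.0 bits
```

At each node, the tree-building steps above evaluate every candidate attribute this way and split on the one with the highest gain.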

One problem that can occur with decision trees is what we call overfitting. This happens when our model simply memorizes all the training data points and makes its predictions from them. In that case, the model is not learning, just memorizing, and it will not perform well when tested on data it has not seen. You must be thinking, well, how do you solve this? This is where random forests come in.


NearLearn is an Ed-tech brand registered under the company NEAR AND LEARN PRIVATE LIMITED. Read More: https://nearlearn.com/