Prerequisites for learning machine learning

Machine learning is based on four areas of mathematics: statistics, linear algebra, calculus, and probability.

Statistics

Statistics provides tools for drawing conclusions from data. Descriptive statistics transforms raw data into useful summary information, such as the average or the spread of a set of values. Inferential statistics draws conclusions about a complete dataset from a sample of it, instead of using all of the data.
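As a small illustration, here is a sketch using Python's built-in statistics module. The prices are made-up values, and the random sample only hints at inference; real inferential statistics would also quantify the uncertainty of the estimate (for example with a confidence interval).

```python
import random
import statistics

# Hypothetical house prices (in thousands); illustrative values only, not real data.
prices = [210, 250, 199, 320, 275, 240, 305, 190, 260, 230]

# Descriptive statistics: summarize the raw data.
mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
spread = statistics.stdev(prices)  # sample standard deviation

# Inferential statistics (simplified): estimate the mean of all prices
# from a random sample instead of the complete dataset.
random.seed(0)
sample = random.sample(prices, 5)
estimated_mean = statistics.mean(sample)

print(f"mean={mean_price}, median={median_price}, stdev={spread:.1f}")
print(f"sample={sample}, estimated mean={estimated_mean}")
```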

Linear Algebra

Linear algebra deals with vectors, matrices, and linear transformations. It is central to machine learning because a dataset can be stored as a matrix, so transforming the data and performing operations on it become matrix operations.
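A minimal NumPy sketch of treating a dataset as a matrix, assuming a hypothetical table of three houses with two features. The scaling matrix and the weights are arbitrary illustrative values.

```python
import numpy as np

# Hypothetical dataset: each row is a house, each column a feature
# (living area in sq ft, number of bedrooms). Illustrative values only.
X = np.array([
    [1400.0, 3.0],
    [1600.0, 3.0],
    [2100.0, 4.0],
])

# A linear transformation of the data: rescale each feature.
# Area is converted to thousands of sq ft; bedrooms are unchanged.
scale = np.diag([1 / 1000, 1.0])
X_scaled = X @ scale

# Matrix-vector product: one weighted sum per row, computed in a single operation.
weights = np.array([100.0, 20.0])  # hypothetical weights
scores = X_scaled @ weights

print(X.shape)  # (3, 2): 3 data points, 2 features
print(scores)
```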


Calculus

Machine learning models are usually built from datasets with many features. Because a model's error then depends on many variables at once, multivariable calculus plays an important role in building (training) a model.
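To make "many variables at once" concrete, the sketch below approximates the partial derivatives of a hypothetical two-variable function with central differences and compares them with the derivatives worked out by hand. The function f merely stands in for a model's error; it is not from the original article.

```python
def f(w1, w2):
    # Hypothetical two-variable function standing in for a model's error.
    return (w1 - 3) ** 2 + 2 * (w2 + 1) ** 2

def numerical_gradient(func, w1, w2, h=1e-6):
    """Approximate both partial derivatives of func at (w1, w2) with central differences."""
    d_w1 = (func(w1 + h, w2) - func(w1 - h, w2)) / (2 * h)
    d_w2 = (func(w1, w2 + h) - func(w1, w2 - h)) / (2 * h)
    return d_w1, d_w2

# By hand: df/dw1 = 2(w1 - 3) and df/dw2 = 4(w2 + 1), so at (0, 0) the gradient is (-6, 4).
grad = numerical_gradient(f, 0.0, 0.0)
print(grad)  # approximately (-6.0, 4.0)
```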


Probability

Probability measures how likely an event is to occur, which helps us reason about whether a situation may or may not happen again. It is a foundation of machine learning.
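One simple way to connect probability with data is the relative-frequency view: repeat an experiment many times and count how often the event occurs. The sketch below uses a fair six-sided die as a hypothetical example.

```python
import random

# Estimate the probability of an event from its relative frequency.
# Hypothetical setup: a fair six-sided die; the event is "rolling a 6".
random.seed(42)
trials = 100_000
sixes = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
estimate = sixes / trials

print(f"estimated P(six) = {estimate:.3f}, exact = {1 / 6:.3f}")
```

With more trials, the estimate tends to get closer to the exact value of 1/6.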

Programming language

Python or R

Both languages are widely used in machine learning because they provide many libraries for performing mathematical operations.

Basic programming skills (Python):

Data structures: String, List, Tuple, Set, Dictionary

Libraries: NumPy, Pandas, Matplotlib

You will need to know basic operations such as storing data, accessing it (for example, from a CSV file), and manipulating it with these data structures, and how to use them efficiently by understanding their pros and cons.
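The following sketch shows these operations using only the Python standard library. It assumes a small, made-up CSV of areas and prices held in a string; the file name mentioned in the comment is also hypothetical. With Pandas, `pandas.read_csv` would load the same data into a DataFrame in one call.

```python
import csv
import io

# Hypothetical CSV content; with a real file you would use
# open("houses.csv", newline="") instead (the file name is an assumption).
csv_text = """area_sqft,price
1400,245000
1600,312000
1700,279000
1400,255000
"""

# Access and store: read each row into a dictionary keyed by column name (a string).
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Manipulate: convert strings to numbers. Lists keep order and allow duplicates.
areas = [int(row["area_sqft"]) for row in rows]
prices = [int(row["price"]) for row in rows]

# A set removes duplicates.
unique_areas = set(areas)

# A dictionary gives fast lookup by key: group prices by area.
prices_by_area = {}
for area, price in zip(areas, prices):
    prices_by_area.setdefault(area, []).append(price)

# A tuple is a fixed, immutable record.
first_house = (areas[0], prices[0])

print(len(rows), sorted(unique_areas), first_house, prices_by_area[1400])
```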

Machine Learning Mathematics

Machine learning is about creating algorithms that learn from data to make predictions, for example identifying which objects appear in a picture, suggesting items in a recommendation engine, finding the best combination of drugs to treat a certain disease, or filtering spam.

Machine learning is built on mathematical prerequisites, and knowing why the maths is used makes learning it more fun. You need to know the maths behind the functions you will be using, which model is suitable for your data, and why.

Let’s start with an interesting problem: predicting house prices. Suppose we have a dataset containing a history of houses with different features and their prices. For now, we will consider only the area of the living space in square feet and the price, so the dataset has two columns, as shown below.

There is likely some correlation between these two variables. To make use of it, we need to build a model that can predict the price of a house from its area. How can we do it?
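One way to check whether such a relationship exists before modelling is to compute the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below uses made-up area and price pairs (prices in thousands).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the common 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: living area (sq ft) and price (thousands).
areas = [1000, 1200, 1500, 1800, 2000]
prices = [200, 230, 270, 340, 360]

r = pearson_r(areas, prices)
print(f"r = {r:.3f}")  # close to 1: a strong positive linear relationship
```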

Let’s graph this data and see what it looks like:

Here the X-axis is the living area in square feet and the Y-axis is the price of the house. If we plot all the data points, we get a scatter plot whose trend can be represented by a line, as shown in the figure above. Given a new area as input, the line then predicts a price. Ideally, we want the line that passes as close as possible to all of the data points.

Here we are trying to find a line of the form

Y = mX + c

Fitting such a line is called linear regression. It allows us to study and summarize the relationship between two variables.

X = Independent variable

Y = Dependent variable

c = y-intercept

m = Slope of the line

In this equation, we already have the values of X, the independent variable. If we also know the values of m and c, we can easily predict the value of Y.
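In code, prediction is a single line once m and c are known. The values of m and c below are hypothetical, chosen only to illustrate the calculation.

```python
def predict(x, m, c):
    """Predict Y from X using the line Y = mX + c."""
    return m * x + c

# Hypothetical parameters (not fitted to any real data):
# price in thousands = 0.15 * area in sq ft + 50
m, c = 0.15, 50.0

for area in [1000, 1500, 2000]:
    print(area, predict(area, m, c))
```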

How do we find these variables?

To find them, we could try many combinations of values and keep the line that fits the data points best. But how do we measure which line is the best fit?

To find the best-fit line, we can use the least-squares error function. For a candidate line, it measures the error between each real value y and the predicted value y′ = mx + c:

E(m, c) = Σ (yᵢ − (m·xᵢ + c))², summed over all data points i

Using this function, we compute the error between each real data point and its predicted value, square each error, and add up the squares to get the total error. The best-fit line is the one with the smallest total error.
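A direct translation of this error function into Python, evaluated on a tiny hypothetical dataset for two candidate lines. The better-fitting line gets a much smaller error.

```python
def least_squares_error(xs, ys, m, c):
    """Sum of squared differences between real y and predicted y' = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying roughly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

print(least_squares_error(xs, ys, 2.0, 1.0))  # good candidate line: about 0.1
print(least_squares_error(xs, ys, 1.0, 0.0))  # poor candidate line: about 53.5
```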

If we add a third axis for the error and plot the error for every possible pair of values of m and c, we get a bowl-shaped 3-dimensional surface, as shown below.
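A brute-force way to explore this surface, which is the "try a bunch of values" approach, is to evaluate the error on a grid of (m, c) pairs and keep the lowest. The dataset and the grid range (0.0 to 4.0 in steps of 0.1) are assumptions for illustration. Even with only two parameters this takes 41 × 41 = 1,681 error evaluations, and the number of grid points grows exponentially with the number of parameters, which is one reason to prefer gradient descent.

```python
def least_squares_error(xs, ys, m, c):
    """Sum of squared differences between real y and predicted y' = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying roughly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# Sample the error surface on a grid of candidate slopes and intercepts.
grid = [i / 10 for i in range(41)]  # 0.0, 0.1, ..., 4.0
surface = {(m, c): least_squares_error(xs, ys, m, c) for m in grid for c in grid}

# The lowest point on the sampled surface approximates the best-fit line.
best_m, best_c = min(surface, key=surface.get)
print(best_m, best_c, surface[(best_m, best_c)])
```

The grid only gets within one step of the true minimum; a finer grid is more accurate but costs more evaluations.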

The ideal values of m and c lie at the bottom of this surface (the dark region), where the predicted prices are closest to the real data points. But how do we find those values? For this, we use an optimization technique from calculus called gradient descent.

Gradient Descent

Gradient descent finds the minimum iteratively. It uses the error at a given data point to compute the gradient of the error with respect to our unknown variables, m and c, and then updates both variables by taking a small step in the direction opposite to the gradient. We then move on to the next data point and repeat the process over and over until we reach the values where the error is minimal. Updating one data point at a time like this is known as stochastic gradient descent.
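A minimal sketch of the procedure just described, updating m and c one data point at a time. The data, learning rate, and number of passes are assumptions chosen for this example. The x values are kept in a small range on purpose: raw square-footage values would need a much smaller learning rate (or feature scaling) to stay stable.

```python
def sgd_linear_regression(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit y = m*x + c with per-data-point (stochastic) gradient descent."""
    m, c = 0.0, 0.0  # initial guesses
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = (m * x + c) - y  # prediction minus real value
            # Gradients of this point's squared error, error ** 2, w.r.t. m and c.
            grad_m = 2 * error * x
            grad_c = 2 * error
            # Step against the gradient to reduce the error.
            m -= learning_rate * grad_m
            c -= learning_rate * grad_c
    return m, c

# Hypothetical, noise-free data generated from y = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2 * x + 1 for x in xs]

m, c = sgd_linear_regression(xs, ys)
print(round(m, 3), round(c, 3))  # close to 2.0 and 1.0
```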

Linear algebra

In practice, the price of a house does not depend only on its living area; there are many other factors, such as the number of bedrooms and bathrooms. If we include those features as well, the equation looks something like this:

Y = b1x1 + b2x2 + … + bnxn + c

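This is called multiple linear regression. Each coefficient bi weights its feature xi, just as m weights X in the single-feature case, and c is the intercept. It is naturally handled with linear algebra: we can store the data in a matrix with one row per data point and one column per feature, and compute all predictions at once as a matrix-vector product.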
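A NumPy sketch of multiple linear regression in matrix form. It assumes a small hypothetical dataset whose prices are generated from known coefficients, so the result can be checked. Note that it solves the least-squares problem directly with `np.linalg.lstsq` rather than with gradient descent; appending a column of ones lets the intercept c be found like any other coefficient.

```python
import numpy as np

# Hypothetical data: rows are houses, columns are features
# (living area in 1000s of sq ft, bedrooms, bathrooms). Illustrative only.
X = np.array([
    [1.4, 3, 2],
    [1.6, 3, 2],
    [1.7, 3, 3],
    [2.1, 4, 3],
    [2.5, 4, 4],
], dtype=float)

# Noise-free prices generated from known coefficients so the fit can be checked.
true_b = np.array([100.0, 20.0, 15.0])
true_c = 50.0
y = X @ true_b + true_c

# Append a column of ones so the intercept c is estimated like the other coefficients.
X_design = np.column_stack([X, np.ones(len(X))])

# Least-squares solution of X_design @ params ≈ y.
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b, c = params[:-1], params[-1]
print(b, c)
```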


Probability

Now let’s use probability to change our perspective on the problem. Instead of predicting the price, we will try to predict the condition of the house, classifying it as being in good or bad condition together with the probability of each. For this, we use a technique called logistic regression, which models the probability of an outcome with the sigmoid function, whose output always lies between 0 and 1.
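A small sketch of the sigmoid function and of how a logistic model turns a weighted sum of features into a probability. The features, weights, and bias are hypothetical and hand-picked; in practice they would be learned from data, for example with gradient descent. The 0.5 threshold is a common default, not a requirement.

```python
import math

def sigmoid(z):
    """Map any real number into (0, 1); extreme inputs may round to 0.0 or 1.0."""
    # Two branches avoid overflow in math.exp for large negative or positive z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_good_condition(features, weights, bias):
    """Probability that a house is in good condition under a logistic model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical, hand-picked parameters (not learned from real data).
# Features: (age in decades, renovated: 1 or 0).
weights = [-0.8, 1.5]
bias = 1.0

p = predict_good_condition([3, 1], weights, bias)
label = "good" if p >= 0.5 else "bad"
print(f"P(good) = {p:.3f} -> {label}")
```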


In this article, we covered the prerequisites of machine learning and how they are applied. In short, they are statistics, calculus, linear algebra, and probability theory. Calculus provides techniques for optimisation, linear algebra provides algorithms that work efficiently on huge datasets, probability lets us predict the likelihood of occurrences, and statistics helps us draw conclusions from a sample of data.

