All you need to learn first is the Python programming language; with it you can code Machine Learning programs easily and without hindrance. Let's go through some basics of ML:

Data Preprocessing

Step 1: Importing the required libraries.

Step 2: Importing the dataset (datasets are generally available in .csv format).

Step 3: Handling the missing data.

Step 4: Encoding categorical data.

Step 5: Splitting the dataset into a training set and a test set.

Step 6: Feature scaling.

Step 7: Building your model.
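Steps 3 to 5 can be sketched in plain Python on a tiny in-memory dataset. This is a minimal sketch, assuming mean imputation for missing numbers, one-hot encoding for the categorical column and a simple 80/20 shuffle-and-split. The rows and the helper names (`impute_mean`, `one_hot`, `train_test_split_rows`) are hypothetical. In real code, step 2 usually reads the .csv file with `pandas.read_csv`, and scikit-learn provides `SimpleImputer`, `OneHotEncoder` and `train_test_split` for steps 3 to 5.

```python
import random

# Hypothetical rows: (country, age, salary); None marks a missing value.
rows = [
    ("France", 44, 72000),
    ("Spain", 27, 48000),
    ("Germany", 30, None),
    ("Spain", None, 61000),
    ("France", 35, 58000),
]

def impute_mean(values):
    """Step 3: replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def one_hot(values):
    """Step 4: encode each category as a 0/1 vector (categories in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

def train_test_split_rows(data, test_ratio=0.2, seed=0):
    """Step 5: shuffle a copy of the data and cut off the last part as the test set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[:-n_test], shuffled[-n_test:]

countries, ages, salaries = zip(*rows)
ages = impute_mean(list(ages))
salaries = impute_mean(list(salaries))
encoded, categories = one_hot(list(countries))

# Each feature row: one-hot country columns followed by age and salary.
features = [enc + [a, s] for enc, a, s in zip(encoded, ages, salaries)]
train, test = train_test_split_rows(features)
print(categories)             # ['France', 'Germany', 'Spain']
print(len(train), len(test))  # 4 1
```

The test set is held back before any model is built, so that it can later show how well the model generalizes.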

Detailed info is available on our Instagram page.

Normalization/Feature Scaling

Normalization or feature scaling is often a required step. It rescales the numeric features so that they lie in a comparable range, which makes the data more suitable and reliable for making predictions. Scaling helps many machine learning models, especially those based on distances or gradient descent, reach better predictions. It comes into action when features have values on very different scales, i.e. with huge intervals between them. Learn more in our blog.
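Two common ways to rescale a numeric feature are min-max normalization, x' = (x - min) / (max - min), which maps the values into [0, 1], and standardization, z = (x - mean) / std, which gives zero mean and unit variance. Below is a minimal plain-Python sketch of both; the function names and sample values are illustrative, and scikit-learn's `MinMaxScaler` and `StandardScaler` implement the same ideas.

```python
import math

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

ages = [20, 30, 40]
salaries = [20000, 50000, 80000]
print(min_max_scale(ages))      # [0.0, 0.5, 1.0]
print(min_max_scale(salaries))  # [0.0, 0.5, 1.0]
```

After scaling, both features share the same range, so the salary no longer dominates a distance calculation just because its numbers are larger. In practice, compute min/max (or mean/std) on the training set only and reuse those values for the test set.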


Seaborn Library

Seaborn is a Python data visualization library used in Machine Learning to visualize data on another level. It makes numerical and categorical attributes much easier to understand, and it offers various plotting functions for this.

Let's explore each function and understand it better.


Linear Regression 

It is a Machine Learning algorithm that comes under supervised learning. It predicts a continuous value by fitting a linear relationship between one or more predictors and the target.



1. Simple Linear Regression

  •  Basic code (Scratch)

  •  Code (Sklearn)

  •  Code (Cost function & Gradient Descent)
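A from-scratch simple linear regression can use the closed-form least-squares estimates: slope b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept b0 = ȳ - b1·x̄. This is a minimal sketch; the points are made up (they lie exactly on y = 1 + 2x) and are not the dataset the original code uses.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    den = sum((xi - x_mean) ** 2 for xi in x)
    b1 = num / den
    b0 = y_mean - b1 * x_mean
    return b0, b1

def predict(b0, b1, x):
    """Apply the fitted line to new inputs."""
    return [b0 + b1 * xi for xi in x]

# Made-up points lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)  # 1.0 2.0
```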

Tip: The code will not run in an online compiler because the data is not built in; you need to PASTE it on your own platform along with the dataset.

Tip: The code for the cost function and gradient descent is a little difficult, so try to understand the concept behind it first.
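The concept: the cost function J(b0, b1) = (1 / 2m) Σ (b0 + b1·xᵢ - yᵢ)² measures the average squared error of the line over m points, and gradient descent repeatedly moves b0 and b1 a small step (the learning rate α) against the gradient: b0 := b0 - α·(1/m) Σ(ŷᵢ - yᵢ) and b1 := b1 - α·(1/m) Σ(ŷᵢ - yᵢ)·xᵢ, where ŷᵢ = b0 + b1·xᵢ. The sketch below assumes the 1/2m scaling, a learning rate of 0.05 and 2000 iterations, which are choices that suit this made-up data rather than universal settings.

```python
def cost(b0, b1, x, y):
    """Squared-error cost J = (1 / 2m) * sum((b0 + b1*x - y)^2)."""
    m = len(x)
    return sum((b0 + b1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Minimize the cost, updating b0 and b1 simultaneously at each step."""
    m = len(x)
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [b0 + b1 * xi - yi for xi, yi in zip(x, y)]
        grad_b0 = sum(errors) / m
        grad_b1 = sum(e * xi for e, xi in zip(errors, x)) / m
        b0 -= alpha * grad_b0
        b1 -= alpha * grad_b1
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # made-up points on y = 1 + 2x
b0, b1 = gradient_descent(x, y)
print(round(b0, 3), round(b1, 3))  # close to the exact answer 1 and 2
```

If the learning rate is too large, the updates overshoot and the cost grows instead of shrinking; if it is too small, many more iterations are needed.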



2. Multiple Linear Regression

  • Basic code (Scratch)

  • Code (Sklearn)

  • Code (Cost function & Gradient Descent)


Linear Regression is generally an application of Ordinary Least Squares (OLS). There are also some additional models that go one step beyond Linear Regression. The explanation is given in the code itself.
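Because Linear Regression is an application of OLS, a multiple linear regression can be fitted from scratch by solving the normal equations (XᵀX)β = Xᵀy, where X has a leading column of ones for the intercept. This is a minimal sketch with a hand-written Gaussian elimination on made-up, noise-free data; real code would normally call `numpy.linalg.lstsq` or scikit-learn's `LinearRegression`, which are more numerically robust.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(features, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    X = [[1.0] + list(row) for row in features]  # prepend the intercept column
    Xt = transpose(X)
    xtx = matmul(Xt, X)
    xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(xtx, xty)

# Made-up data generated from y = 2 + 3*x1 - 1*x2 (no noise).
features = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2 + 3 * a - b for a, b in features]
coef = fit_ols(features, y)
print([round(c, 6) for c in coef])  # [2.0, 3.0, -1.0]
```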

Tip: The code will not run in an online compiler because the data is not built in; you need to PASTE it on your own platform along with the dataset.

Logistic Regression 

It is a machine learning algorithm used for classification problems. Its output follows a sigmoid (S-shaped) curve because the hypothesis z is passed through the sigmoid function 1/(1 + e^(-z)).

What we are going to do here is create a model with the logistic regression algorithm that classifies the data into two outcomes, i.e. whether an event happens or not. The sigmoid output always lies between 0 and 1, so we define a threshold value of 0.5: outputs at or above it are predicted as 1, and outputs below it as 0. It is based on the concept of probability and is used for predictive analysis.
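A minimal one-feature sketch of this idea: the hypothesis is z = b0 + b1·x, the sigmoid turns it into a probability, and anything at or above the 0.5 threshold is classified as 1. Training by batch gradient descent on the log-loss is one common choice and is assumed here; the data, learning rate and iteration count are made up. The two classes overlap, so the best-fitting parameters stay finite, and the decision boundary sits where b0 + b1·x = 0. scikit-learn's `LogisticRegression` does the same job with a more robust solver (and L2 regularization by default).

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function 1 / (1 + e^(-z)), always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(x, y, alpha=0.3, iterations=10000):
    """Fit b0, b1 for one feature by batch gradient descent on the log-loss."""
    m = len(x)
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Difference between predicted probability and true label per point.
        errors = [sigmoid(b0 + b1 * xi) - yi for xi, yi in zip(x, y)]
        b0 -= alpha * sum(errors) / m
        b1 -= alpha * sum(e * xi for e, xi in zip(errors, x)) / m
    return b0, b1

def predict(b0, b1, x, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    return [1 if sigmoid(b0 + b1 * xi) >= threshold else 0 for xi in x]

# Made-up data: class 1 becomes more likely as x grows (the classes overlap).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = train_logistic(x, y)
print(predict(b0, b1, [1.0, 4.5]))  # [0, 1]
```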


K-nearest neighbor

  • Basic code (Scratch)

The K-Nearest Neighbor algorithm is a quite simple yet widely used classification algorithm, and it can also be used for regression. KNN is non-parametric (it makes no assumptions about the underlying data distribution), instance-based (it does not explicitly learn a model; instead, it memorizes the training instances and uses them at prediction time), and used within a supervised learning setting.


  • Store the training data samples in an array of data points (arr), where each element of the array is a tuple (x, y). Hint: Use zip().

  • For i = 0 to m - 1, where m is the length of the training data, calculate the Euclidean distance between the new data point and arr[i] using its formula.

  • Make a set of the K (number of nearest neighbors) shortest distances obtained. Each of these distances corresponds to an already labeled data point.

  • Return the class that is in the majority among these K points.


  • Code (Sklearn)
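The four scratch steps above can be sketched in plain Python as follows. This is a minimal version assuming Euclidean distance and a simple majority vote; the points, the labels "A"/"B", k = 3 and the helper names are made up for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points given as sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: store the training samples as (x, y) tuples.
    arr = list(zip(train_x, train_y))
    # Step 2: distance from the query to every one of the m training points.
    distances = [(euclidean(x, query), y) for x, y in arr]
    # Step 3: keep the k shortest distances together with their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: majority class; most_common breaks ties by first occurrence,
    # i.e. in favour of the label whose neighbour is closest.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-D points belonging to two classes, "A" and "B".
train_x = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_x, train_y, (2, 2)))  # A
print(knn_predict(train_x, train_y, (6, 5)))  # B
```

An odd k avoids ties when there are only two classes.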

Tip: The code will not run in an online compiler because the data is not built in; you need to PASTE it on your own platform along with the dataset.

Tip: The accuracy of KNN can be measured in different ways, for example with a plain accuracy score or a confusion matrix, so try to understand each accuracy method.
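Two common accuracy methods are the accuracy score (correct predictions divided by total predictions) and the confusion matrix, which counts how often each true class was predicted as each class. Below is a minimal sketch with made-up labels; `sklearn.metrics` offers `accuracy_score` and `confusion_matrix` for the same purpose.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))

# Made-up true labels and KNN predictions for five test points.
y_true = ["A", "A", "B", "B", "B"]
y_pred = ["A", "B", "B", "B", "A"]
print(accuracy(y_true, y_pred))                      # 0.6
print(confusion_counts(y_true, y_pred)[("B", "A")])  # 1
```

The confusion counts show which classes get mixed up, which a single accuracy number hides.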




K-means clustering

We have taken a dataset of items with some features and their corresponding values. The task is to categorize them into groups. To achieve this categorization, we use the K-means algorithm, which is an unsupervised learning algorithm. The aim of the algorithm is to divide the items into K groups of similar items. To measure that similarity, we use the simple Euclidean distance.


  • Randomly initialize K points, called means (the cluster centers).

  • Assign each item to its nearest mean, and then update the mean's coordinates, which are the average values of the items assigned to that mean so far.

  • Repeat this process for a given number of iterations; in the end, the clusters are formed.

  • Basic code (Scratch)

  • Code (Sklearn)
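The steps above can be sketched in plain Python as follows. This minimal version makes two assumptions the text leaves open: the K initial means are K distinct data points picked at random, and it uses the common batch variant (assign every item, then recompute every mean) rather than updating a mean after each single item. The points, K = 2 and the iteration count are made up; `math.dist` needs Python 3.8 or newer. scikit-learn's `KMeans` is a more robust version with smarter initialization.

```python
import math
import random

def kmeans(points, k, iterations=10, seed=0):
    """Group points into k clusters; return (means, cluster index per point)."""
    rng = random.Random(seed)
    # Assumption: the initial means are k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest mean (Euclidean distance).
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Move each mean to the average of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old mean if a cluster became empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Which number each cluster gets depends on the random start, so compare cluster memberships rather than the label values themselves.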


Support Vector Machine(SVM)

SVM is a supervised learning algorithm that classifies cases by finding a suitable separator (a HYPERPLANE). As its name suggests, the support vectors, i.e. the training points closest to the hyperplane, are responsible for its behavior. It is a discriminative classifier formally defined by a separating hyperplane. It represents the examples as points in space, mapped so that the points of different categories are separated by a gap that is as wide as possible. Learn more, with code, in our blog.
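A minimal sketch of a linear SVM, assuming the soft-margin formulation (minimize λ/2·‖w‖² plus the average hinge loss max(0, 1 - y(w·x + b))) trained by plain full-batch subgradient descent. Labels must be +1 or -1, and the data, λ (`lam`), learning rate and epoch count are made up. Only points with margin below 1 push the hyperplane during training; at the solution, the points on or inside the margin are the support vectors. Library implementations such as scikit-learn's `SVC` solve the same problem with dedicated solvers and also support kernels.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Fit w, b for sign(w.x + b) by subgradient descent on the hinge loss.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).
    Labels in y must be +1 or -1.
    """
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / m
                grad_b -= yi / m
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    """Classify each point by the side of the hyperplane it falls on."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]

# Made-up, linearly separable points: class -1 bottom-left, class +1 top-right.
X = [(-2, -1), (-3, -2), (-1, -3), (2, 1), (3, 2), (1, 3)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [(3, 3), (-3, -3)]))  # [1, -1]
```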


Decision Tree

Decision Trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. For detailed knowledge of decision trees and how to code them, read our blog.
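A minimal sketch of how a classification tree can learn its decision rules, assuming the CART-style approach with Gini impurity and binary splits of the form `feature <= threshold`. The data, labels and `max_depth` value are made up, and the tree is stored as nested dicts purely for illustration. scikit-learn's `DecisionTreeClassifier` implements a tuned version of this idea (its default `criterion` is `"gini"`).

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for threshold in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:  # keep only splits that reduce impurity
                best, best_score = (f, threshold), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree as nested dicts; each leaf is the majority class."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right], depth + 1, max_depth),
    }

def predict_one(tree, row):
    """Follow the decision rules from the root down to a leaf."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

X = [[1, 1], [2, 1], [3, 2], [6, 5], [7, 7], [8, 6]]
y = ["no", "no", "no", "yes", "yes", "yes"]
tree = build_tree(X, y)
print(tree)  # {'feature': 0, 'threshold': 3, 'left': 'no', 'right': 'yes'}
print(predict_one(tree, [2, 2]), predict_one(tree, [7, 6]))  # no yes
```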


Hey!! Have you practiced all the code?

If yes, then you are now ready to work on multiple projects,

and you can explore more.