Logistic Regression 

It is a machine learning algorithm that is used for classification problems. It follows a sigmoid path due to its function which can be written

1/(1+e^(-hypothesis)).

What we going to do here is, creating a model using a logistic regression algorithm that just classifies the data between the event i.e. it is happening or not. Hence we define a threshold value that predicts the plot under 0 and 1 as 0.5 being threshold value. It is based on the concept of probability and does the predictive analysis.

Logistic Regression is a classification algorithm that is used for problems having different classes. In this type of algorithm, we have to predict the output in discrete value(binary outcome either 0 or 1). For instance, we can predict whether a person can vote or not in the upcoming elections.

Logistic Regression is quite similar to the Linear Regression but the only difference between both is the nature of the desired output. The Linear Regression predicts the continuous value whereas the Logistic value predicts the discrete value. Therefore, Logistic Regression can also be known as Generalized Linear Regression.

How does it work?

Logistic Regression simply measures the relationship between the labels(dependent variable) which we have to predict and the features(independent variables), by probably estimating the outcomes using the underlying logistic function.

Sigmoid Function

The Sigmoid Function is an activation function that is nothing but an S-shaped curve which takes a real-valued number and then map it after applying a particular activation function on it.  The sigmoid function maps the real values in between the range of 0 and 1 but never places the values exactly on those points. 

To Remember:
While designing a Machine Learning model with many features, it is to be considered that having too many features could potentially lead our model to predict the result with less accuracy, especially if certain features have no effects over the outcomes or may have a drastic effect on the variables.
There are some methods to select the most appropriate variables from the huge dataset.

  • Forward Selection

  • Backward Selection

  • Bi-directional Comparision

To start building a model with multiple features, the following steps can be taken into consideration:

Step 1: Data Preprocessing
It includes importing the libraries and datasets, checking for missing values, proper visualization of data.
The next task is to deal with categorical data. For this, encode the categorical data using LaberEncoder() and make the dummy variables if required.
Feature scaling should be taken care of to clean the data for better accuracy.
 
Step 2: Fitting the model with the training set.

After the data is cleaned and finalized, the features and target must be picked out from the finalized dataset, and then further it has to be split into training and testing dataset. The instantiated model object must be fitted with the training data using the fit() method of LogisticRegression. The LogisticRegression class includes both simple and multiple logistic regression algorithms.

Step 3: Predicting the output
To predict the outcome, we use the predict() method of class LogisticRegression on the regressor which has been fitted before.

Here Logistic Regression ends, now

let's get back to ML Codes page

  • CREATED BY ANMOL VARSHNEY & PALAK GUPTA