Linear Regression

What is Linear Regression?

Linear Regression is a regression technique that models the linear relationship between a dependent variable and one or more independent variables. It is classified as simple linear regression (one independent variable) or multiple linear regression (two or more independent variables). Let us explore in detail how to build a linear model in Python, along with the mathematics behind it.

Fundamentals of the Linear Regression model:

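  • A linear model makes a prediction by computing a weighted sum of the input features plus a constant intercept (bias) term:
    $$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$
    where $\hat{y}$ is the predicted value, $n$ is the number of features, $x_i$ is the $i$-th feature value and $\theta_j$ is the $j$-th model parameter ($\theta_0$ being the intercept).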
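  • In vectorized form, with the feature vector $\mathbf{x} = (1, x_1, \ldots, x_n)$ (a leading 1 for the intercept) and the parameter vector $\boldsymbol{\theta} = (\theta_0, \theta_1, \ldots, \theta_n)$, the model is:
    $$\hat{y} = h_{\boldsymbol{\theta}}(\mathbf{x}) = \boldsymbol{\theta}^\top \mathbf{x}$$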
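  • Training the model means finding the parameter vector $\boldsymbol{\theta}$ that best fits the training set, i.e. that minimizes the Root Mean Square Error (RMSE). In practice it is easier to minimize the Mean Square Error (MSE): since RMSE is the square root of MSE and the square root is monotonically increasing, the $\boldsymbol{\theta}$ that minimizes MSE also minimizes RMSE. For $m$ training instances, the MSE cost function is:
    $$\mathrm{MSE}(\boldsymbol{\theta}) = \frac{1}{m}\sum_{i=1}^{m}\left(\boldsymbol{\theta}^\top \mathbf{x}^{(i)} - y^{(i)}\right)^2$$
  • The Normal Equation gives, in closed form, the value of $\boldsymbol{\theta}$ that minimizes the MSE cost function (assuming $\mathbf{X}^\top\mathbf{X}$ is invertible), where $\mathbf{X}$ is the matrix of training instances (each row with a leading 1) and $\mathbf{y}$ is the vector of target values:
    $$\hat{\boldsymbol{\theta}} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1}\mathbf{X}^\top \mathbf{y}$$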
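The formulas above can be sketched in NumPy as follows. This is a minimal illustration, not the author's code: the helper names (add_intercept, predict, mse, rmse, normal_equation) and the synthetic two-feature data (roughly y = 4 + 3x1 - 2x2 plus Gaussian noise) are assumptions chosen for illustration. Using two features also makes it an instance of multiple linear regression.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so theta[0] acts as the intercept term."""
    return np.c_[np.ones((X.shape[0], 1)), X]

def predict(X_b, theta):
    """Vectorized model: y_hat = X_b @ theta."""
    return X_b @ theta

def mse(X_b, y, theta):
    """Mean Square Error cost function."""
    return np.mean((predict(X_b, theta) - y) ** 2)

def rmse(X_b, y, theta):
    """Root Mean Square Error: the square root of the MSE."""
    return np.sqrt(mse(X_b, y, theta))

def normal_equation(X_b, y):
    """Closed form theta_hat = (X^T X)^-1 X^T y (assumes X^T X is invertible)."""
    return np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Hypothetical data: two features, true parameters (4, 3, -2) plus noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 2))
y = 4 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, size=50)

X_b = add_intercept(X)
theta_hat = normal_equation(X_b, y)
print("theta_hat:", theta_hat)  # near [4, 3, -2]; the noise shifts it slightly

# Nudging theta away from theta_hat raises both MSE and RMSE
theta_other = theta_hat + np.array([0.1, -0.05, 0.02])
print(mse(X_b, y, theta_hat) < mse(X_b, y, theta_other))    # True
print(rmse(X_b, y, theta_hat) < rmse(X_b, y, theta_other))  # True
```

The last two checks illustrate the point made above: whatever minimizes the MSE also minimizes the RMSE.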

Linear Regression using Python:

Consider a dataset for a cricket team that successfully chased its opponents' targets in its last six matches. For each match, the dataset records the number of overs played and the target (in runs) that was chased.

Match No.   No. of overs played   Chased target (runs)
1           15.4                  125
2           19.3                  189
3           16.5                  167
4           12.2                  84
5           14.3                  133
6           19.1                  203
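One caveat about the data: in cricket notation, 15.4 overs means 15 complete overs plus 4 balls (an over has six balls), not 15.4 decimal overs. The code in this post uses the values as plain decimal numbers. If an exact count of balls is preferred as the feature, a minimal conversion could look like this (the helper name overs_to_balls is hypothetical):

```python
def overs_to_balls(overs: float) -> int:
    """Convert cricket overs notation (e.g. 15.4 = 15 overs + 4 balls) to total balls."""
    # round() guards against floating-point noise such as 12.2 * 10 = 121.999...
    completed_overs, extra_balls = divmod(round(overs * 10), 10)
    if extra_balls > 5:
        raise ValueError(f"{overs} is not valid overs notation (at most 5 extra balls)")
    return completed_overs * 6 + extra_balls

overs_played = [15.4, 19.3, 16.5, 12.2, 14.3, 19.1]
balls = [overs_to_balls(o) for o in overs_played]
print(balls)  # [94, 117, 101, 74, 87, 115]
```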

The number of overs played and the number of runs chased are plotted on the x-axis and y-axis respectively, as in the graph produced by the code below.

Using the Normal Equation:

import numpy as np
import matplotlib.pyplot as plt
# Each row: (overs played, runs chased)
data = [[15.4, 125], [19.3, 189], [16.5, 167], [12.2, 84], [14.3, 133], [19.1, 203]]
overs = np.array([row[0] for row in data])
runs = np.array([row[1] for row in data])
# Add a column of ones so that theta_hat[0] is the intercept
X_b = np.c_[np.ones((len(overs), 1)), overs]
# Normal Equation: theta_hat = (X^T X)^-1 X^T y
theta_hat = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(runs)
# Predict at the smallest and largest overs to draw the fitted line
overs_limits = np.array([overs.min(), overs.max()])
overs_limits_b = np.c_[np.ones((2, 1)), overs_limits]
runs_predict = overs_limits_b.dot(theta_hat)
plt.plot(overs_limits, runs_predict, color='blue')
plt.scatter(overs, runs, color='red')
plt.xlabel('No. of overs played')
plt.ylabel('No. of runs chased')
plt.show()
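Explicitly inverting XᵀX, as in the code above, works for this small dataset but can be numerically unstable when features are nearly collinear. A common alternative, sketched below, is NumPy's least-squares solver np.linalg.lstsq, which minimizes the same MSE without forming the inverse. The 18-over query at the end is a hypothetical input, not part of the original data.

```python
import numpy as np

# Same six matches: (overs played, runs chased)
data = [[15.4, 125], [19.3, 189], [16.5, 167], [12.2, 84], [14.3, 133], [19.1, 203]]
overs = np.array([row[0] for row in data])
runs = np.array([row[1] for row in data])
X_b = np.c_[np.ones((len(overs), 1)), overs]  # leading column of ones for the intercept

# Normal Equation with an explicit inverse
theta_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ runs

# Least-squares solver: same minimizer of the MSE, no explicit inverse
theta_lstsq, _, _, _ = np.linalg.lstsq(X_b, runs, rcond=None)
print(np.allclose(theta_inv, theta_lstsq))  # True

intercept, slope = theta_lstsq
print(f"runs ~= {intercept:.2f} + {slope:.2f} * overs")

# Hypothetical query: predicted runs for an 18-over chase
print(np.array([1.0, 18.0]) @ theta_lstsq)
```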

Using a Library Function (scikit-learn):

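import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Each row: (overs played, runs chased)
data = [[15.4, 125], [19.3, 189], [16.5, 167], [12.2, 84], [14.3, 133], [19.1, 203]]
# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features)
overs = np.array([row[0] for row in data]).reshape(-1, 1)
runs = np.array([row[1] for row in data])
# LinearRegression fits the intercept itself (fit_intercept=True by default),
# so no column of ones is needed
linear_reg = LinearRegression()  # library function
linear_reg.fit(overs, runs)
plt.scatter(overs, runs, color='red')
plt.plot(overs, linear_reg.predict(overs), color='blue')
plt.xlabel('No. of overs played')
plt.ylabel('No. of runs chased')
plt.show()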
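To check that the library function and the Normal Equation agree, the sketch below (not part of the original post) reads the fitted intercept_ and coef_ attributes of scikit-learn's LinearRegression and compares them with the closed-form solution. The 18-over prediction is again a hypothetical input.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same six matches: (overs played, runs chased)
data = [[15.4, 125], [19.3, 189], [16.5, 167], [12.2, 84], [14.3, 133], [19.1, 203]]
overs = np.array([row[0] for row in data]).reshape(-1, 1)  # 2-D feature matrix
runs = np.array([row[1] for row in data])

linear_reg = LinearRegression().fit(overs, runs)  # fit() returns the fitted estimator

# Closed-form Normal Equation on the same data
X_b = np.c_[np.ones((len(overs), 1)), overs]
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ runs

print(linear_reg.intercept_, linear_reg.coef_[0])
print(np.allclose([linear_reg.intercept_, linear_reg.coef_[0]], theta_hat))  # True

# Hypothetical query: predicted runs for an 18-over chase
print(linear_reg.predict(np.array([[18.0]])))
```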

Predicted output: [plot of the six matches (red points) with the fitted regression line (blue)]

Pros and Cons:

  • Pro: its complexity is lower than that of other regression models.
  • Con: it is hard to find real-life scenarios in which the relationship is truly linear.

All models are wrong, but some are useful.

– George E. P. Box
