What is Linear Regression?
Linear Regression is a regression technique that models the linear relationship between a dependent variable and one or more independent variables. It can be classified into simple linear regression (one independent variable) and multiple linear regression (two or more independent variables). Let us explore in detail how to build a linear model using Python and its underlying mathematical functions.
Fundamentals of the Linear Regression model:
- A linear model makes a prediction by computing a weighted sum of the input features plus an intercept term (constant), as in the first equation shown after this list.
- The vectorized form of the linear regression model is given in the second equation after this list.
- To train the model to fit the data best, we have to find the model parameters that minimize the Root Mean Square Error (RMSE). In practice it is easier to minimize the Mean Squared Error (MSE), and the parameters that minimize the MSE also minimize the RMSE.
- The Normal Equation gives a closed-form solution for the model parameters that minimize the MSE cost function (see the last equation after this list).
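Written out explicitly, here is a sketch of the formulas the bullets above refer to, using the conventional symbols: \hat{y} is the prediction, \theta the parameter vector, \mathbf{x} the feature vector (with x_0 = 1), \mathbf{X} the feature matrix and \mathbf{y} the target vector.

\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n    (weighted sum of features plus the intercept \theta_0)

\hat{y} = \theta^{T} \mathbf{x}    (vectorized form)

\mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^{T}\mathbf{x}^{(i)} - y^{(i)} \right)^{2}    (cost function minimized during training)

\hat{\theta} = \left( \mathbf{X}^{T}\mathbf{X} \right)^{-1} \mathbf{X}^{T}\mathbf{y}    (Normal Equation, the closed-form solution)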
Linear Regression using Python:
Consider a dataset from a cricket team that successfully chased its opponents' totals in its last six matches. For each match, the dataset records the number of overs played and the total runs chased.
| Match No. | No. of overs played | Chased target (runs) |
|-----------|---------------------|----------------------|
| 1 | 15.4 | 125 |
| 2 | 19.3 | 189 |
| 3 | 16.5 | 167 |
| 4 | 12.2 | 84 |
| 5 | 14.3 | 133 |
| 6 | 19.1 | 203 |
The number of overs played is plotted on the x-axis and the number of runs chased on the y-axis, as in the graph produced by the code below.
Using the Normal Equation:
import numpy as np
import matplotlib.pyplot as plt
data=[[15.4,125],[19.3,189],[16.5,167],[12.2,84],[14.3,133],[19.1,203]]
overs=[row[0] for row in data] # no. of overs played (feature)
runs_chased=[row[1] for row in data] # runs chased (target)
overs_i=np.c_[np.ones((6,1)),overs] # prepend a bias (intercept) column of ones
theta_cap=np.linalg.inv(overs_i.T.dot(overs_i)).dot(overs_i.T).dot(runs_chased) # Normal Equation
overs_limits=np.array([min(overs),max(overs)]) # endpoints for drawing the fitted line
overs_limits_i=np.c_[np.ones((2,1)),overs_limits]
runs_predict=overs_limits_i.dot(theta_cap) # predictions at the two endpoints
plt.plot(overs_limits,runs_predict,color='blue')
plt.scatter(overs,runs_chased,color='red')
plt.xlabel('No. of overs played')
plt.ylabel('No. of runs chased')
plt.show()
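As a quick follow-up (not part of the original listing), the parameter vector theta_cap computed above can be used directly to estimate the chase total for a new number of overs; the 18-over input below is purely illustrative and is not in the dataset.

# Sketch: predicting with theta_cap from the Normal Equation code above.
new_overs=np.array([[1.0,18.0]]) # bias term plus a hypothetical 18 overs
predicted_runs=new_overs.dot(theta_cap)
print(predicted_runs) # estimated runs chased in 18 overs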
Using a Library Function (scikit-learn):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
data=[[15.4,125],[19.3,189],[16.5,167],[12.2,84],[14.3,133],[19.1,203]]
overs=[row[0] for row in data] # no. of overs played (feature)
runs_chased=[row[1] for row in data] # runs chased (target)
overs_i=np.array(overs).reshape(-1,1) # scikit-learn expects a 2-D feature array
linear_reg=LinearRegression() # library function; fits the intercept itself
linear_reg.fit(overs_i,runs_chased)
plt.scatter(overs,runs_chased,color='red')
plt.plot(overs,linear_reg.predict(overs_i),color='blue')
plt.xlabel('No. of overs played')
plt.ylabel('No. of runs chased')
plt.show()
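For reference, the fitted scikit-learn model exposes the learned intercept and slope through its intercept_ and coef_ attributes, and predict can be called on new inputs. The short sketch below reuses linear_reg from the code above; the 18-over input is hypothetical.

# Sketch: inspecting and reusing the model fitted above.
print(linear_reg.intercept_,linear_reg.coef_) # learned intercept and slope
print(linear_reg.predict([[18.0]])) # estimated runs chased in a hypothetical 18 overs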
Predicted output: both scripts produce a scatter plot of the six matches with the fitted regression line overlaid; since both compute the same ordinary least-squares solution, the two lines coincide.
Pros and Cons:
- Pro: Linear regression is computationally simpler and cheaper than most other regression models.
- Con: It is hard to find real-life scenarios where the relationship is truly linear.