Box Plot:
A box plot is a graph that summarizes a data set using five summary values. It is used to show the distribution of the data and to highlight notable observations such as outliers. This post gives a brief overview of the box plot.
Representation:
- Minimum: the smallest value
- First quartile: the middle value between the smallest value and the median
- Median: the middle value of the sorted data (not the average)
- Third quartile: the middle value between the median and the largest value
- Maximum: the largest value
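To make the five values concrete, here is a minimal sketch that computes them for a small illustrative sample using only Python's standard library. The helper name `five_number_summary` is an assumption made for this example, and the quartiles use the "inclusive" interpolation method; other quartile conventions can give slightly different results.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    # method="inclusive" interpolates linearly between sorted observations.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

sample = [10, 20, 15, 25, 5]  # illustrative values only
print(five_number_summary(sample))
```

Note that many plotting libraries, matplotlib included, by default draw the whiskers only up to 1.5 times the interquartile range and mark anything beyond that as an outlier, so the whisker ends match the minimum and maximum only when no such points exist.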
Sample Code:
Consider the DataFrame below, which contains the number of units sold by A and B in different months:
| | Jan | Feb | Mar | Apr | May |
|---|---|---|---|---|---|
| A | 10 | 20 | 15 | 25 | 5 |
| B | 3 | 50 | 8 | 19 | 31 |
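import pandas as pd
import matplotlib.pyplot as plt
# Rows are the sellers (A, B); columns are the months
df1 = pd.DataFrame([[10, 20, 15, 25, 5], [3, 50, 8, 19, 31]], index=['A', 'B'], columns=['Jan', 'Feb', 'Mar', 'Apr', 'May'])
# Boxes are drawn per column, so transpose to get one box each for A and B
df1.T.boxplot()
plt.show()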
Why Box plot?
- Makes patterns easier to compare when two or more data sets are plotted side by side
- Displays the variability and shape of the distribution
- Shows the symmetry of the data and any outliers
- Indicates how the data is skewed and where it is concentrated
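As a rough illustration of how a box plot decides which points count as outliers, the sketch below applies the common 1.5 × IQR rule, which is also matplotlib's default whisker setting. The function name `iqr_outliers` and the sample values are assumptions made for this example, and the quartiles again use the "inclusive" method.

```python
import statistics

def iqr_outliers(values, whis=1.5):
    """Return the values lying more than whis * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range (height of the box)
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

sample = [10, 20, 15, 25, 5, 90]       # hypothetical data with one extreme value
print(iqr_outliers(sample))            # prints [90]
```

With this sample, the quartiles are 11.25 and 23.75, so the upper fence sits at 42.5 and only the value 90 is flagged.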