Understand the Box Plot

Box Plot:

A box plot is a graph that summarizes a data set with five summary values. It shows how the data are distributed and highlights notable observations such as outliers. This post gives a brief overview of the box plot.

Representation:

  • Minimum: the smallest number
  • First quartile: the middle value between the smallest number and the median
  • Median: the middle value of the sorted data
  • Third quartile: the middle value between the median and the highest number
  • Maximum: the highest number
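
As a rough illustration of the five values listed above, the sketch below computes them for a small made-up sample using Python's standard statistics module. The sample values and the function name five_number_summary are assumptions for this example. The "inclusive" quartile method is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # method="inclusive" treats the data as the whole population and
    # interpolates linearly between neighbouring sorted values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

sample = [10, 20, 15, 25, 5]  # hypothetical sample
print(five_number_summary(sample))
```

For this sample the five values are 5, 10, 15, 20 and 25.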

Sample Code:

Consider the dataframe below, which contains the number of units sold by A and B in different months:

     Jan  Feb  Mar  Apr  May
A     10   20   15   25    5
B      3   50    8   19   31

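import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame([[10, 20, 15, 25, 5], [3, 50, 8, 19, 31]], index=['A', 'B'], columns=['Jan', 'Feb', 'Mar', 'Apr', 'May'])
# boxplot draws one box per column, so transpose to get one box each for A and B
plt.boxplot(df1.T.values)
plt.xticks([1, 2], df1.index)
plt.show()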

Why Box plot?

  • Makes patterns easier to see when several data sets are compared side by side
  • Displays the variability and shape of the distribution
  • Shows the symmetry of the data and any outliers
  • Indicates how the data is skewed and grouped
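
To make the point about outliers concrete, here is a minimal sketch of one common convention: a value is flagged as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The factor 1.5 matches the default whis value of matplotlib's boxplot. The function name find_outliers and the sample data are assumptions for illustration only.

```python
import statistics

def find_outliers(values, whisker=1.5):
    """Return the values lying more than whisker * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

sample = [5, 10, 15, 20, 25, 100]  # hypothetical data with one extreme value
print(find_outliers(sample))
```

With this sample only the value 100 falls outside the fences, so a box plot would draw it as a separate point beyond the upper whisker.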

Visualization gives you answers to questions you didn’t know you had.

Ben Shneiderman
