How Box Plot Works


A Box Plot (or Box-and-Whisker Plot) is used to summarize the distribution of numerical data and detect outliers.

Components of a Box Plot

| Component | Meaning |
| --- | --- |
| Minimum | Smallest value (excluding outliers) |
| Q1 (First Quartile) | 25% of the data lies below this value |
| Median (Q2) | Middle value (50th percentile) |
| Q3 (Third Quartile) | 75% of the data lies below this value |
| Maximum | Largest value (excluding outliers) |
| Outliers | Unusually high or low values shown as separate points |

Example

Dataset:

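10, 12, 15, 18, 20, 22, 25, 28, 30, 60

| Statistic | Value |
| --- | --- |
| Minimum | 10 |
| Q1 | 15 |
| Median | 21 |
| Q3 | 28 |
| Maximum | 30 |
| Outlier | 60 |

Here Q1 and Q3 are the medians of the lower and upper halves of the sorted data. The interquartile range is IQR = Q3 − Q1 = 13, and under the common 1.5 × IQR rule any value above 28 + 1.5 × 13 = 47.5 counts as an outlier. That is why 60 is plotted separately and 30 is the maximum.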
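
The figures in this example can be reproduced in a few lines of plain Python. This is a minimal sketch, assuming the median-of-halves quartile convention and the common 1.5 × IQR outlier rule; the function name `box_plot_summary` and its `whisker_factor` parameter are illustrative, not part of any library.

```python
from statistics import median

def box_plot_summary(values, whisker_factor=1.5):
    """Five-number summary plus outliers (median-of-halves quartiles)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above it (middle value skipped if n is odd)
    q1, q2, q3 = median(lower), median(data), median(upper)

    # Assumed outlier rule: anything beyond 1.5 x IQR from the box is an outlier.
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]

    return {
        "minimum": inside[0],    # smallest non-outlier value
        "q1": q1,
        "median": q2,
        "q3": q3,
        "maximum": inside[-1],   # largest non-outlier value
        "outliers": outliers,
    }

summary = box_plot_summary([10, 12, 15, 18, 20, 22, 25, 28, 30, 60])
print(summary)
```

Plotting libraries may use a different quartile convention (for example, linear interpolation between data points), so their Q1 and Q3 can differ slightly from the values computed here.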

How it works

  1. Sort the data.
  2. Find the median.
  3. Find Q1 and Q3.
  4. Draw a box from Q1 to Q3.
  5. Draw a line inside the box for the median.
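  6. Extend “whiskers” to the smallest and largest values that are not outliers. A common rule treats values more than 1.5 × IQR below Q1 or above Q3 as outliers, where IQR = Q3 − Q1.
  7. Plot any outliers as separate points beyond the whiskers.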
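
The drawing steps can be sketched with Matplotlib, the library Seaborn builds on. This is a minimal illustration rather than the article's own code, and the output filename is hypothetical. Matplotlib computes quartiles by linear interpolation, so for the example dataset its box edges (15.75 and 27.25) differ slightly from the 15 and 28 in the table above, while the median, whisker ends, and outlier are the same.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt

data = [10, 12, 15, 18, 20, 22, 25, 28, 30, 60]

fig, ax = plt.subplots(figsize=(3, 5))
# boxplot() finds the quartiles, draws the box and median line, extends the
# whiskers (whis=1.5 means 1.5 x IQR), and marks outliers as separate points.
parts = ax.boxplot(data, whis=1.5)
ax.set_ylabel("Value")
ax.set_title("Box plot of the example dataset")

# The returned dict exposes the drawn artists, so plotted values can be read back.
median_value = parts["medians"][0].get_ydata()[0]
whisker_ends = [line.get_ydata()[1] for line in parts["whiskers"]]
outliers = list(parts["fliers"][0].get_ydata())
print(median_value, whisker_ends, outliers)

fig.savefig("boxplot_example.png")  # hypothetical output filename
```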

Uses:

  • Detecting outliers
  • Comparing distributions
  • Understanding spread and central tendency

How Heatmap Works

A Heatmap displays the values of a matrix as colors. Each value is mapped to a position on a color scale, so larger values appear at one end of the scale (often darker or more saturated) and smaller values at the other.

Example: Correlation Matrix

| Variable | Age | Salary | Experience |
| --- | --- | --- | --- |
| Age | 1.00 | 0.72 | 0.81 |
| Salary | 0.72 | 1.00 | 0.89 |
| Experience | 0.81 | 0.89 | 1.00 |

In a heatmap that uses a light-to-dark color scale:

  • Dark/strong color → High value or strong correlation
  • Light color → Low value or weak correlation

How it works

  1. Arrange data into a matrix (rows and columns).
  2. Each cell contains a numerical value.
  3. Seaborn maps each value to a color based on a color scale.
  4. Similar values appear with similar colors, making patterns easy to spot.
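
The value-to-color step can be illustrated without any plotting library. This is a minimal sketch, assuming a simple two-color linear scale from white to dark blue; the helper `value_to_hex` and its parameters are hypothetical, and real color maps such as the ones Seaborn uses are more elaborate.

```python
def value_to_hex(value, vmin=0.0, vmax=1.0, low=(255, 255, 255), high=(0, 0, 139)):
    """Map a number to a hex color by linear interpolation between two RGB colors."""
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the scale
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return f"#{r:02x}{g:02x}{b:02x}"

labels = ["Age", "Salary", "Experience"]
matrix = [
    [1.00, 0.72, 0.81],
    [0.72, 1.00, 0.89],
    [0.81, 0.89, 1.00],
]

# One color per cell: equal values get equal colors, close values get close colors.
colors = [[value_to_hex(v) for v in row] for row in matrix]
for label, row in zip(labels, colors):
    print(f"{label:<10}", *row)
```

With Seaborn itself, the equivalent is a single call to `seaborn.heatmap` on the matrix, with the color scale chosen through its `cmap` argument.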

Real-life uses

| Field | Example |
| --- | --- |
| Data Science | Feature correlation analysis |
| Finance | Stock correlation matrix |
| Education | Student performance analysis |
| Healthcare | Disease occurrence by region |
| Business | Sales performance across products and months |

Difference between Box Plot and Heatmap

| Box Plot | Heatmap |
| --- | --- |
| Shows distribution of one numerical variable | Shows values in a matrix using colors |
| Detects outliers | Detects patterns and correlations |
| Uses quartiles and median | Uses color intensity |
| Best for comparing distributions | Best for comparing many variables simultaneously |

yatin.sharma@mhtechin.com
