Pivot Tables
Pivot tables are very much similar to what we experienced in spreadsheets. The difference between pivot tables and GroupBy function: โPivot table is essentially a multi-dimensional version of GroupBy aggregation." โ that is, you split-apply-combine, but both the split and the combine happen across not a one-dimensional index, but across a two-dimensional grid.
import numpy as np
import pandas as pd
import seaborn as sns๐ณ Titanic dataset for demonstration
# importing dataset for demonstration
titanic = pd.read_csv('data/titanic.csv')
titanic.head()0
0
3
male
22.0
7.2500
S
man
Southampton
no
False
1
1
1
female
38.0
71.2833
C
woman
Cherbourg
yes
False
2
1
3
female
26.0
7.9250
S
woman
Southampton
yes
True
3
1
1
female
35.0
53.1000
S
woman
Southampton
yes
False
4
0
3
male
35.0
8.0500
S
man
Southampton
no
True
1. WHAT IF WE USE GROUPBY
1.1. Finding survival rate by Gender
Essentially:
group(split) by
sex,select
survived, and,apply
mean
1.2. Finding survival rate by Gender and Class
Essentially;
group(split) by
sex&pclass,select
survivedcolumn, and,apply
meanaggregate
**Conclusion: ** Though we can apply two-dimensional Groupby but the code will start to look long-to-read and understand. Pandas have better tool, pivot_table, to deal with this.
2. USING PIVOT TABLE
The above two-dimensional GroupBy result can be easily derived from following pivot_table code. We will use .pivot_table() constructor, whose default aggfunc is np.mean
We can also get same result without mentioning the index and column kwargs
2.1. Multilevel Pivot Table
Let suppose, we want to group by age, sex and get the survived mean value by each pclass. But instead of a using each age value as separate group, we will make age_groups. To do this, we will first use pd.cut function to make the segment for age column. To make age segments, first let see min and max age in our dataset:
Lets make two age group: 0-18 and 18-80
Now, we will apply pivot_table on sex and age (through newly created age_group) Other variables will stay the same โ finding survived mean value for each pclass
2.2. Additional Pivot Table Options
a. Parameters of pivot_table
pivot_tablevalues=
None
index=
None
aggfunc=
โmeanโ
margins=
False
dropna=
True
margins_name=
โallโ
b. aggfunc
Let suppose, we want to know the sum of survived and mean of fare columns, in each pclass
c. margins =True
This simple property margins=True computes sum along each column and row
Overall, approx. 38% people on board survived
3. CONCEPTS IN PRACTICE: BIRTHRATE DATA
First, load the dataset using Pandas
read_csvfunctionThen we view the head of the dataset,
.head()to get initial sense of datasetTo find total rows and columns in the dataset, we will use
.shapemethod
1๏ธโฃ Finding sum of births in each month, across each gender
Plotting the results
2๏ธโฃ Finding sum of births in each decade, across each gender
Letโs put this table into figure
Last updated
Was this helpful?