Pivot Tables

Pivot tables are very much similar to what we experienced in spreadsheets. The difference between pivot tables and GroupBy function: โ€œPivot table is essentially a multi-dimensional version of GroupBy aggregation." โ€” that is, you split-apply-combine, but both the split and the combine happen across not a one-dimensional index, but across a two-dimensional grid.

import numpy as np
import pandas as pd 
import seaborn as sns

๐Ÿ›ณ Titanic dataset for demonstration

# importing dataset for demonstration
titanic = pd.read_csv('data/titanic.csv')
titanic.head()
survived
pclass
sex
age
fare
embarked
who
embark_town
alive
alone

0

0

3

male

22.0

7.2500

S

man

Southampton

no

False

1

1

1

female

38.0

71.2833

C

woman

Cherbourg

yes

False

2

1

3

female

26.0

7.9250

S

woman

Southampton

yes

True

3

1

1

female

35.0

53.1000

S

woman

Southampton

yes

False

4

0

3

male

35.0

8.0500

S

man

Southampton

no

True

1. WHAT IF WE USE GROUPBY

1.1. Finding survival rate by Gender

Essentially:

  • group(split) by sex,

  • select survived, and,

  • apply mean

1.2. Finding survival rate by Gender and Class

Essentially;

  • group(split) by sex & pclass,

  • select survived column, and,

  • apply mean aggregate

**Conclusion: ** Though we can apply two-dimensional Groupby but the code will start to look long-to-read and understand. Pandas have better tool, pivot_table, to deal with this.

2. USING PIVOT TABLE

The above two-dimensional GroupBy result can be easily derived from following pivot_table code. We will use .pivot_table() constructor, whose default aggfunc is np.mean

We can also get same result without mentioning the index and column kwargs

2.1. Multilevel Pivot Table

Let suppose, we want to group by age, sex and get the survived mean value by each pclass. But instead of a using each age value as separate group, we will make age_groups. To do this, we will first use pd.cut function to make the segment for age column. To make age segments, first let see min and max age in our dataset:

Lets make two age group: 0-18 and 18-80

Now, we will apply pivot_table on sex and age (through newly created age_group) Other variables will stay the same โ€” finding survived mean value for each pclass

2.2. Additional Pivot Table Options

a. Parameters of pivot_table

Paramter
Default

values=

None

index=

None

aggfunc=

โ€˜meanโ€™

margins=

False

dropna=

True

margins_name=

โ€˜allโ€™

b. aggfunc

Let suppose, we want to know the sum of survived and mean of fare columns, in each pclass

c. margins =True

This simple property margins=True computes sum along each column and row

Overall, approx. 38% people on board survived

3. CONCEPTS IN PRACTICE: BIRTHRATE DATA

  • First, load the dataset using Pandas read_csv function

  • Then we view the head of the dataset, .head() to get initial sense of dataset

  • To find total rows and columns in the dataset, we will use .shape method

1๏ธโƒฃ Finding sum of births in each month, across each gender

Plotting the results

2๏ธโƒฃ Finding sum of births in each decade, across each gender

Letโ€™s put this table into figure

Last updated

Was this helpful?