Pivot Tables
Pivot tables are very much similar to what we experienced in spreadsheets. The difference between pivot tables and GroupBy
function: βPivot table is essentially a multi-dimensional version of GroupBy aggregation." β that is, you split-apply-combine, but both the split and the combine happen across not a one-dimensional index, but across a two-dimensional grid.
π³ Titanic dataset for demonstration
survived | pclass | sex | age | fare | embarked | who | embark_town | alive | alone | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | male | 22.0 | 7.2500 | S | man | Southampton | no | False |
1 | 1 | 1 | female | 38.0 | 71.2833 | C | woman | Cherbourg | yes | False |
2 | 1 | 3 | female | 26.0 | 7.9250 | S | woman | Southampton | yes | True |
3 | 1 | 1 | female | 35.0 | 53.1000 | S | woman | Southampton | yes | False |
4 | 0 | 3 | male | 35.0 | 8.0500 | S | man | Southampton | no | True |
1. WHAT IF WE USE GROUPBY
1.1. Finding survival rate by Gender
Essentially:
group(split) by
sex
,select
survived
, and,apply
mean
1.2. Finding survival rate by Gender and Class
Essentially;
group(split) by
sex
&pclass
,select
survived
column, and,apply
mean
aggregate
**Conclusion: ** Though we can apply two-dimensional Groupby but the code will start to look long-to-read and understand. Pandas have better tool, pivot_table
, to deal with this.
2. USING PIVOT TABLE
The above two-dimensional GroupBy result can be easily derived from following pivot_table
code. We will use .pivot_table()
constructor, whose default aggfunc
is np.mean
We can also get same result without mentioning the index
and column
kwargs
2.1. Multilevel Pivot Table
Let suppose, we want to group by age
, sex
and get the survived
mean
value by each pclass
. But instead of a using each age value as separate group, we will make age_groups
. To do this, we will first use pd.cut
function to make the segment for age
column. To make age segments, first let see min
and max
age
in our dataset:
Lets make two age group: 0-18
and 18-80
Now, we will apply pivot_table
on sex
and age
(through newly created age_group
) Other variables will stay the same β finding survived
mean
value for each pclass
2.2. Additional Pivot Table Options
a. Parameters of pivot_table
pivot_table
Paramter | Default |
---|---|
values= | None |
index= | None |
aggfunc= | βmeanβ |
margins= | False |
dropna= | True |
margins_name= | βallβ |
b. aggfunc
Let suppose, we want to know the sum
of survived
and mean
of fare
columns, in each pclass
c. margins =True
This simple property margins=True
computes sum along each column and row
Overall, approx. 38% people on board survived
3. CONCEPTS IN PRACTICE: BIRTHRATE DATA
First, load the dataset using Pandas
read_csv
functionThen we view the head of the dataset,
.head()
to get initial sense of datasetTo find total rows and columns in the dataset, we will use
.shape
method
1οΈβ£ Finding sum
of births
in each month
, across each gender
Plotting the results
2οΈβ£ Finding sum
of births
in each decade
, across each gender
Letβs put this table into figure
Last updated