> For the complete documentation index, see [llms.txt](https://codingnotes.gitbook.io/coding_notes/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://codingnotes.gitbook.io/coding_notes/coding/pandas/pivot-tables.md).

# Pivot Tables

Pivot tables are very much similar to what we experienced in spreadsheets. The difference between pivot tables and `GroupBy` function: “Pivot table is essentially a multi-dimensional version of GroupBy aggregation." — that is, you split-apply-combine, but both the split and the combine happen across not a one-dimensional index, but across a two-dimensional grid.

```python
import numpy as np
import pandas as pd 
import seaborn as sns
```

🛳 Titanic dataset for demonstration

```python
# importing dataset for demonstration
titanic = pd.read_csv('data/titanic.csv')
titanic.head()
```

|   | survived | pclass | sex    | age  | fare    | embarked | who   | embark\_town | alive | alone |
| - | -------- | ------ | ------ | ---- | ------- | -------- | ----- | ------------ | ----- | ----- |
| 0 | 0        | 3      | male   | 22.0 | 7.2500  | S        | man   | Southampton  | no    | False |
| 1 | 1        | 1      | female | 38.0 | 71.2833 | C        | woman | Cherbourg    | yes   | False |
| 2 | 1        | 3      | female | 26.0 | 7.9250  | S        | woman | Southampton  | yes   | True  |
| 3 | 1        | 1      | female | 35.0 | 53.1000 | S        | woman | Southampton  | yes   | False |
| 4 | 0        | 3      | male   | 35.0 | 8.0500  | S        | man   | Southampton  | no    | True  |

## 1. WHAT IF WE USE GROUPBY

### 1.1. Finding survival rate by Gender

Essentially:

* ***group(split)*** by `sex`,
* ***select*** `survived`, and,
* ***apply*** `mean`

```python
titanic.groupby('sex')['survived'].mean()
```

```
sex
female    0.742038
male      0.188908
Name: survived, dtype: float64
```

### 1.2. Finding survival rate by Gender and Class

Essentially;

* ***group(split)*** by `sex` & `pclass`,
* ***select*** `survived` column, and,
* ***apply*** `mean` aggregate

```python
titanic.groupby(['sex','pclass'])['survived'].mean()
```

```
sex     pclass
female  1         0.968085
        2         0.921053
        3         0.500000
male    1         0.368852
        2         0.157407
        3         0.135447
Name: survived, dtype: float64
```

```python
# unstack the result for better presentation
titanic.groupby(['sex','pclass'])['survived'].mean().unstack()
```

```
pclass         1         2         3
sex                                 
female  0.968085  0.921053  0.500000
male    0.368852  0.157407  0.135447
```

\*\*Conclusion: \*\* Though we can apply two-dimensional Groupby but the code will start to look long-to-read and understand. Pandas have better tool, `pivot_table`, to deal with this.

## 2. USING PIVOT TABLE

The above two-dimensional GroupBy result can be easily derived from following `pivot_table` code. We will use `.pivot_table()` constructor, whose default `aggfunc` is `np.mean`

```python
titanic.pivot_table('survived', index='sex', columns='pclass')
```

```
pclass         1         2         3
sex                                 
female  0.968085  0.921053  0.500000
male    0.368852  0.157407  0.135447
```

We can also get same result without mentioning the `index` and `column` kwargs

```python
titanic.pivot_table('survived', 'sex', 'pclass')
```

```
pclass         1         2         3
sex                                 
female  0.968085  0.921053  0.500000
male    0.368852  0.157407  0.135447
```

### 2.1. Multilevel Pivot Table

Let suppose, we want to group by `age`, `sex` and get the `survived` `mean` value by each `pclass`. But instead of a using each age value as separate group, we will make `age_groups`. To do this, we will first use `pd.cut` function to make the segment for `age` column. To make age segments, first let see `min` and `max` `age` in our dataset:

```python
print(f"Min Age: {titanic['age'].min()}")
print(f"Max Age: {titanic['age'].max()}")
```

```
Min Age: 0.42
Max Age: 80.0
```

Lets make two age group: `0-18` and `18-80`

```python
age_group = pd.cut(titanic['age'], [0,18,80])
age_group.head()
```

```
0    (18, 80]
1    (18, 80]
2    (18, 80]
3    (18, 80]
4    (18, 80]
Name: age, dtype: category
Categories (2, interval[int64]): [(0, 18] < (18, 80]]
```

Now, we will apply `pivot_table` on `sex` and `age` (through newly created `age_group`) Other variables will stay the same — finding `survived` `mean` value for each `pclass`

```python
titanic.pivot_table('survived', index=['sex',age_group], columns='pclass')
```

```
pclass                  1         2         3
sex    age                                   
female (0, 18]   0.909091  1.000000  0.511628
       (18, 80]  0.972973  0.900000  0.423729
male   (0, 18]   0.800000  0.600000  0.215686
       (18, 80]  0.375000  0.071429  0.133663
```

### 2.2. Additional Pivot Table Options

#### a. Parameters of `pivot_table`

| Paramter       | Default |
| -------------- | ------- |
| values=        | None    |
| index=         | None    |
| aggfunc=       | ‘mean’  |
| margins=       | False   |
| dropna=        | True    |
| margins\_name= | ‘all’   |

#### b. aggfunc

Let suppose, we want to know the `sum` of `survived` and `mean` of `fare` columns, in each `pclass`

```python
titanic.pivot_table(index='sex',columns='pclass', aggfunc={'survived': sum, 'fare': 'mean'})
# omitted the values keyword; 
# when you’re specifying a mapping for aggfunc, this is determined automatically.
```

```
                          fare                   survived        
pclass           1          2          3        1   2   3
sex                                                      
female  106.125798  21.970121  16.118810       91  70  72
male     67.226127  19.741782  12.661633       45  17  47
```

#### c. margins =True

This simple property `margins=True` computes sum along each column and row

```python
titanic.pivot_table('survived', index='sex', columns='pclass', margins=True)
```

```
pclass         1         2         3       All
sex                                           
female  0.968085  0.921053  0.500000  0.742038
male    0.368852  0.157407  0.135447  0.188908
All     0.629630  0.472826  0.242363  0.383838
```

Overall, approx. 38% people on board survived

## 3. CONCEPTS IN PRACTICE: BIRTHRATE DATA

* First, load the dataset using Pandas `read_csv` function
* Then we view the head of the dataset, `.head()` to get initial sense of dataset
* To find total rows and columns in the dataset, we will use `.shape` method

```python
births = pd.read_csv('data/births.csv')
print(births.head())
print(births.shape)
```

```
   year  month  day gender  births
0  1969      1  1.0      F    4046
1  1969      1  1.0      M    4440
2  1969      1  2.0      F    4454
3  1969      1  2.0      M    4548
4  1969      1  3.0      F    4548
(15547, 5)
```

1️⃣ Finding `sum` of `births` in each `month`, across each `gender`

```python
births.pivot_table('births', index='month', columns='gender', aggfunc='sum', margins=True)
```

```
gender         F         M        All
month                                
1        6035447   6328750   12364197
2        5634064   5907114   11541178
3        6181613   6497231   12678844
4        5889345   6196546   12085891
5        6145186   6479786   12624972
6        6093026   6428044   12521070
7        6512299   6855257   13367556
8        6600723   6927284   13528007
9        6473029   6779802   13252831
10       6330549   6624401   12954950
11       5956388   6241579   12197967
12       6184154   6472761   12656915
All     74035823  77738555  151774378
```

Plotting the results

```python
# using matplotlib to draw figure of 
# sum of births in each month, across each gender
# magic function (%matplotlib) to make the plot appear and store in notebook
%matplotlib inline

import matplotlib.pyplot as plt
sns.set() # set seaborn styles
births.pivot_table('births', index='month', columns='gender', aggfunc='sum').plot()
plt.ylabel('total births in each month');
```

![](/files/YJxCvNWpOdFXKjcJHFl2)

2️⃣ Finding `sum` of `births` in each `decade`, across each `gender`

```python
# adding a decade column
births['decade'] = 10 * (births['year'] // 10 ) # //10 will remove the last digit in year 
```

```python
# creating pivot table for total births, in each decade, along each gender type
print(births.pivot_table('births', index='decade', columns='gender', aggfunc='sum', margins=True))
```

```
gender         F         M        All
decade                               
1960     1753634   1846572    3600206
1970    16263075  17121550   33384625
1980    18310351  19243452   37553803
1990    19479454  20420553   39900007
2000    18229309  19106428   37335737
All     74035823  77738555  151774378
```

Let’s put this table into figure

```python
# using matplotlib to draw figure of 
# sum of births in each decade, across each gender
# magic function (%matplotlib) to make the plot appear and store in notebook
%matplotlib inline

sns.set() # set seaborn styles
births.pivot_table('births', index='year', columns='gender', aggfunc='sum').plot()
plt.ylabel('total births per year');
```

![](/files/CGChJC3rbHy76MHvvlmy)


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://codingnotes.gitbook.io/coding_notes/coding/pandas/pivot-tables.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
