# Indexing Pandas Series and DataFrame

The techniques learned in [NumPy](https://tahamaddam.com/numpy/?order=asc), such as [indexing and slicing](https://tahamaddam.com/coding/numpy/indexing-and-slicing-a-numpy-array/), [fancy indexing](https://tahamaddam.com/coding/numpy/numpy-fancy-indexing/), and [boolean masking and combination](https://tahamaddam.com/coding/numpy/boolean-masking-in-numpy/), also apply to Pandas `Series` and `DataFrame` objects.

## 1. DATA INDEXING & SELECTION *ON SERIES*

A `Series` object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary. The sections below show both views.

### 1.1. Series as Dictionary

A `Series` essentially maps a collection of keys (the index) to a collection of values.

```python
import numpy as np
import pandas as pd 

# making Data Series
data_series = pd.Series([1,2,3,4,5],
                       index=['a','b','c','d','e'])
data_series
```

```
a    1
b    2
c    3
d    4
e    5
dtype: int64
```

* We can use dictionary-like Python expressions, such as a membership test on the index

```python
'a' in data_series
```

```
True
```

* We can fetch the index of a `Series` object using the `.keys()` method

```python
data_series.keys()
```

```
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
```

* We can fetch `(index, value)` pairs using the `.items()` method

```python
list(data_series.items())
```

```
[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]
```

* Just like a Python dictionary, we can extend a Pandas `Series` by assigning a value to a new index label

```python
data_series['f'] = 6
data_series
```

```
a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64
```
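
The dictionary analogy also covers updates: assigning to an existing label overwrites its value, while assigning to a new label enlarges the `Series`. A minimal sketch, using a hypothetical `prices` Series that is not part of the examples above:

```python
import pandas as pd

# hypothetical Series used only for this illustration
prices = pd.Series([1.5, 2.0], index=['apple', 'pear'])

prices['apple'] = 1.75   # existing label: the value is overwritten
prices['plum'] = 3.0     # new label: the Series grows by one entry

print(list(prices.index))   # ['apple', 'pear', 'plum']
print(prices.to_dict())     # {'apple': 1.75, 'pear': 2.0, 'plum': 3.0}
```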

### 1.2. Series as one-dimensional array

We can perform the same operations on a `Series` object as on NumPy arrays: indexing, slicing, masking, and fancy indexing.

* **Indexing** by providing the explicit index label (a string, in our case)

```python
data_series['d']
```

```
4
```

* **Slicing** with strings as the index. **ALERT**: when you slice with an explicit index (i.e., `data_series[:'d']`), the *stop index is included in the slice*

```python
data_series[:'d']
```

```
a    1
b    2
c    3
d    4
dtype: int64
```

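* **Indexing** by providing the implicit (integer) position. Note that recent pandas versions deprecate treating an integer key in plain `[]` as a position when the index is not integer-based; `data_series.iloc[0]` is the unambiguous form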

```python
data_series[0]
```

```
1
```

* **Slicing** by providing implicit (integer) positions. **ALERT**: the *stop index isn't included in the output*

```python
data_series[1:3]
```

```
b    2
c    3
dtype: int64
```
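
The two slicing conventions are easiest to compare side by side. The sketch below rebuilds a small Series (named `s` here purely for illustration) and slices it once by label and once by position:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_label = s['b':'d']   # label slice: the stop label 'd' is included
by_position = s[1:3]    # positional slice: position 3 is excluded

print(list(by_label.index))     # ['b', 'c', 'd']
print(list(by_position.index))  # ['b', 'c']
```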

### 1.3. Masking & Fancy Indexing

* In **masking**, we pass a boolean array inside `[]` to get a subset of the `Series`. This boolean array is usually the result of a comparison. A mask can use a single condition or several conditions combined; the examples below cover both cases:

```python
# a comparison that results in a boolean Series
data_series > 3 
```

```
a    False
b    False
c    False
d     True
e     True
f     True
dtype: bool
```

```python
# boolean masking
data_series[(data_series > 3)]
```

```
d    4
e    5
f    6
dtype: int64
```

```python
# another masking example with multiple conditions 
data_series[(data_series > 0) & (data_series <4)]
```

```
a    1
b    2
c    3
dtype: int64
```
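
Conditions can also be combined with `|` (or) and inverted with `~` (not). Each condition needs its own parentheses, because `&`, `|`, and `~` bind more tightly than comparison operators in Python. A small sketch on an illustrative Series `s`:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=list('abcdef'))

# OR: keep values below 2 or above 5
outer = s[(s < 2) | (s > 5)]

# NOT: invert a mask with ~ (here: keep the odd values)
not_even = s[~(s % 2 == 0)]

print(outer.tolist())     # [1, 6]
print(not_even.tolist())  # [1, 3, 5]
```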

* **Fancy indexing** fetches values at arbitrary index points, passed as a list, as opposed to simple slicing, which fetches values in a regular order (`[1:10]`, `[::2]`, for example)

```python
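# fetch first and last item of the Series by position
# (.iloc, covered in section 1.4, avoids the deprecated positional fallback of plain [])
data_series.iloc[[0,-1]]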
```

```
a    1
f    6
dtype: int64
```

```python
# fetch the values at labels 'a' and 'e'
data_series[['a','e']]
```

```
a    1
e    5
dtype: int64
```
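
As with NumPy fancy indexing, the result follows the order of the list you pass, and a label may be repeated. A short sketch on an illustrative Series `s`:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# labels are returned in the requested order, repeats included
picked = s[['e', 'a', 'a']]

print(list(picked.index))  # ['e', 'a', 'a']
print(picked.tolist())     # [5, 1, 1]
```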

### 1.4. Indexers: loc, iloc

PROBLEM:

* As the slicing examples above show, mixing explicit and implicit indexing can be confusing. This is especially true when the index labels are integers.
* For example, if your `Series` has an explicit integer index, an indexing operation such as `data[1]` uses the explicit index: it fetches the value labeled `1`, not the second item as implicit indexing would. A slicing operation such as `data[1:3]`, however, uses implicit Python-style positions: it fetches the 2nd and 3rd items of the `Series`.

SOLUTION:

* Because of this potential confusion with integer indexes, Pandas provides special indexer attributes that explicitly expose each indexing scheme:

```python
# first make a pd.Series where the confusion can happen
pd_series = pd.Series([10,20,30,40,50],
                     index=[1,2,3,4,5])
pd_series
```

```
1    10
2    20
3    30
4    40
5    50
dtype: int64
```

```python
# Suppose you want the second item, at position 1.
# Plain [1] is treated as the explicit label 1,
# so it returns the first item instead
pd_series[1]
```

```
10
```

#### a. Using loc

`.loc` always references the *explicit index* scheme (labels). It is an indexer attribute used with square brackets, not a method call.

```python
pd_series.loc[1]
```

```
10
```

#### b. Using iloc

`.iloc` always references the *implicit index* scheme (integer positions). Like `.loc`, it is used with square brackets.

```python
pd_series.iloc[1]
```

```
20
```
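
The same distinction carries over to slicing and fancy indexing. The sketch below reuses the integer-indexed `pd_series` from above and shows that `.loc` slices by label (stop included) while `.iloc` slices by position (stop excluded):

```python
import pandas as pd

pd_series = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

# slicing
print(pd_series.loc[1:3].tolist())    # labels 1 to 3, inclusive -> [10, 20, 30]
print(pd_series.iloc[1:3].tolist())   # positions 1 and 2        -> [20, 30]

# fancy indexing
print(pd_series.loc[[1, 5]].tolist())    # labels 1 and 5          -> [10, 50]
print(pd_series.iloc[[0, -1]].tolist())  # first and last position -> [10, 50]
```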

## 2. DATA INDEXING & SELECTION *IN A DATAFRAME*

A `DataFrame` object acts in many ways like a two-dimensional NumPy array, and in many ways like a dictionary of related `Series` objects. The sections below show both views.

### 2.1. DataFrame as a Dictionary

A `DataFrame` can be viewed as a dictionary of related `Series` objects that share the same index.

```python
# build the population Series from a dictionary
population_dict = {'California': 38332521, 
                   'Texas': 26448193, 
                   'New York': 19651127, 
                   'Florida': 19552860, 
                   'Illinois': 12882135}
population_series = pd.Series(population_dict)

# making the area dictionary 
area_dict = {'California': 423967, 
             'Texas': 695662, 
             'New York': 141297, 
             'Florida': 170312, 
             'Illinois': 149995}
area_series = pd.Series(area_dict)

states_dataframe = pd.DataFrame({'population': population_series,
                                'area': area_series})
states_dataframe
```

|            | population | area   |
| ---------- | ---------- | ------ |
| California | 38332521   | 423967 |
| Texas      | 26448193   | 695662 |
| New York   | 19651127   | 141297 |
| Florida    | 19552860   | 170312 |
| Illinois   | 12882135   | 149995 |

* Individual **column** data can be accessed via dictionary-style indexing

```python
states_dataframe['population']
```

```
California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: population, dtype: int64
```

* We can also access the column values using the **column name as an attribute**

```python
states_dataframe.population
```

```
California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: population, dtype: int64
```
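
Attribute-style access only works when the column name is a valid Python identifier and does not clash with an existing `DataFrame` attribute or method. The sketch below uses a hypothetical DataFrame with a column named `count`, which collides with the `DataFrame.count` method; dictionary-style indexing always returns the column.

```python
import pandas as pd

# hypothetical DataFrame; the 'count' column name clashes with DataFrame.count
df = pd.DataFrame({'area': [10, 20], 'count': [3, 4]})

print(df.area.tolist())      # [10, 20]  attribute access works for 'area'
print(callable(df.count))    # True      df.count is the method, not the column
print(df['count'].tolist())  # [3, 4]    dictionary-style access gets the column
```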

* **Dictionary-style syntax** can also be used to modify the object, for example to add a new column to a `DataFrame`

```python
states_dataframe['density'] = states_dataframe['population'] / states_dataframe['area']
states_dataframe
```

|            | population | area   | density    |
| ---------- | ---------- | ------ | ---------- |
| California | 38332521   | 423967 | 90.413926  |
| Texas      | 26448193   | 695662 | 38.018740  |
| New York   | 19651127   | 141297 | 139.076746 |
| Florida    | 19552860   | 170312 | 114.806121 |
| Illinois   | 12882135   | 149995 | 85.883763  |

### 2.2. DataFrame as two-dimensional Array

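* The `.values` attribute returns the underlying data of the `DataFrame` as a NumPy array. Note that the outputs in the rest of this page show only the `population` and `area` columns, i.e. the DataFrame as it was before `density` was added; with `density` present, each result gains a third column and the array becomes floating point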

```python
states_dataframe.values
```

```
array([[38332521,   423967],
       [26448193,   695662],
       [19651127,   141297],
       [19552860,   170312],
       [12882135,   149995]])
```
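
Because a NumPy array has a single dtype, the array returned by `.values` depends on the columns present. The sketch below, on a shortened two-row version of the data, shows the dtype changing once a float column is added:

```python
import pandas as pd

df = pd.DataFrame({'population': [38332521, 26448193],
                   'area': [423967, 695662]},
                  index=['California', 'Texas'])

print(df.values.dtype)   # int64: both columns are integers

# adding a float column forces a common floating-point dtype
df['density'] = df['population'] / df['area']
print(df.values.dtype)   # float64
print(df.values.shape)   # (2, 3)
```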

* The `.T` attribute transposes the `DataFrame` object (columns become rows, rows become columns)

```python
states_dataframe.T
```

|            | California | Texas    | New York | Florida  | Illinois |
| ---------- | ---------- | -------- | -------- | -------- | -------- |
| population | 38332521   | 26448193 | 19651127 | 19552860 | 12882135 |
| area       | 423967     | 695662   | 141297   | 170312   | 149995   |

#### a. Accessing row

```python
states_dataframe.values[0]
```

```
array([38332521,   423967])
```
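
Going through `.values` returns a plain NumPy array, so the column labels are lost. As a comparison, the sketch below (on a shortened two-row version of the data) fetches the same row with `.iloc`, covered in section 2.3, which keeps the labels:

```python
import pandas as pd

df = pd.DataFrame({'population': [38332521, 26448193],
                   'area': [423967, 695662]},
                  index=['California', 'Texas'])

row_array = df.values[0]   # NumPy array, no labels
row_series = df.iloc[0]    # Series labelled by the column names

print(type(row_array).__name__)  # ndarray
print(row_series['area'])        # 423967
print(row_series.name)           # California
```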

#### b. Accessing column

💡 Remember that `[]` indexing applies to column labels in a `DataFrame` object, as opposed to row labels in a `Series` object

```python
states_dataframe['population']
```

```
California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: population, dtype: int64
```

### 2.3. Using Indexers: loc, iloc

#### a. Using loc

`.loc` always references the *explicit index* scheme, for both row and column labels

```python
states_dataframe.loc['New York']
```

```
population    19651127
area            141297
Name: New York, dtype: int64
```

```python
states_dataframe.loc[:'New York']
```

|            | population | area   |
| ---------- | ---------- | ------ |
| California | 38332521   | 423967 |
| Texas      | 26448193   | 695662 |
| New York   | 19651127   | 141297 |

```python
# selection on both rows and columns
states_dataframe.loc[:'New York',:'area']
```

|            | population | area   |
| ---------- | ---------- | ------ |
| California | 38332521   | 423967 |
| Texas      | 26448193   | 695662 |
| New York   | 19651127   | 141297 |

#### b. Using iloc

`.iloc` always references the *implicit index* scheme, using integer positions for both rows and columns

```python
states_dataframe.iloc[2]
```

```
population    19651127
area            141297
Name: New York, dtype: int64
```

```python
states_dataframe.iloc[:3]
```

|            | population | area   |
| ---------- | ---------- | ------ |
| California | 38332521   | 423967 |
| Texas      | 26448193   | 695662 |
| New York   | 19651127   | 141297 |

```python
states_dataframe.iloc[:3,:1]
```

|            | population |
| ---------- | ---------- |
| California | 38332521   |
| Texas      | 26448193   |
| New York   | 19651127   |
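
The indexers also accept the masking and fancy-indexing patterns from section 1.3, and they can be used to assign values. The sketch below rebuilds the two-column DataFrame under the illustrative name `states`; the threshold and the replacement value are arbitrary choices for the example.

```python
import pandas as pd

states = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])

# boolean mask on the rows, list of labels on the columns
big = states.loc[states['area'] > 400000, ['population']]
print(list(big.index))   # ['California', 'Texas']

# assignment by position: row 0, column 1
states.iloc[0, 1] = 424000
print(states.loc['California', 'area'])  # 424000
```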

