Indexing Pandas Series and DataFrame

Techniques learned in NumPy, such as indexing, slicing, fancy indexing, boolean masking, and combination, will be applied to Pandas Series and DataFrame objects.

1. DATA INDEXING & SELECTION ON SERIES

A Series object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary. We will see how.

1.1 Series as Dictionary

A Series essentially maps a collection of keys to a collection of values.

import numpy as np
import pandas as pd 

# making Data Series
data_series = pd.Series([1,2,3,4,5],
                       index=['a','b','c','d','e'])
data_series
a    1
b    2
c    3
d    4
e    5
dtype: int64
  • We can use dictionary-like Python expressions

  • We can fetch the index of a Series object using the .keys() method

  • We can fetch (index, value) pairs using the .items() method

  • Just like a Python dictionary, we can extend a Pandas Series by assigning a value to a new index label
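The dictionary-style operations listed above can be sketched as follows. This is a minimal illustration that rebuilds the same data_series; the helper names has_a, labels and pairs are ours and only exist to hold the results.

```python
import pandas as pd

data_series = pd.Series([1, 2, 3, 4, 5],
                        index=['a', 'b', 'c', 'd', 'e'])

# Dictionary-like membership test, checked against the index labels
has_a = 'a' in data_series

# .keys() returns the index of the Series
labels = list(data_series.keys())

# .items() yields (index, value) pairs
pairs = list(data_series.items())

# Assigning to a new label extends the Series, like adding a dictionary key
data_series['f'] = 6
```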

1.2. Series as one-dimensional array

We can perform the same operations on a Series object as we do on NumPy arrays: indexing, slicing, masking, and fancy indexing.

  • Indexing by providing an explicit index (a string, in our case)

  • Slicing with a string as the index. ALERT: when you slice with an explicit index (i.e., data[:'d']), the stop index is included in the slice

  • Indexing by providing an implicit (integer) index

  • Slicing by providing an implicit (integer) index. ALERT: the stop index isn't included in the output
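A short sketch of these four cases, using the same data_series, might look like this. One assumption to flag: newer Pandas releases deprecate integer-position lookups such as data_series[0] when the index holds labels, so the sketch uses .iloc[0] for the single-position case. The variable names are illustrative.

```python
import pandas as pd

data_series = pd.Series([1, 2, 3, 4, 5],
                        index=['a', 'b', 'c', 'd', 'e'])

# Explicit (label) indexing
by_label = data_series['b']

# Explicit (label) slicing: the stop label 'c' IS included
label_slice = data_series['a':'c']

# Implicit (integer position) indexing, written with .iloc (see note above)
by_position = data_series.iloc[0]

# Implicit (integer position) slicing: the stop position 2 is NOT included
position_slice = data_series[0:2]
```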

1.3. Masking & Fancy Indexing

  • In masking, we provide a boolean array inside [] to get a subset of the Series. This boolean array can be the result of a conditional operator. For masking, we can pass a single condition or a group of conditions.

  • Fancy indexing is where we fetch values at arbitrary index points, as compared to simple slicing where we fetch values in some order ([1:10], [::2], for example)
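Both ideas can be sketched on the same data_series. The particular conditions and labels below are arbitrary choices for illustration.

```python
import pandas as pd

data_series = pd.Series([1, 2, 3, 4, 5],
                        index=['a', 'b', 'c', 'd', 'e'])

# Masking with a single condition
greater_than_two = data_series[data_series > 2]

# Masking with a group of conditions; each condition needs its own parentheses
mask = (data_series > 1) & (data_series < 5)
between = data_series[mask]

# Fancy indexing: pass a list of arbitrary labels
picked = data_series[['a', 'e']]
```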

1.4. Indexers: loc, iloc

PROBLEM:

  • We saw in the slicing example above how explicit indexing can make things confusing; this is especially true if the indices are integers.

  • For example, if your Series has an explicit integer index, an indexing operation such as data[1] will use the explicit index, that is, fetch the value labeled 1 and not the second item as implicit indexing would. However, a slicing operation like data[1:3] will use implicit Python-style slicing, that is, fetch the 2nd and 3rd items in the Series object
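The mismatch described above can be reproduced with a small Series. The values 'a', 'b', 'c' and the index [1, 3, 5] are made up for this sketch and are not part of the original examples.

```python
import pandas as pd

# Hypothetical Series with an explicit integer index
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Indexing uses the explicit index: the value labeled 1
by_label = data[1]

# Slicing uses implicit positions: the 2nd and 3rd items (labels 3 and 5)
by_position = data[1:3]
```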

SOLUTION:

  • Because of this potential confusion in the case of integer indexes, Pandas provides some special indexer attributes that explicitly expose certain indexing schemes:

a. Using loc

The .loc indexer always references the explicit index (the labels). It is an attribute used with square brackets, not a method call.
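As a minimal sketch, assuming a made-up Series with the integer index [1, 3, 5]:

```python
import pandas as pd

# Hypothetical Series with an explicit integer index
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Label 1, not position 1
loc_item = data.loc[1]

# Labels from 1 up to and including 3
loc_slice = data.loc[1:3]
```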

b. Using iloc

The .iloc indexer always references the implicit, Python-style integer positions. Like .loc, it is an attribute used with square brackets.
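As a minimal sketch, assuming a made-up Series with the integer index [1, 3, 5]:

```python
import pandas as pd

# Hypothetical Series with an explicit integer index
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Position 1 (the second item), regardless of its label
iloc_item = data.iloc[1]

# Positions 1 and 2; the stop position 3 is excluded
iloc_slice = data.iloc[1:3]
```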

2. DATA INDEXING & SELECTION IN A DATAFRAME

A DataFrame object acts in many ways like a two-dimensional NumPy array, and in many ways like a dictionary of related Series objects. We will see how:

2.1. DataFrame as a Dictionary

A DataFrame can be treated as a dictionary of related Series objects.
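One way to build such a DataFrame is from two Series that share the same state labels. The figures are the ones shown in the output on this page; the variable names population, area and data are assumptions of this sketch.

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127, 'Florida': 19552860,
                        'Illinois': 12882135})
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})

# Each dictionary key becomes a column; the Series are aligned on their index
data = pd.DataFrame({'population': population, 'area': area})
```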

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995

  • Individual column data can be accessed via dictionary-style indexing

  • We can also access column values using the column name as an attribute

  • Dictionary-style syntax can also be used to modify the object or add a new column to the DataFrame object
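The three operations above can be sketched as follows. The DataFrame is rebuilt from the figures on this page, and the density column is assumed to be population divided by area, which is consistent with the values shown in the output.

```python
import pandas as pd

data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])

# Dictionary-style access to one column
area_by_key = data['area']

# Attribute-style access to the same column
area_by_attr = data.area

# Dictionary-style assignment adds a new column
data['density'] = data['population'] / data['area']
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so the dictionary-style form is the safer default.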

            population    area     density
California    38332521  423967   90.413926
Texas         26448193  695662   38.018740
New York      19651127  141297  139.076746
Florida       19552860  170312  114.806121
Illinois      12882135  149995   85.883763

2.2. DataFrame as two-dimensional Array

  • The .values attribute exposes the underlying values of the DataFrame object as a two-dimensional array

  • The .T attribute transposes the DataFrame object (columns become rows, rows become columns)
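A minimal sketch of both attributes, rebuilding the two-column DataFrame from the figures on this page:

```python
import pandas as pd

data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])

# Underlying values as a two-dimensional NumPy array (5 rows, 2 columns)
raw = data.values

# Transpose: states become columns, 'population' and 'area' become rows
transposed = data.T
```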

            California     Texas  New York   Florida  Illinois
population    38332521  26448193  19651127  19552860  12882135
area            423967    695662    141297    170312    149995

a. Accessing row

b. Accessing column

💡 Remember that [] indexing applies to column labels in a DataFrame object, as opposed to row labels in a Series object
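Row and column access can be sketched like this. Going through .values for the row is one possible approach, chosen here because plain [] on a DataFrame looks up columns; the variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])

# a. Accessing a row: a single index into the underlying array returns a row
first_row = data.values[0]

# b. Accessing a column: a single index into the DataFrame returns a column
area_column = data['area']
```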

2.3. Using Indexers: loc, iloc

a. Using loc

The .loc indexer always references the explicit index, here the row and column labels.
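A label-based selection might be sketched as follows. The exact slice is an assumption on our part, chosen because it selects the first three states and both columns; note that both stop labels are included.

```python
import pandas as pd

data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])

# Rows up to and including 'New York', columns up to and including 'area'
subset = data.loc[:'New York', :'area']
```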

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297

b. Using iloc

The .iloc indexer always references the implicit index, that is, integer positions for both rows and columns.
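Position-based selection can be sketched as follows. The slices are assumptions chosen to pick the first three rows with both columns, and then the first three rows with the first column only.

```python
import pandas as pd

data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])

# First three rows, first two columns (stop positions are excluded)
both_columns = data.iloc[:3, :2]

# First three rows, first column only; the result is still a DataFrame
population_only = data.iloc[:3, :1]
```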

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297

            population
California    38332521
Texas         26448193
New York      19651127
