Indexing Pandas Series and DataFrame
The techniques learned in NumPy (indexing, slicing, fancy indexing, boolean masking, and combinations of these) also apply to Pandas Series and DataFrame objects.
1. DATA INDEXING & SELECTION ON SERIES
A Series object acts in many ways like a one-dimensional NumPy array and in many ways like a standard Python dictionary, as the following subsections show.
1.1 Series as Dictionary
A Series essentially maps a collection of keys to a collection of values.
We can use dictionary-like Python expressions on it.
We can fetch the index of a Series object using the .keys() method.
We can fetch (index, value) pairs using the .items() method.
Just as with a Python dictionary, we can extend a Pandas Series by assigning a value to a new index label.
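The dictionary-like behavior described above can be sketched as follows. This is a minimal illustration rather than the original example: the sample values and the labels 'a' to 'e' are assumptions.

```python
import pandas as pd

# Hypothetical sample data: four floats labeled 'a' to 'd'.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

value_b = data['b']          # dictionary-style lookup by key
has_a = 'a' in data          # membership test checks the index labels
keys = list(data.keys())     # the index labels
pairs = list(data.items())   # (index, value) pairs

# Assigning to a new label extends the Series, like adding a dictionary key.
data['e'] = 1.25
print(data)
```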
1.2. Series as one-dimensional array
We can perform the same operations on a Series object as we do on NumPy arrays: indexing, slicing, masking, and fancy indexing.
Indexing by providing an explicit index (a string, in our case).
Slicing with strings as the index. ALERT: when you slice with an explicit index (for example, data[:'d']), the stop index is included in the slice.
Indexing by providing an implicit (integer) index. Recent pandas releases deprecate data[1] for positional access on a non-integer index and recommend data.iloc[1] instead.
Slicing by providing implicit (integer) indices. ALERT: the stop index is not included in the output.
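A short sketch of the four array-style operations listed above, using an assumed sample Series with string labels. Positional access to a single item is written with .iloc here so that the snippet does not rely on the deprecated data[1] form.

```python
import pandas as pd

# Hypothetical sample data with an explicit string index.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

by_label = data['b']          # explicit (label) index
label_slice = data['a':'c']   # explicit slice: the stop label 'c' IS included
by_position = data.iloc[1]    # implicit (integer) position
pos_slice = data[0:2]         # implicit slice: the stop position 2 is NOT included

print(list(label_slice.index))
print(list(pos_slice.index))
```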
1.3. Masking & Fancy Indexing
In masking, we pass a boolean array inside [] to get a subset of the Series. This boolean array can be the result of a conditional expression, and we can pass either a single condition or a combination of conditions.
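As a minimal sketch of masking with one condition and with two combined conditions (the sample data and thresholds are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical sample data.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

mask = data > 0.3      # boolean Series produced by a single condition
single = data[mask]    # keeps only the entries where the mask is True

# Combine conditions with & (and) or | (or); each condition needs parentheses.
combined = data[(data > 0.3) & (data < 0.8)]

print(single)
print(combined)
```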
Fancy indexing fetches values at arbitrary index points, as opposed to simple slicing, which fetches values in a regular order ([1:10] or [::2], for example).
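Fancy indexing can be sketched by passing a list of labels, or a list of positions through .iloc. The sample data is again an assumption.

```python
import pandas as pd

# Hypothetical sample data.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

by_labels = data[['a', 'd']]        # arbitrary labels, in the order requested
by_positions = data.iloc[[3, 0]]    # arbitrary positions, in the order requested

print(by_labels)
print(by_positions)
```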
1.4. Indexers: loc, iloc
PROBLEM:
The slicing examples above show how mixing explicit and implicit indexing can be confusing. This is especially true when the index labels are integers.
For example, if your Series has an explicit integer index, an indexing operation such as data[1] uses the explicit index: it fetches the value labeled 1, not the second item as implicit indexing would. However, a slicing operation such as data[1:3] uses implicit Python-style slicing: it fetches the 2nd and 3rd items in the Series object.
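The mismatch can be reproduced with a small Series whose labels are the integers 1, 3, and 5. The data is hypothetical, and the comments describe the behavior stated in the text above.

```python
import pandas as pd

# Hypothetical Series with an explicit integer index.
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

item = data[1]       # explicit index: the value labeled 1, not the second item
sliced = data[1:3]   # implicit slice: the 2nd and 3rd items (positions 1 and 2)

print(item)
print(sliced)
```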
SOLUTION:
Because of this potential confusion in the case of integer indexes, Pandas provides some special indexer attributes that explicitly expose certain indexing schemes:
a. Using loc
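.loc always references the explicit index scheme. Note that it is an indexer attribute used with square brackets, as in data.loc[1], rather than a method called with parentheses.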
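A minimal sketch of label-based access on an assumed integer-indexed Series:

```python
import pandas as pd

# Hypothetical Series with an explicit integer index.
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

item = data.loc[1]       # the value labeled 1
sliced = data.loc[1:3]   # labels 1 through 3; the stop label is included

print(item)
print(sliced)
```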
b. Using iloc
.iloc always references the implicit, Python-style positional index. Like .loc, it is used with square brackets, as in data.iloc[1].
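The same assumed Series accessed by position looks like this:

```python
import pandas as pd

# Hypothetical Series with an explicit integer index.
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

item = data.iloc[1]       # the second item, whatever its label is
sliced = data.iloc[1:3]   # positions 1 and 2; the stop position is excluded

print(item)
print(sliced)
```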
2. DATA INDEXING & SELECTION IN A DATAFRAME
A DataFrame object acts in many ways like a two-dimensional NumPy array and in many ways like a dictionary of related Series objects, as the following subsections show.
2.1. DataFrame as a Dictionary
A DataFrame can be viewed as a dictionary of related Series objects that share the same index.
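One way to build such a DataFrame is from two Series that share the same state labels. The figures are the ones shown in this section. The variable names are assumptions, and the original construction code may have differed.

```python
import pandas as pd

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']

# Two Series sharing one index; each becomes a column of the DataFrame.
population = pd.Series([38332521, 26448193, 19651127, 19552860, 12882135],
                       index=states)
area = pd.Series([423967, 695662, 141297, 170312, 149995], index=states)

data = pd.DataFrame({'population': population, 'area': area})
print(data)
```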
            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995
Individual column data can be accessed via dictionary-style indexing.
We can also access the column values by using the column name as an attribute.
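Both access styles can be sketched as follows, on a DataFrame rebuilt from the figures in this section. Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method, so the dictionary style is the safer default.

```python
import pandas as pd

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=states,
)

area_by_key = data['area']    # dictionary-style access
area_by_attr = data.area      # attribute-style access to the same column

print(area_by_key)
print(area_by_attr.equals(area_by_key))
```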
Dictionary-style syntax can also be used to modify the object or to add a new column to the DataFrame object.
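As a sketch, a derived column can be added by assigning to a new key. The column name 'density' is an assumption. The computed values, population divided by area, match the third column of figures shown in this section.

```python
import pandas as pd

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=states,
)

# Assigning to a new key adds a column ('density' is a hypothetical name).
data['density'] = data['population'] / data['area']
print(data)
```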
            population    area  population / area
California    38332521  423967          90.413926
Texas         26448193  695662          38.018740
New York      19651127  141297         139.076746
Florida       19552860  170312         114.806121
Illinois      12882135  149995          85.883763
2.2. DataFrame as two-dimensional Array
The .values attribute exposes the underlying values of the DataFrame object as a NumPy array.
The .T attribute transposes the DataFrame object (columns become rows, and rows become columns).
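Both attributes can be tried on a two-column DataFrame rebuilt from the figures in this section. This is a sketch, not the original cell.

```python
import pandas as pd

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=states,
)

values = data.values    # two-dimensional NumPy array, one row per state
transposed = data.T     # states become columns; population and area become rows

print(values.shape)
print(transposed)
```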
            California     Texas  New York   Florida  Illinois
population    38332521  26448193  19651127  19552860  12882135
area            423967    695662    141297    170312    149995
a. Accessing a row
b. Accessing a column
💡 Remember that [] indexing applies to column labels in a DataFrame object, as opposed to row labels in a Series object.
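A minimal sketch of both cases, assuming the same two-column DataFrame: a single index on the underlying array returns a row, while [] on the DataFrame itself returns a column.

```python
import pandas as pd

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=states,
)

first_row = data.values[0]    # array-style: row 0 (California) as a NumPy array
area_col = data['area']       # [] on a DataFrame selects a COLUMN by label
texas_area = area_col['Texas']  # [] on the resulting Series selects a ROW label

print(first_row)
print(texas_area)
```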
2.3. Using Indexers: loc, iloc
a. Using loc
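.loc always references the explicit index scheme, that is, the row and column labels. It is used with square brackets rather than called as a method.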
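Two label-based selections that return the first three states with both columns are sketched below. The exact expressions used originally are not known, so these are assumed equivalents.

```python
import pandas as pd

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=states,
)

# Label slices include the stop label on both axes.
by_slice = data.loc[:'New York', :'area']

# Explicit lists of row and column labels select the same block.
by_list = data.loc[['California', 'Texas', 'New York'], ['population', 'area']]

print(by_slice)
```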
            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
b. Using iloc
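.iloc always references the implicit index scheme, that is, integer positions along rows and columns. It is used with square brackets rather than called as a method.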
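Positional selections that return the first three rows, first with both columns and then with only the first column, can be sketched as follows. The slice bounds are assumptions.

```python
import pandas as pd

states = ['California', 'Texas', 'New York', 'Florida', 'Illinois']
data = pd.DataFrame(
    {'population': [38332521, 26448193, 19651127, 19552860, 12882135],
     'area': [423967, 695662, 141297, 170312, 149995]},
    index=states,
)

# Positional slices exclude the stop position on both axes.
both_cols = data.iloc[:3, :2]   # rows 0-2, columns 0-1
first_col = data.iloc[:3, :1]   # rows 0-2, column 0 only (still a DataFrame)

print(both_cols)
print(first_col)
```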
            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297

            population
California    38332521
Texas         26448193
New York      19651127