Pandas Series and DataFrame Objects
In this first part, we will introduce two primary components of Pandas — Series and DataFrame objects.
1. PANDAS SERIES OBJECT
A Pandas Series is a one-dimensional array of indexed data.
A Pandas Series is essentially a column.
Values inside a NumPy array have an implicitly defined integer index, whereas a Pandas Series has an explicitly defined index, which can consist of integers or any other data type.
A Pandas Series object can be created from a list, an array, or a dictionary.
The Pandas Series constructor has the following common parameters:
pd.Series(data=, index=, dtype=)
data= keyword argument: The first keyword argument of the pd.Series() constructor is data=. We don’t need to name it explicitly if we pass the data as the first positional argument.
index= keyword argument: The default index is the integers from 0 to n-1, where n is the number of elements in the series. We can specify a custom index using the index= keyword argument. These integers (or values of another data type) are collectively called the index of the Series, and each individual index element is called a label.
dtype= keyword argument: Used to explicitly set the data type of the Series object.
Additional parameters include name and copy.
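As a minimal sketch of these parameters working together, the snippet below passes the data positionally and sets index=, dtype= and name= explicitly. The labels "a" to "c" and the name "example" are made up for illustration.

```python
import pandas as pd

# Data is passed positionally; index, dtype and name are given as keywords.
# The labels and the name are illustrative assumptions, not from the text.
s = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype="float64", name="example")

print(s.dtype)  # float64
print(s.name)   # example
print(s["b"])   # 2.0
```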
1.1. Creating Pandas Series object
# importing pandas and numpy
import pandas as pd
import numpy as np
# creating a pandas Series object
pd_series = pd.Series([0.25, 0.50, 0.75, 1.0])
# printing the pandas Series object
print(pd_series)
In the output, [0, 1, 2, 3] is the sequence of index labels, shown alongside the corresponding sequence of values [0.25, 0.50, 0.75, 1.0].
We can use the built-in attributes of the Series object to fetch these indices and values.
a. .values attribute
We use the .values attribute to get the values of a Series object as a NumPy array.
b. .index attribute
We use the .index attribute to get the index of a Series object.
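A short sketch of both attributes, reusing the pd_series object defined above. Note that they are accessed without parentheses.

```python
import pandas as pd

pd_series = pd.Series([0.25, 0.50, 0.75, 1.0])

values = pd_series.values  # NumPy array holding the data
index = pd_series.index    # RangeIndex(start=0, stop=4, step=1)

print(values)
print(index)
```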
1.2. Pandas Series as Generalized Numpy Array
We will first create a Pandas Series object by providing the data as an explicit list. The index is automatically set to the integers from 0 to n-1.
We can also create a Series object from a previously defined 1D NumPy array.
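Both routes can be sketched as follows. The numbers and variable names are arbitrary, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# From an explicit list: the index defaults to 0..n-1.
series_from_list = pd.Series([10, 20, 30])

# From a previously defined 1D NumPy array.
arr = np.array([1.5, 2.5, 3.5])
series_from_array = pd.Series(arr)

print(series_from_list)
print(series_from_array)
```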
NumPy Array vs Pandas Series:
Unlike a NumPy array, which has an implicit integer index, the index of a Pandas Series can be of any data type (int, float, str, or a combination of them). Let’s explicitly set a string-based index.
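A minimal sketch of the contrast, assuming the string labels "a" to "d" as the explicit index:

```python
import numpy as np
import pandas as pd

arr = np.array([0.25, 0.50, 0.75, 1.0])

# The NumPy array is addressed by position; the Series by its string labels.
labelled = pd.Series(arr, index=["a", "b", "c", "d"])

print(arr[1])         # 0.5
print(labelled["b"])  # 0.5
```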
1.3. Pandas Series as Specialized Dictionary
A Pandas Series object can also be created from a dictionary. To understand the conceptual parallel, remember this:
A dictionary is a structure that maps arbitrary keys to a set of arbitrary values.
A Series is a structure that maps typed keys to a set of typed values.
Indexing: We can use an index label to fetch the corresponding value.
The same value can also be fetched by position. As New York is at index position 2, we can fetch its value positionally as well, for example with .iloc[2].
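The sketch below builds such a Series and fetches the New York entry both ways. The population figures are the ones that appear in the DataFrame output of section 2.3; the variable names are assumptions. Positional access is written with .iloc because plain [] with an integer on a string-labelled Series is deprecated in recent pandas versions.

```python
import pandas as pd

population_dict = {
    "California": 38332521,
    "Texas": 26448193,
    "New York": 19651127,
    "Florida": 19552860,
    "Illinois": 12882135,
}
population = pd.Series(population_dict)

by_label = population["New York"]  # label-based lookup
by_position = population.iloc[2]   # position-based lookup

print(by_label, by_position)
```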
1.4. Other ways to create Pandas Series
In the following examples, we will see how the index= keyword argument shapes the Series object that is constructed from the data provided.
→ Using a scalar with an explicit index: the scalar is repeated once for each label in the index, so the index defines the number of elements in the Series.
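For instance, with an arbitrary scalar 5 and three made-up labels:

```python
import pandas as pd

# The scalar 5 is broadcast to every label in the index.
scalar_series = pd.Series(5, index=[100, 200, 300])

print(scalar_series)
```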
→ Using a dictionary, but only a subset of it, by providing the index labels of the required values.
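A small sketch with an illustrative dictionary: only the keys listed in index= are kept, in the order given.

```python
import pandas as pd

data = {2: "a", 1: "b", 3: "c"}

# Only keys 3 and 2 are selected, in that order.
subset = pd.Series(data, index=[3, 2])

print(subset)
```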
2. PANDAS DATAFRAME OBJECT
If a Series is analogous to a one-dimensional array with flexible indices, a DataFrame is analogous to a two-dimensional array with both flexible row indices and flexible column names. Just as you might think of a 2D array as an ordered sequence of aligned 1D columns, you can think of a DataFrame as a sequence of aligned (sharing the same index) Series objects.
The DataFrame constructor takes essentially the same keyword arguments as the Pandas Series constructor (data=, index=, dtype=). However, a DataFrame can’t be constructed from a scalar (a single value) alone.
Besides, it also takes an additional columns= keyword argument, which sets the column labels. The default column labels are the integers (0, 1, 2, …, n-1).
2.1. DataFrame from List or 2D Array
a. From a list
This looks similar to the Series object we created earlier, but in a DataFrame we can also set the column label, using the columns= keyword argument. In the absence of this argument, the label of the first column defaults to 0, as the output below shows:
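A minimal sketch of the default case, using the values 1 to 4 seen in the output:

```python
import pandas as pd

# No columns= given, so the single column is labelled 0.
df_default = pd.DataFrame([1, 2, 3, 4])

print(df_default)
print(df_default.columns.tolist())  # [0]
```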
   0
0  1
1  2
2  3
3  4
We can also explicitly set the column label with the columns= keyword argument:
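The original label is not preserved in this text, so the sketch below assumes a hypothetical label "numbers":

```python
import pandas as pd

# "numbers" is an illustrative column label, not taken from the text.
df_labelled = pd.DataFrame([1, 2, 3, 4], columns=["numbers"])

print(df_labelled)
```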
0  1
1  2
2  3
3  4
b. From a 2D Array
We can use a 2D array to construct a DataFrame with more than one column. If we don’t provide the columns= keyword argument, the column labels default to (0, 1, 2, …, n-1), as the output below shows:
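A sketch with a 2x2 NumPy array holding the values seen in the output:

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[1, 2], [3, 4]])

# Both the row index and the column labels default to 0..n-1.
df_2d = pd.DataFrame(arr_2d)

print(df_2d)
```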
   0  1
0  1  2
1  3  4
However, we can also set a custom (explicit) index and custom column names.
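In the sketch below, the row labels row1 and row2 come from the output, while the column names "col1" and "col2" are assumptions, since the originals are not preserved:

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[1, 2], [3, 4]])

# Column names are illustrative; row labels match the output shown.
df_named = pd.DataFrame(arr_2d, index=["row1", "row2"], columns=["col1", "col2"])

print(df_named)
```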
row1  1  2
row2  3  4
2.2. DataFrame from Series Object
We can also create a DataFrame object from a previously defined Series object.
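A minimal sketch using the values 100 to 400 from the output. With an unnamed Series, the resulting column is labelled 0.

```python
import pandas as pd

series = pd.Series([100, 200, 300, 400])

# The Series becomes the single column of the DataFrame.
df_from_series = pd.DataFrame(series)

print(df_from_series)
```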
0  100
1  200
2  300
3  400
Let’s explicitly set the index and column labels.
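One way to do this, sketched under the assumption of a hypothetical column label "amount": the labels a to d are set on the Series itself, and the column label is supplied as a dictionary key. Passing index= to pd.DataFrame() for an existing Series would instead reindex it against the new labels and produce missing values.

```python
import pandas as pd

# Index labels are set on the Series; "amount" is an illustrative column label.
series = pd.Series([100, 200, 300, 400], index=["a", "b", "c", "d"])
df_labelled_series = pd.DataFrame({"amount": series})

print(df_labelled_series)
```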
a  100
b  200
c  300
d  400
2.3. DataFrame from Dictionaries
In a dictionary key-value pair, the value can be another dictionary. We will use this concept to construct our DataFrame object. Pay particular attention to how the keys are used to assign the index and column labels of the DataFrame:
The keys of the outer dictionary passed to pd.DataFrame() are used as column labels.
The keys of the inner (nested) dictionaries are used as index labels.
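The construction can be sketched as follows. The figures are those in the output; the column names "population" and "area" are assumptions, since the original labels are not preserved in this text.

```python
import pandas as pd

# Inner dictionaries: their keys become the row index.
population = {"California": 38332521, "Texas": 26448193, "New York": 19651127,
              "Florida": 19552860, "Illinois": 12882135}
area = {"California": 423967, "Texas": 695662, "New York": 141297,
        "Florida": 170312, "Illinois": 149995}

# Outer dictionary: its keys (assumed names) become the column labels.
states = pd.DataFrame({"population": population, "area": area})

print(states)
```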
California  38332521  423967
Texas       26448193  695662
New York    19651127  141297
Florida     19552860  170312
Illinois    12882135  149995
2.4. Creating DataFrame from ‘list of dictionaries’
In the following example, dictionaries are nested inside a list, and we pass this list as data= to the DataFrame constructor. Pay special attention to how the keys of the dictionaries are used as column labels.
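A sketch that reproduces the values in the output (each row holds i, 2i and 3i). The keys "a", "b" and "c" are assumed column names.

```python
import pandas as pd

# Each dictionary is one row; the keys (assumed names) become the columns.
records = [{"a": i, "b": 2 * i, "c": 3 * i} for i in range(5)]
df_records = pd.DataFrame(records)

print(df_records)
```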
0  0  0   0
1  1  2   3
2  2  4   6
3  3  6   9
4  4  8  12
2.5. Other Concepts
a. Fetching Attributes of DataFrame:
We will fetch the commonly used attributes of a DataFrame:
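The original attribute list is not preserved here, so the sketch below shows a plausible selection (shape, index, columns, values and dtypes) on a small illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["row1", "row2"], columns=["col1", "col2"])

print(df.shape)    # (2, 2): number of rows and columns
print(df.index)    # row labels
print(df.columns)  # column labels
print(df.values)   # underlying data as a NumPy array
print(df.dtypes)   # data type of each column
```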
b. Indexing 101 on DataFrame
Indexing a DataFrame object with [] applies to the column labels.
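A small sketch of this behaviour on an illustrative DataFrame: [] with a column label returns that column as a Series, while a row label raises KeyError. Rows can be selected by label with .loc instead.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["row1", "row2"], columns=["col1", "col2"])

col = df["col1"]  # selects a column, returned as a Series

# [] looks up column labels, so a row label is not found.
try:
    df["row1"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

row = df.loc["row1"]  # label-based row selection

print(col.tolist(), row.tolist(), row_lookup_failed)
```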
3. PANDAS INDEX OBJECT
Both the Pandas Series and DataFrame objects contain an explicit index that lets us reference and modify their data. In some of the above examples, we explicitly provided the index= keyword argument to pd.Series and pd.DataFrame. However, the index object can also be predefined using the pd.Index() constructor.
This Index object can be considered either as an immutable array or as an ordered set.
3.1. Index as Immutable Array
The Index object works in many ways like an array. For example, we can use standard indexing techniques:
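A minimal sketch with an arbitrary integer Index, showing element access, slicing and a few array-like attributes:

```python
import pandas as pd

idx = pd.Index([2, 3, 5, 7, 11])

second = idx[1]         # single element: 3
every_other = idx[::2]  # slicing returns a new Index with 2, 5, 11

print(second)
print(every_other)
print(idx.size, idx.shape, idx.ndim)  # 5 (5,) 1
```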
However, the Index object is an immutable array, i.e., its values can’t be changed. If we try to change one, the result is TypeError: Index does not support mutable operations
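The error can be reproduced with a short sketch like this one, which catches the exception so the script keeps running:

```python
import pandas as pd

idx = pd.Index([2, 3, 5, 7, 11])

message = None
try:
    idx[1] = 0  # item assignment is not allowed on an Index
except TypeError as err:
    message = str(err)

print(message)
```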
3.2. Index as Ordered Set
The Index object follows many of the conventions used by Python’s built-in Set data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way:
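A sketch of the set-style operations on two illustrative Index objects. The named methods are used here rather than operators such as & and |, whose meaning on Index objects has changed across pandas versions.

```python
import pandas as pd

idx_a = pd.Index([1, 3, 5, 7, 9])
idx_b = pd.Index([2, 3, 5, 7, 11])

common = idx_a.intersection(idx_b)          # labels present in both
combined = idx_a.union(idx_b)               # labels present in either
only_a = idx_a.difference(idx_b)            # labels in idx_a but not in idx_b
either = idx_a.symmetric_difference(idx_b)  # labels in exactly one of the two

print(common, combined, only_a, either, sep="\n")
```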