Pandas Series and DataFrame Objects
In this first part, we will introduce two primary components of Pandas: the Series and DataFrame objects.
1. PANDAS SERIES OBJECT
A Pandas Series is a one-dimensional array of indexed data.
A Pandas Series is essentially a column.
Values inside a NumPy array have an implicitly defined integer index, whereas a Pandas Series has an explicitly defined index, which can be of integer or any other data type.
A Pandas Series object can be created from a list, an array, or a dictionary.
The Pandas Series constructor has the following common parameters:
data= keyword argument: The first keyword argument of the pd.Series() constructor is data=. However, we don’t need to set it explicitly if we provide the data as the first argument.
index= keyword argument: The default index is the integers from 0 to n-1, where n is the number of elements in the Series. However, we can specify a custom index using the index= keyword argument. These integers, or values of another data type, are collectively called the index of the Series, and each individual index element is called a label.
dtype= keyword argument: Used to explicitly set the data type of the Series object.
Additional parameters include name and copy.
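As a minimal sketch of the parameters listed above (the values, the labels and the name "example" are made up for illustration):

```python
import pandas as pd

# data is passed as the first argument; index, dtype and name are keyword arguments
s = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype="float64", name="example")

print(s)
print(s.dtype)  # the dtype requested above
print(s.name)   # the name requested above
```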
1.1. Creating Pandas Series object
In a Series built from the values [0.25, 0.50, 0.75, 1.0] with the default index, [0, 1, 2, 3] is the sequence of indices that accompanies the sequence of values.
We can use the built-in attributes of the Pandas object to fetch these indices and values.
a. .values attribute
We use the .values attribute to get the values of a Series object.
b. .index attribute
We use the .index attribute to get the indices of a Series object.
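A short sketch of this section, assuming the Series is built from the four values mentioned above (the variable names are illustrative):

```python
import pandas as pd

# Create a Series from a list; the index defaults to 0..n-1
data = pd.Series([0.25, 0.50, 0.75, 1.0])

values = data.values  # the underlying values as a NumPy array
index = data.index    # the index object (a RangeIndex for the default index)

print(values)
print(index)
```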
1.2. Pandas Series as Generalized NumPy Array
We will first create the Pandas Series object by providing the data as an explicit list; the index is automatically set to the integers from 0 to n-1:
We can also create the Series object from a previously defined 1D NumPy array.
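Both ways of creating a Series can be sketched as follows (the numbers are arbitrary placeholders):

```python
import numpy as np
import pandas as pd

# From an explicit list: the index is set to 0, 1, 2 automatically
from_list = pd.Series([10, 20, 30])

# From a previously defined 1D NumPy array
arr = np.array([1.5, 2.5, 3.5])
from_array = pd.Series(arr)

print(from_list)
print(from_array)
```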
NumPy Array vs Pandas Series:
Unlike a NumPy array, which has an implicit integer index, the index of a Pandas Series can be of any data type (int, float, str, or a combination of them). Let’s explicitly set a string-based index:
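A minimal sketch of a string-based index, plus a mixed-type index for comparison (all labels and values here are illustrative):

```python
import pandas as pd

# String-based index
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])
b_value = data["b"]  # look up a value by its string label

# Index mixing int, str and float labels
mixed = pd.Series([1, 2, 3], index=[2, "x", 3.5])
x_value = mixed.loc["x"]

print(b_value, x_value)
```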
1.3. Pandas Series as Specialized Dictionary
A Pandas Series object can also be created from a dictionary. To understand the conceptual parallel, remember this:
A dictionary is a structure that maps arbitrary keys to a set of arbitrary values.
A Series is a structure that maps typed keys to a set of typed values.
Indexing: We can use an index label to fetch the corresponding value.
This is equivalent to using the implicit integer position. Since New York is at index position 2, we can also fetch its value by position:
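The following sketch assumes a population dictionary built from the state figures shown later in this article. Positional access is written with .iloc, since recent pandas versions discourage plain integer lookups with [] on a Series that has a non-integer index:

```python
import pandas as pd

# Assumed example data: state -> population
population_dict = {
    "California": 38332521,
    "Texas": 26448193,
    "New York": 19651127,
    "Florida": 19552860,
    "Illinois": 12882135,
}
population = pd.Series(population_dict)

by_label = population["New York"]  # fetch by index label
by_position = population.iloc[2]   # fetch by integer position

print(by_label, by_position)
```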
1.4. Other ways to create Pandas Series
In the following examples, we will see how we can use the index= keyword argument to construct a Series object from a subset of the data provided.
→ Using a scalar with an explicit index, which defines the number of scalar instances in the Series object
→ Using a dictionary, but only a subset of it, by providing the index of the required values
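Both cases can be sketched as follows (the scalar, the labels and the dictionary contents are arbitrary):

```python
import pandas as pd

# A scalar is repeated once for every label in the explicit index
from_scalar = pd.Series(5, index=[100, 200, 300])

# Only the dictionary keys listed in index= are kept, in the order given
from_subset = pd.Series({2: "a", 1: "b", 3: "c"}, index=[3, 2])

print(from_scalar)
print(from_subset)
```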
2. PANDAS DATAFRAME OBJECT
If a Series is analogous to a one-dimensional array with flexible indices, a DataFrame is analogous to a two-dimensional array with both flexible row indices and flexible column names.
Just as you might think of a 2D array as an ordered sequence of aligned (sharing the same index) 1D columns, you can think of a DataFrame as a sequence of aligned (sharing the same index) Series objects.
The Pandas DataFrame constructor has the following common parameters:
The DataFrame constructor has essentially the same keyword arguments as the Pandas Series constructor.
However, a DataFrame can’t be constructed from a scalar (single value) alone.
Besides, it also takes an additional columns= keyword argument, which represents the labels for the columns. The default value of columns is (0, 1, 2, …, n-1), where n is the number of columns.
2.1. DataFrame from List or 2D Array
a. From a list
It seems similar to the Series object we created earlier, but in a DataFrame object we can set the column label by using the columns= kwarg. In the absence of this kwarg, the label of the first column defaults to 0, as can be seen in the example below:
   0
0  1
1  2
2  3
3  4
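A minimal sketch that produces a frame like the one above, assuming the input list is [1, 2, 3, 4]:

```python
import pandas as pd

# No columns= given, so the single column is labelled 0
df_default = pd.DataFrame([1, 2, 3, 4])

print(df_default)
print(df_default.columns)
```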
We can also explicitly set the label for the column, as you can see in the example below:
0  1
1  2
2  3
3  4
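The same data with an explicit column label could look like this; the label "numbers" is a hypothetical name chosen for illustration, since the original label is not shown here:

```python
import pandas as pd

# columns= takes a list of labels, one per column
df_labelled = pd.DataFrame([1, 2, 3, 4], columns=["numbers"])

print(df_labelled)
```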
b. From a 2D Array
We can use a 2D array to construct a DataFrame with more than one column. If we don’t provide the columns= kwarg, the default is set to (0, 1, 2, …, n-1). See the example below:
   0  1
0  1  2
1  3  4
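A sketch of this case, assuming a 2x2 NumPy array holding the values 1 to 4:

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[1, 2], [3, 4]])

# Each row of the array becomes a row of the DataFrame;
# rows and columns both get default integer labels
df_2d = pd.DataFrame(arr_2d)

print(df_2d)
```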
However, we can also set custom (explicit) index and column names:
row1  1  2
row2  3  4
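With explicit labels, the construction might look like the sketch below. The row labels follow the output above, while the column names "col1" and "col2" are hypothetical:

```python
import numpy as np
import pandas as pd

arr_named = np.array([[1, 2], [3, 4]])

df_named = pd.DataFrame(
    arr_named,
    index=["row1", "row2"],    # row labels
    columns=["col1", "col2"],  # column labels (assumed names)
)

print(df_named)
```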
2.2. DataFrame from Series Object
We can also create a DataFrame object from a previously defined Series object:
0  100
1  200
2  300
3  400
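A minimal sketch, assuming an unnamed Series holding the four values shown above:

```python
import pandas as pd

series_plain = pd.Series([100, 200, 300, 400])

# The Series index becomes the row index; an unnamed Series yields column 0
df_from_series = pd.DataFrame(series_plain)

print(df_from_series)
```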
Let’s explicitly set the index and column labels:
a  100
b  200
c  300
d  400
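One way to get explicit labels is to give the Series its own index and pass columns= to the DataFrame constructor. This is a sketch; the column name "amount" is hypothetical:

```python
import pandas as pd

# The index is set on the Series itself, so the DataFrame inherits it
series_labelled = pd.Series([100, 200, 300, 400], index=["a", "b", "c", "d"])

df_series_labelled = pd.DataFrame(series_labelled, columns=["amount"])

print(df_series_labelled)
```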
2.3. DataFrame from Dictionaries
In a dictionary key-value pair, the value can be another dictionary. We will use this concept to construct our DataFrame object. Pay particular attention to how the keys are used to assign the index and columns values of the DataFrame:
The keys provided under pd.DataFrame() are used as column labels.
The keys provided under the assigned dictionaries are used as index labels.
California  38332521  423967
Texas       26448193  695662
New York    19651127  141297
Florida     19552860  170312
Illinois    12882135  149995
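The table above can be reproduced with a sketch like the following. The column labels "population" and "area" are assumptions, as the original headers are not shown:

```python
import pandas as pd

# Inner dictionaries: their keys become the index labels
population = {"California": 38332521, "Texas": 26448193, "New York": 19651127,
              "Florida": 19552860, "Illinois": 12882135}
area = {"California": 423967, "Texas": 695662, "New York": 141297,
        "Florida": 170312, "Illinois": 149995}

# Outer dictionary: its keys become the column labels
states = pd.DataFrame({"population": population, "area": area})

print(states)
```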
2.4. Creating DataFrame from ‘list of dictionaries’
In the following example, dictionaries are nested inside a list, and we provide data= to the DataFrame in the form of this list. Pay special attention to how the keys of the dictionaries are used as column labels:
0  0  0   0
1  1  2   3
2  2  4   6
3  3  6   9
4  4  8  12
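A sketch that yields the same numbers as the output above, assuming the keys "a", "b" and "c" (the original column labels are not shown):

```python
import pandas as pd

# One dictionary per row; the dictionary keys become the column labels
rows = [{"a": i, "b": 2 * i, "c": 3 * i} for i in range(5)]

df_rows = pd.DataFrame(data=rows)

print(df_rows)
```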
2.5. Other Concepts
a. Fetching Attributes of DataFrame:
We will fetch the commonly used attributes of a DataFrame:
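As a sketch, here are some commonly used attributes on a small, made-up DataFrame (which attributes count as "commonly used" is our assumption):

```python
import pandas as pd

df_attrs = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]}, index=["x", "y"])

print(df_attrs.shape)    # (number of rows, number of columns)
print(df_attrs.index)    # row labels
print(df_attrs.columns)  # column labels
print(df_attrs.dtypes)   # data type of each column
print(df_attrs.values)   # underlying data as a NumPy array
print(df_attrs.ndim)     # number of dimensions
print(df_attrs.size)     # total number of elements
```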
b. Indexing 101 on DataFrame
Indexing a DataFrame object with [] applies to the column labels.
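A short sketch of this behaviour, using a two-state subset of the figures above with assumed column names. Row labels are not found by [], so .loc is used for rows instead:

```python
import pandas as pd

states_small = pd.DataFrame({
    "population": {"California": 38332521, "Texas": 26448193},
    "area": {"California": 423967, "Texas": 695662},
})

area_column = states_small["area"]  # [] selects a column, returned as a Series

try:
    states_small["Texas"]           # a row label is not a column label
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

texas_row = states_small.loc["Texas"]  # rows are selected with .loc

print(area_column)
print(row_lookup_failed)
print(texas_row)
```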
3. PANDAS INDEX OBJECT
Both Pandas Series and DataFrame objects contain an explicit index that lets us reference and modify their data. In some of the above examples, we explicitly provided the index= keyword argument to pd.Series and pd.DataFrame.
However, the index object can be predefined using the pd.Index() constructor.
This Index object can be considered either as an immutable array or as an ordered set.
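A minimal sketch of predefining an index and reusing it (labels and values are illustrative):

```python
import pandas as pd

# Build the Index once...
labels = pd.Index(["a", "b", "c"])

# ...and pass it to the constructor through index=
series_with_index = pd.Series([1, 2, 3], index=labels)

print(series_with_index)
```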
3.1. Index as Immutable Array
The Index object works in many ways like an array. For example, we can use standard indexing techniques:
However, the Index object is an immutable array, i.e., its values can’t be changed. If we try to change them, the result is TypeError: Index does not support mutable operations
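Both points can be sketched on a small integer Index (the numbers are arbitrary):

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

second = ind[1]          # standard element access
every_other = ind[::2]   # standard slicing returns another Index

# Assignment is rejected because an Index is immutable
try:
    ind[1] = 0
    assignment_failed = False
except TypeError as exc:
    assignment_failed = True
    print("TypeError:", exc)
```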
3.2. Index as Ordered Set
The Index object follows many of the conventions used by Python’s built-in set data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way:
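A sketch of these set operations using the explicit Index methods, which avoids relying on operator overloads whose meaning has changed between pandas versions (the two indices are made up):

```python
import pandas as pd

ind_a = pd.Index([1, 3, 5, 7, 9])
ind_b = pd.Index([2, 3, 5, 7, 11])

common = ind_a.intersection(ind_b)               # labels in both
combined = ind_a.union(ind_b)                    # labels in either
only_in_a = ind_a.difference(ind_b)              # labels in ind_a but not ind_b
in_one_only = ind_a.symmetric_difference(ind_b)  # labels in exactly one

print(common, combined, only_in_a, in_one_only, sep="\n")
```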