Universal Functions In Pandas
1. UNIVERSAL FUNCTIONS: INDEX PRESERVATION
All NumPy ufuncs work on Pandas Series and DataFrame objects.
First, let's create a Pandas Series of random integers:
import numpy as np
import pandas as pd
# creating random state
rand = np.random.RandomState(42)
# Creating Pandas Series of random integers
ser1 = pd.Series(rand.randint(10, size=4))
print(ser1)
0 6
1 3
2 7
3 4
dtype: int64
Second, let's create a Pandas DataFrame of random integers:
# Creating Pandas DataFrame
df1 = pd.DataFrame(rand.randint(10,size=(3,4)),
columns=['a','b','c','d'])
print(df1)
a b c d
0 6 9 2 6
1 7 4 3 7
2 7 2 5 4
Now, if we apply any NumPy ufunc to these objects (Series or DataFrame), the result is another Pandas object with the index preserved:
# Taking the exponential of every element in the Series, ser1
np.exp(ser1)
0 403.428793
1 20.085537
2 1096.633158
3 54.598150
dtype: float64
# Doing arithmetic on each element of the DataFrame, df1
print(np.multiply(df1,10))
a b c d
0 60 90 20 60
1 70 40 30 70
2 70 20 50 40
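Index preservation is easier to see when the index is not the default 0, 1, 2. The following is a minimal sketch, assuming a made-up Series named `labelled` with string labels (neither the name nor the values come from the examples above):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a non-default string index (illustrative values)
labelled = pd.Series([0.0, 1.0, 2.0], index=["x", "y", "z"])

# Apply a NumPy ufunc directly to the Series
result = np.exp(labelled)

# The result is still a Series and carries the same labels
print(type(result).__name__)
print(result.index.equals(labelled.index))
```

If the ufunc returned a bare NumPy array, the labels "x", "y" and "z" would be lost; here they should travel with the values.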
2. UNIVERSAL FUNCTIONS: INDEX ALIGNMENT
2.1. Index Alignment in Series
When we add two Series whose indices are not identical, Pandas aligns the values by index label, and the index of the result is the union of the two indices.
# First, define two Series whose indices are not identical
A = pd.Series([1,2,3], index=[0,1,2]) #index[0,1,2]
B = pd.Series([10,20,30], index=[1,2,3]) #index[1,2,3]
# Second, perform addition of these two series
print(A); print(B)
print(A.add(B))
0 1
1 2
2 3
dtype: int64
1 10
2 20
3 30
dtype: int64
0 NaN
1 12.0
2 23.0
3 NaN
dtype: float64
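As the example above shows, the result contains every index label from both Series. Labels that appear in only one of them (here 0 and 3) have no partner value, so the result at those labels is NaN, and the dtype becomes float64 in order to hold it.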
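As a quick check of the alignment rule, the sketch below rebuilds A and B, adds them with the `+` operator, and compares the result index with the union of the two input indices. The names `total` and `union` are introduced here for illustration only.

```python
import pandas as pd

A = pd.Series([1, 2, 3], index=[0, 1, 2])
B = pd.Series([10, 20, 30], index=[1, 2, 3])

# The + operator aligns on index labels, just like A.add(B)
total = A + B

# Union of the two input indices
union = A.index.union(B.index)

print(list(total.index))           # [0, 1, 2, 3]
print(total.index.equals(union))   # True
print(total.equals(A.add(B)))      # True
```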
add() method with fill_value
When Pandas doesn't find a corresponding value at the same index, it returns NaN. For example, Series A has a value at index 0, but Series B has no value at index 0. To handle this NaN, we can pass the kwarg fill_value to the Series.add() method:
A.add(B, fill_value=0)
0 1.0
1 12.0
2 23.0
3 30.0
dtype: float64
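One detail worth knowing is that fill_value only stands in for a value that is missing on one side. The sketch below uses two small made-up Series, `left` and `right` (hypothetical names and values), in which label 1 is NaN in both; at that label the result should stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: label 1 is missing (NaN) in both Series
left = pd.Series([1.0, np.nan], index=[0, 1])
right = pd.Series([np.nan, 5.0], index=[1, 2])

filled = left.add(right, fill_value=0)

# Label 0: only left has a value  -> 1.0 + 0
# Label 1: missing on both sides -> stays NaN
# Label 2: only right has a value -> 0 + 5.0
print(filled)
```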
2.2. Index Alignment in DataFrame
When we add two DataFrames whose indices or columns are not identical, Pandas aligns the values on both the index and the columns, and the result covers the union of each.
# First, define two DataFrames whose indices and columns are not identical
C = pd.DataFrame(rand.randint(10, size=(2,2)),
columns=['a','b'])
D = pd.DataFrame(rand.randint(10, size=(3,3)),
columns=['a','b','c'])
print(C); print(D)
a b
0 1 7
1 5 1
a b c
0 4 0 9
1 5 8 0
2 9 2 6
# Secondly, we add these two dataframes and see how results are handled
print(C.add(D))
a b c
0 5.0 7.0 NaN
1 10.0 9.0 NaN
2 NaN NaN NaN
add() method with fill_value
When Pandas doesn't find a corresponding value at the same index and column, it returns NaN. For example, DataFrame D has a value at index 0, column 'c', but DataFrame C has no value at index 0, column 'c'. We can use the keyword argument fill_value with the DataFrame.add() method to handle the NaN:
print(C.add(D, fill_value=0))
a b c
0 5.0 7.0 9.0
1 10.0 9.0 0.0
2 9.0 2.0 6.0
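The fill value does not have to be 0. As a sketch of one alternative (an illustration, not something the text above prescribes), the missing entries of C can be filled with the mean of all values in C. The frames are rebuilt from the printed values so the block runs on its own, and `fill` and `result` are names introduced here.

```python
import pandas as pd

# Same values as the C and D printed above
C = pd.DataFrame([[1, 7], [5, 1]], columns=["a", "b"])
D = pd.DataFrame([[4, 0, 9], [5, 8, 0], [9, 2, 6]], columns=["a", "b", "c"])

# Mean of every value in C: (1 + 7 + 5 + 1) / 4 = 3.5
fill = C.to_numpy().mean()

# Positions missing from C are treated as 3.5 before adding
result = C.add(D, fill_value=fill)
print(result)
```

Every cell is now filled: where C had no value, the result is 3.5 plus the value from D, for example 3.5 + 6 = 9.5 at index 2, column 'c'.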
2.3. Python Operators and their equivalent Pandas Methods
Python operator    Pandas method(s)
+                  add()
-                  sub(), subtract()
*                  mul(), multiply()
/                  div(), divide(), truediv()
//                 floordiv()
%                  mod()
**                 pow()
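The mapping between operators and methods can be checked directly. This is a small sketch on a made-up Series `s` (hypothetical name and values), comparing each operator with the first method listed for it.

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series([7, 8, 9])

# Each entry compares an operator with its method equivalent
checks = {
    "+": (s + 2).equals(s.add(2)),
    "-": (s - 2).equals(s.sub(2)),
    "*": (s * 2).equals(s.mul(2)),
    "/": (s / 2).equals(s.div(2)),
    "//": (s // 2).equals(s.floordiv(2)),
    "%": (s % 2).equals(s.mod(2)),
    "**": (s ** 2).equals(s.pow(2)),
}
print(checks)
```

The practical reason to prefer the method form is that it accepts extra keyword arguments such as fill_value and axis, which the operators cannot take.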
3. UNIVERSAL FUNCTIONS: OTHER OPERATIONS
3.1. Understanding the "axis" keyword argument
One way to look at the axis kwarg: with axis=0 (or axis='index') the operation is performed column-wise, and with axis=1 (or axis='columns') the operation is performed row-wise.
Another way to look at the axis kwarg:
axis=0 or axis='index' means to perform the operation on all the rows in each column
axis=1 or axis='columns' means to perform the operation on all the columns in each row
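A reduction makes the two directions easy to see. The sketch below uses sum(), which this section does not otherwise use and is chosen here only for illustration, on a frame with the same values as df1. The names `col_totals` and `row_totals` are introduced for the example.

```python
import pandas as pd

# Same values as df1 above
df = pd.DataFrame(
    [[6, 9, 2, 6], [7, 4, 3, 7], [7, 2, 5, 4]],
    columns=["a", "b", "c", "d"],
)

# axis=0: walk down the rows of each column, one result per column
col_totals = df.sum(axis=0)
print(col_totals.tolist())   # [20, 15, 10, 17]

# axis=1: walk across the columns of each row, one result per row
row_totals = df.sum(axis=1)
print(row_totals.tolist())   # [23, 21, 18]

# The string forms are interchangeable with the integers
print(df.sum(axis="index").equals(col_totals))
print(df.sum(axis="columns").equals(row_totals))
```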
3.2. Operations on Self
Let's subtract the values of the first row of df1 from all rows in df1. In this case the kwarg axis keeps its default value, 1 or 'columns', so the labels of the Series are matched against the column labels:
print(df1)
print(df1.subtract(df1.iloc[0]))
a b c d
0 6 9 2 6
1 7 4 3 7
2 7 2 5 4
a b c d
0 0 0 0 0
1 1 -5 1 1
2 1 -7 3 -2
However, if we would like to apply this arithmetic operation index-wise, we can use axis=0 or axis='index':
print(df1.subtract(df1['a'], axis=0))
a b c d
0 0 3 -4 0
1 0 -3 -4 0
2 0 -5 -2 -3
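To see why the axis matters when subtracting a column, the sketch below uses a frame with the same values as df1 but with made-up row labels 'r0' to 'r2' (an assumption added so that row and column labels cannot be confused). Without axis=0, the row labels of the Series are matched against the column labels, nothing lines up, and every cell should be NaN.

```python
import pandas as pd

# Same values as df1, with hypothetical row labels for clarity
df = pd.DataFrame(
    [[6, 9, 2, 6], [7, 4, 3, 7], [7, 2, 5, 4]],
    index=["r0", "r1", "r2"],
    columns=["a", "b", "c", "d"],
)

# Default axis: labels r0..r2 are matched against columns a..d, so no match
unaligned = df.subtract(df["a"])
print(unaligned.shape)               # (3, 7)
print(unaligned.isna().all().all())  # True

# axis=0: labels r0..r2 are matched against the row index
aligned = df.subtract(df["a"], axis=0)
print(aligned)
```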
3.3. Operation between Series and DataFrame
Operations between a DataFrame and a Series object are similar to operations between a two-dimensional and a one-dimensional NumPy array.
# Series
ser11 = pd.Series(rand.randint(12, size=3))
ser11
0 2
1 9
2 11
dtype: int64
# DataFrame
df11 = pd.DataFrame(rand.randint(10,size=(3,4)),
columns=['a','b','c','d'] )
print(df11)
a b c d
0 7 5 7 8
1 3 0 0 9
2 3 6 1 2
Let's add the Series to the DataFrame with the kwarg axis=0 or axis='index', which matches on the index. Both ser11 and df11 have an identical index:
print(df11.add(ser11, axis=0))
a b c d
0 9 7 9 10
1 12 9 9 18
2 14 17 12 13
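The NumPy analogy can be made concrete. In the sketch below, the arrays `arr` and `vec` hold the same values as df11 and ser11, and the names `by_numpy` and `by_pandas` are introduced for illustration. In plain NumPy the one-dimensional array has to be reshaped into a column before it broadcasts across the rows; with Pandas, axis=0 expresses the same intent through index labels.

```python
import numpy as np
import pandas as pd

# Same values as df11 and ser11 above
arr = np.array([[7, 5, 7, 8], [3, 0, 0, 9], [3, 6, 1, 2]])
vec = np.array([2, 9, 11])

# NumPy: reshape vec to shape (3, 1) so it broadcasts down the rows
by_numpy = arr + vec[:, np.newaxis]

# Pandas: axis=0 matches the Series index against the DataFrame index
frame = pd.DataFrame(arr, columns=["a", "b", "c", "d"])
by_pandas = frame.add(pd.Series(vec), axis=0)

print((by_pandas.to_numpy() == by_numpy).all())  # True
```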