Universal Functions In Pandas

1. UNIVERSAL FUNCTIONS: INDEX PRESERVATION

Every NumPy ufunc works on Pandas Series and DataFrame objects

First, let's create a Pandas Series of random integers

import numpy as np
import pandas as pd 

# creating random state
rand = np.random.RandomState(42)

# Creating Pandas Series of random integers
ser1 = pd.Series(rand.randint(10, size=4))
print(ser1)
0    6
1    3
2    7
3    4
dtype: int64

Second, create a Pandas DataFrame of random integers

# Creating Pandas DataFrame
df1 = pd.DataFrame(rand.randint(10,size=(3,4)),
                   columns=['a','b','c','d'])
                   
print(df1)
   a  b  c  d
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4

Now, if we apply any NumPy ufunc to one of these objects (Series or DataFrame), the result is another Pandas object with the index preserved

# Taking the exponential of every element in the Series, ser1
np.exp(ser1)
0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64

# Multiplying each element of the DataFrame, df1, by 10
print(np.multiply(df1,10))
    a   b   c   d
0  60  90  20  60
1  70  40  30  70
2  70  20  50  40
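Index preservation is easier to see when the labels are not the default 0, 1, 2. The following is a minimal sketch; the Series `labelled` and its labels 'x', 'y', 'z' are made up for illustration and do not appear in the examples above.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with string labels instead of the default integer index
labelled = pd.Series([0.0, 1.0, 2.0], index=['x', 'y', 'z'])

# Applying a NumPy ufunc returns a Series, not a bare NumPy array
result = np.exp(labelled)

print(type(result).__name__)  # Series
print(list(result.index))     # ['x', 'y', 'z']
```

The values change, but every value keeps the label it had before the ufunc was applied.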

2. UNIVERSAL FUNCTIONS: INDEX ALIGNMENT

2.1. Index Alignment in Series

When we add two Series whose indexes are not identical, Pandas aligns the values by index label before performing the addition

# First, define two Series whose indexes are not identical
A = pd.Series([1,2,3], index=[0,1,2]) #index[0,1,2]
B = pd.Series([10,20,30], index=[1,2,3]) #index[1,2,3]

# Second, perform addition of these two series
print(A); print(B)
print(A.add(B))
0    1
1    2
2    3
dtype: int64
1    10
2    20
3    30
dtype: int64
0     NaN
1    12.0
2    23.0
3     NaN
dtype: float64

As the example above shows, the index of the sum is the union of the two indexes. Labels 1 and 2 exist in both Series and are added; labels 0 and 3 exist in only one Series and yield NaN.
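The properties of the aligned result can be checked directly. This is a small sketch that rebuilds A and B so it runs on its own; the variable name `total` is introduced here only for illustration.

```python
import pandas as pd

A = pd.Series([1, 2, 3], index=[0, 1, 2])
B = pd.Series([10, 20, 30], index=[1, 2, 3])

total = A.add(B)

# The result index contains every label found in either Series
print(list(total.index))      # [0, 1, 2, 3]

# Labels found in only one Series have no partner value, so they become NaN
print(total.isna().tolist())  # [True, False, False, True]

# NaN is a floating-point value, so the integer inputs are upcast
print(total.dtype)            # float64
```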

add() method with fill_value

  • When Pandas doesn't find a value under the same index label in both Series, it returns NaN for that label

  • For example, Series A has a value at index 0, but Series B has no value at index 0

  • To handle this NaN, we can pass the kwarg fill_value to the Pandas .add() method; the missing value is replaced by fill_value before the addition

A.add(B, fill_value=0)
0     1.0
1    12.0
2    23.0
3    30.0
dtype: float64
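The fill value should suit the operation. As a hedged sketch (the multiplication case is an illustration added here, not part of the original example), 0 is a neutral choice for addition, while 1 plays the same role for multiplication:

```python
import pandas as pd

A = pd.Series([1, 2, 3], index=[0, 1, 2])
B = pd.Series([10, 20, 30], index=[1, 2, 3])

# fill_value=1 stands in for the side that has no value at a given label,
# so the value that does exist passes through the multiplication unchanged
product = A.mul(B, fill_value=1)

print(product.tolist())  # [1.0, 20.0, 60.0, 30.0]
```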

2.2. Index Alignment in DataFrame

When we add two DataFrames whose indexes or columns are not identical, Pandas aligns the values on both the row index and the column labels before performing the addition

# First, define two DataFrames whose indexes and columns are not identical
C = pd.DataFrame(rand.randint(10, size=(2,2)),
                columns=['a','b'])

D = pd.DataFrame(rand.randint(10, size=(3,3)),
                columns=['a','b','c'])

print(C); print(D)
   a  b
0  1  7
1  5  1
   a  b  c
0  4  0  9
1  5  8  0
2  9  2  6

# Second, add the two DataFrames and see how the result is handled
print(C.add(D))
      a    b   c
0   5.0  7.0 NaN
1  10.0  9.0 NaN
2   NaN  NaN NaN

add() method with fill_value

  • When Pandas doesn't find a value under the same index and column in both DataFrames, it returns NaN for that cell

  • For example, DataFrame D has a value at index 0, column 'c', but DataFrame C has no value at index 0, column 'c'

  • We can pass the keyword argument fill_value to the Pandas .add() method to handle the NaN

print(C.add(D, fill_value=0))
      a    b    c
0   5.0  7.0  9.0
1  10.0  9.0  0.0
2   9.0  2.0  6.0
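One detail worth knowing: fill_value only helps when exactly one of the two DataFrames is missing a cell. If neither has a value there, the result stays NaN. The sketch below uses two small made-up frames, `X` and `Y`, chosen so that some cells exist in neither.

```python
import pandas as pd

# Hypothetical frames: X has column 'a' on rows 0-1, Y has column 'b' on rows 1-2
X = pd.DataFrame({'a': [1, 2]}, index=[0, 1])
Y = pd.DataFrame({'b': [10, 20]}, index=[1, 2])

result = X.add(Y, fill_value=0)
print(result)

# Cell (0, 'a') exists only in X, so Y's side is filled with 0: 1 + 0
print(result.loc[0, 'a'])            # 1.0

# Cell (0, 'b') exists in neither frame, so there is nothing to fill
print(pd.isna(result.loc[0, 'b']))   # True
```

In the C and D example every cell of the result exists in D, which is why no NaN remains there.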

2.3. Python Operators and their equivalent Pandas Methods

Python operator    Pandas method(s)
+                  add()
-                  sub(), subtract()
*                  mul(), multiply()
/                  div(), divide(), truediv()
//                 floordiv()
%                  mod()
**                 pow()
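A quick way to confirm the equivalence is to compare an operator with its method on the same inputs. This sketch reuses the Series A and B from section 2.1; the dictionary name `checks` is introduced only for illustration.

```python
import pandas as pd

A = pd.Series([1, 2, 3], index=[0, 1, 2])
B = pd.Series([10, 20, 30], index=[1, 2, 3])

# Series.equals treats NaN values in the same position as equal
checks = {
    '+':  (A + B).equals(A.add(B)),
    '-':  (A - B).equals(A.sub(B)),
    '*':  (A * B).equals(A.mul(B)),
    '/':  (A / B).equals(A.div(B)),
    '//': (A // B).equals(A.floordiv(B)),
    '%':  (A % B).equals(A.mod(B)),
    '**': (A ** B).equals(A.pow(B)),
}

for operator, same in checks.items():
    print(operator, same)
```

The operators give the same aligned result, but only the methods accept extra arguments such as fill_value and axis.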

3. UNIVERSAL FUNCTIONS: OTHER OPERATIONS

3.1. Understanding 'axis' keyword argument

One way to look at the axis kwarg:

With axis=0 or axis='index', the operation runs down the rows and is performed column-wise, giving one result per column. With axis=1 or axis='columns', the operation runs across the columns and is performed row-wise, giving one result per row.

Another way to look at the axis kwarg:

  • axis=0 or axis='index' means to perform the operation on all the rows in each column

  • axis=1 or axis='columns' means to perform the operation on all the columns in each row
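A reduction such as sum() makes the two directions concrete. The sketch below hard-codes the same values as df1 so that it runs on its own; using sum() is an illustrative choice, not something the section above prescribes.

```python
import pandas as pd

# Same values as df1, written out explicitly
df = pd.DataFrame({'a': [6, 7, 7],
                   'b': [9, 4, 2],
                   'c': [2, 3, 5],
                   'd': [6, 7, 4]})

# axis=0 / axis='index': collapse the rows, one total per column
per_column = df.sum(axis='index')
print(per_column.tolist())  # [20, 15, 10, 17]

# axis=1 / axis='columns': collapse the columns, one total per row
per_row = df.sum(axis='columns')
print(per_row.tolist())     # [23, 21, 18]
```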

3.2. Operations on Self

Let's subtract the first row of df1 from every row in df1. Here the kwarg axis keeps its default value, 1 or 'columns', so the index of the subtracted Series is matched against the column labels

print(df1)
print(df1.subtract(df1.iloc[0]))
   a  b  c  d
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4
   a  b  c  d
0  0  0  0  0
1  1 -5  1  1
2  1 -7  3 -2

However, if we would like to apply this arithmetic operation index-wise, we can use axis=0 or axis='index', which matches the index of the subtracted Series against the row index

print(df1.subtract(df1['a'], axis=0))
   a  b  c  d
0  0  3 -4  0
1  0 -3 -4  0
2  0 -5 -2 -3
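The difference between the two calls comes down to which labels the Series is matched against. A minimal sketch on a two-column subset of the same values (the names `by_columns` and `by_index` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [6, 7, 7], 'b': [9, 4, 2]})

# Default axis='columns': df.iloc[0] is indexed by 'a', 'b',
# so it is matched against the column labels and subtracted from each row
by_columns = df.subtract(df.iloc[0])
print(by_columns)

# axis='index' (same as axis=0): df['a'] is indexed by 0, 1, 2,
# so it is matched against the row labels and subtracted from each column
by_index = df.subtract(df['a'], axis='index')
print(by_index)
```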

3.3. Operation between Series and DataFrame

Operations between a DataFrame and Series object are similar to operations between a two-dimensional and one-dimensional NumPy array

# Series
ser11 = pd.Series(rand.randint(12, size=3))
ser11
0     2
1     9
2    11
dtype: int64

# DataFrame
df11 = pd.DataFrame(rand.randint(10,size=(3,4)),
                  columns=['a','b','c','d'] )
print(df11)
   a  b  c  d
0  7  5  7  8
1  3  0  0  9
2  3  6  1  2

Let's add the Series to the DataFrame with kwarg axis=0 or axis='index', which matches on the row index. Both ser11 and df11 have an identical index

print(df11.add(ser11, axis=0))
    a   b   c   d
0   9   7   9  10
1  12   9   9  18
2  14  17  12  13
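The comparison with NumPy broadcasting can be made explicit. The sketch below is an illustration under the assumption that the default integer index is used; it reuses the values of df1 and compares the pandas result with the plain NumPy one.

```python
import numpy as np
import pandas as pd

arr = np.array([[6, 9, 2, 6],
                [7, 4, 3, 7],
                [7, 2, 5, 4]])
df = pd.DataFrame(arr, columns=['a', 'b', 'c', 'd'])

# Row-wise: NumPy broadcasts the 1-D row across every row of the 2-D array,
# and pandas does the same after matching the Series to the column labels
row_np = arr - arr[0]
row_pd = df - df.iloc[0]
print(np.array_equal(row_np, row_pd.to_numpy()))  # True

# Column-wise: NumPy needs the column kept as a 2-D (3, 1) slice,
# while pandas expresses the same intent with axis=0
col_np = arr - arr[:, [0]]
col_pd = df.subtract(df['a'], axis=0)
print(np.array_equal(col_np, col_pd.to_numpy()))  # True
```

Unlike NumPy, which matches purely by position and shape, pandas matches by label, so the result keeps its index and column names.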
