# Universal Functions In Pandas

## 1. UNIVERSAL FUNCTIONS: INDEX PRESERVATION

All [NumPy ufuncs](https://tahamaddam.com/coding/numpy/numpys-universal-functions/) work on Pandas `Series` and `DataFrame` objects.

First, let’s create a Pandas `Series` of random integers.

```python
import numpy as np
import pandas as pd 

# creating random state
rand = np.random.RandomState(42)

# Creating Pandas Series of random integers
ser1 = pd.Series(rand.randint(10, size=4))
print(ser1)
```

```
0    6
1    3
2    7
3    4
dtype: int64
```

Second, create a Pandas `DataFrame` of random integers

```python
# Creating Pandas DataFrame
df1 = pd.DataFrame(rand.randint(10,size=(3,4)),
                   columns=['a','b','c','d'])
                   
print(df1)
```

```
   a  b  c  d
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4
```

Now, if we apply any NumPy ufunc to these objects (`Series` or `DataFrame`), the result will be another Pandas object with its **indices preserved**.

```python
# Take the exponential of every element in the Series, ser1
np.exp(ser1)
```

```
0     403.428793
1      20.085537
2    1096.633158
3      54.598150
dtype: float64
```

```python
# Multiply every element of the DataFrame, df1, by 10
print(np.multiply(df1,10))
```

```
    a   b   c   d
0  60  90  20  60
1  70  40  30  70
2  70  20  50  40
```
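
Index preservation is easier to see with a non-default index. The following is a minimal sketch; the names `temps` and `roots` and the string labels are made up for illustration and do not appear elsewhere in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with string labels instead of the default 0..n-1 index
temps = pd.Series([4.0, 9.0, 16.0], index=['mon', 'tue', 'wed'])

# Applying a NumPy ufunc returns a new Series that carries the same labels
roots = np.sqrt(temps)
print(roots)

# The index of the result is identical to the index of the input
print(roots.index.equals(temps.index))
```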

## 2. UNIVERSAL FUNCTIONS: INDEX ALIGNMENT

### 2.1. Index Alignment in Series

When we `add` two `Series` whose indices are not identical, Pandas aligns the values by index label, and the result carries the union of both indices.

```python
# First, define two series whose index are not identical
A = pd.Series([1,2,3], index=[0,1,2]) #index[0,1,2]
B = pd.Series([10,20,30], index=[1,2,3]) #index[1,2,3]

# Second, perform addition of these two series
print(A); print(B)
print(A.add(B))
```

```
0    1
1    2
2    3
dtype: int64
1    10
2    20
3    30
dtype: int64
0     NaN
1    12.0
2    23.0
3     NaN
dtype: float64
```

As the example above shows, the result contains every index label from both series. Labels present in only one of them get `NaN`, which is also why the result is upcast to `float64`.
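
Alignment works on labels, not on positions. As a small sketch of that point, the two hypothetical Series below (`left` and `right` are illustrative names) hold the same labels in a different order.

```python
import pandas as pd

# Hypothetical Series holding the same labels in a different order
left = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
right = pd.Series([30, 10, 20], index=['z', 'x', 'y'])

# Values are matched by label: x -> 1 + 10, y -> 2 + 20, z -> 3 + 30
total = left + right
print(total)
```

Because every label exists in both operands, no `NaN` appears in this case.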

#### add() method with fill\_value

* When Pandas finds no value under the same index label in one of the operands, it returns `NaN`
* For example, Series `A` has a value at index 0, but Series `B` has no index 0
* To handle this `NaN`, we can pass the kwarg `fill_value` to the Pandas `.add()` method, which substitutes that value for the missing operand before adding

```python
A.add(B, fill_value=0)
```

```
0     1.0
1    12.0
2    23.0
3    30.0
dtype: float64
```

### 2.2. Index Alignment in DataFrame

When we `add` two `DataFrame` objects whose indices or columns are not identical, Pandas aligns on both row and column labels, and the result carries the union of each.

```python
# First, define two dataframes whose indices and columns are not identical
C = pd.DataFrame(rand.randint(10, size=(2,2)),
                columns=['a','b'])

D = pd.DataFrame(rand.randint(10, size=(3,3)),
                columns=['a','b','c'])

print(C); print(D)
```

```
   a  b
0  1  7
1  5  1
   a  b  c
0  4  0  9
1  5  8  0
2  9  2  6
```

```python
# Secondly, we add these two dataframes and see how results are handled
print(C.add(D))
```

```
      a    b   c
0   5.0  7.0 NaN
1  10.0  9.0 NaN
2   NaN  NaN NaN
```

#### add() method with fill\_value

* When Pandas finds no value under the same index and column in one of the operands, it returns `NaN`
* For example, DataFrame `D` has a value at index 0, column ‘c’, but DataFrame `C` has no column ‘c’
* We can pass the keyword argument `fill_value` to the Pandas `.add()` method to handle the `NaN`

```python
print(C.add(D, fill_value=0))
```

```
      a    b    c
0   5.0  7.0  9.0
1  10.0  9.0  0.0
2   9.0  2.0  6.0
```
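
One detail worth knowing: `fill_value` only stands in for a value that is missing in one operand. If a cell is missing in both frames, the result there is still `NaN`. The sketch below uses two hypothetical frames, `X` and `Y`, chosen so that some cells exist in neither.

```python
import pandas as pd

# Hypothetical frames that share neither all rows nor all columns
X = pd.DataFrame({'a': [1, 2]}, index=[0, 1])
Y = pd.DataFrame({'b': [10, 20]}, index=[1, 2])

result = X.add(Y, fill_value=0)
print(result)

# Cells (0, 'b') and (2, 'a') exist in neither frame, so they stay NaN
print(result.isna())
```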

### 2.3. Python Operators and their equivalent Pandas Methods

| Python operator | Pandas method(s)         |
| --------------- | ------------------------ |
| +               | add()                    |
| -               | sub(),subtract()         |
| \*              | mul(),multiply()         |
| /               | div(),divide(),truediv() |
| //              | floordiv()               |
| %               | mod()                    |
| \*\*            | pow()                    |
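
As a quick check of the table, the sketch below compares an operator with its method form on two small Series (`s1` and `s2` are illustrative names). It also shows the practical difference: only the method form accepts extra arguments such as `fill_value`.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
s2 = pd.Series([10, 20, 30], index=[1, 2, 3])

# The operator and the method give the same aligned result
print((s1 + s2).equals(s1.add(s2)))
print((s1 * s2).equals(s1.mul(s2)))

# Only the method form accepts extra arguments such as fill_value
diff = s1.sub(s2, fill_value=0)
print(diff)
```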

## 3. UNIVERSAL FUNCTIONS: OTHER OPERATIONS

### 3.1. Understanding ‘axis’ keyword argument

#### One way to look at `axis` kwarg:

Remember that when we pass `axis=0` or `axis='index'`, the operation is performed *column-wise*, and when we pass `axis=1` or `axis='columns'`, the operation is performed *row-wise*.

#### Another way to look at `axis` kwarg:

* `axis=0` or `axis='index'` means to perform the operation on *all the rows in each column*
* `axis=1` or `axis='columns'` means to perform the operation on *all the columns in each row*
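
For a reduction such as `sum()`, the two readings above can be sketched as follows. The frame `scores` and its values are hypothetical, picked so the totals are easy to check by eye.

```python
import pandas as pd

# Hypothetical frame with 2 rows and 3 columns
scores = pd.DataFrame({'a': [1, 2], 'b': [10, 20], 'c': [100, 200]})

# axis=0 (or 'index'): collapse the rows, giving one result per column
col_totals = scores.sum(axis=0)
print(col_totals)

# axis=1 (or 'columns'): collapse the columns, giving one result per row
row_totals = scores.sum(axis=1)
print(row_totals)
```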

### 3.2. Operations on Self

Let’s subtract the first row of `df1` from every row in `df1`. Here the kwarg `axis` takes its default value, `1` or `'columns'`, so the row is matched against the column labels.

```python
print(df1)
print(df1.subtract(df1.iloc[0]))
```

```
   a  b  c  d
0  6  9  2  6
1  7  4  3  7
2  7  2  5  4
   a  b  c  d
0  0  0  0  0
1  1 -5  1  1
2  1 -7  3 -2
```

However, if we would like to apply this arithmetic operation index-wise, we can use `axis=0` or `axis='index'`.

```python
print(df1.subtract(df1['a'], axis=0))
```

```
   a  b  c  d
0  0  3 -4  0
1  0 -3 -4  0
2  0 -5 -2 -3
```
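
A common use of the two `axis` settings is centering data. The sketch below, on a small hypothetical frame named `frame`, subtracts column means with the default axis and row means with `axis=0`.

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 4], 'b': [3, 8]})

# Column means are labelled 'a' and 'b', so the default axis ('columns') matches them
col_centered = frame.sub(frame.mean(axis=0))
print(col_centered)

# Row means are labelled 0 and 1, so they must be matched on the row index
row_centered = frame.sub(frame.mean(axis=1), axis=0)
print(row_centered)
```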

### 3.3. Operation between Series and DataFrame

Operations between a `DataFrame` and `Series` object are similar to operations between a two-dimensional and one-dimensional NumPy array

```python
# Series
ser11 = pd.Series(rand.randint(12, size=3))
ser11
```

```
0     2
1     9
2    11
dtype: int64
```

```python
# DataFrame
df11 = pd.DataFrame(rand.randint(10,size=(3,4)),
                  columns=['a','b','c','d'] )
print(df11)
```

```
   a  b  c  d
0  7  5  7  8
1  3  0  0  9
2  3  6  1  2
```

Let’s add the `Series` to the `DataFrame` with kwarg `axis=0` or `axis='index'`, which matches on the row index. Both `ser11` and `df11` have an identical index.

```python
print(df11.add(ser11, axis=0))
```

```
    a   b   c   d
0   9   7   9  10
1  12   9   9  18
2  14  17  12  13
```
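
To make the comparison with NumPy concrete, here is a rough sketch of the same addition done on plain arrays. The values are copied from `df11` and `ser11` above; the names `arr`, `col`, `np_result` and `pd_result` are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as df11 and ser11, typed in by hand
arr = np.array([[7, 5, 7, 8],
                [3, 0, 0, 9],
                [3, 6, 1, 2]])
col = np.array([2, 9, 11])

# NumPy: reshape the 1-D array into a column so it broadcasts across each row
np_result = arr + col[:, np.newaxis]

# Pandas: axis=0 matches the Series on the row index, so no reshape is needed
pd_result = pd.DataFrame(arr, columns=list('abcd')).add(pd.Series(col), axis=0)

print(np.array_equal(np_result, pd_result.to_numpy()))
```

The difference is that NumPy matches by position and shape, while Pandas matches by label, so the Pandas version keeps working when rows are reordered.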
