Vectorization is the process of applying an operation to multiple items (in an array, for example) in one go.
import numpy as np
x = np.array([1, 2, 3, 4, 5])
# multiply every element by 10 in a single vectorized operation
x * 10
array([10, 20, 30, 40, 50])
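To make the contrast concrete, here is a minimal sketch that computes the same result twice, once with an explicit Python loop and once with the vectorized expression. The variable names `looped` and `vectorized` are only illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Element-by-element version: an explicit Python loop over the items
looped = np.array([item * 10 for item in x])

# Vectorized version: one expression applied to the whole array
vectorized = x * 10

# Both approaches produce the same values
print(np.array_equal(looped, vectorized))
```

The vectorized form is shorter and, on large arrays, typically much faster, because the loop runs inside NumPy rather than in the Python interpreter.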
However, vectorizing operations on an array of strings is not as straightforward. Pandas addresses this need with vectorized string operations, available through the various str methods of a Series.
import pandas as pd
# pandas Series used in this section
names = pd.Series(['Walter White', 'Jesse Pinkman', 'Skyler White', 'Hank Shrader', 'Mike Ehrmantraut', 'Gus Fring'])
names
0 Walter White
1 Jesse Pinkman
2 Skyler White
3 Hank Shrader
4 Mike Ehrmantraut
5 Gus Fring
dtype: object
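As a quick illustration of what the str methods look like in practice, the sketch below applies three of them to the same Series. The choice of `upper()`, `len()`, and `endswith()` is arbitrary; many other Python string methods have counterparts under `.str`.

```python
import pandas as pd

names = pd.Series(['Walter White', 'Jesse Pinkman', 'Skyler White',
                   'Hank Shrader', 'Mike Ehrmantraut', 'Gus Fring'])

# Upper-case every element
upper = names.str.upper()

# Number of characters in every element
lengths = names.str.len()

# Boolean mask: True where the element ends with 'White'
is_white = names.str.endswith('White')

print(upper)
print(lengths)
print(is_white)
```

Each call returns a new Series with the same index as `names`, so the results can be used directly for filtering, for example `names[is_white]`.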
A regular expression is a special syntax for matching a string or a set of strings. The topic is broad and can be dry, so we will only sample the plain-vanilla flavor here.
The following methods accept regular expressions to examine the content of each string element, and follow some of the API conventions of Python's built-in re module.
# let's apply the str.extract() method with a regular expression to extract the first names
names.str.extract('([A-Za-z]+)')
0
0 Walter
1 Jesse
2 Skyler
3 Hank
4 Mike
5 Gus
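A few other str methods also take regular expressions. The sketch below is only a small sample, using `str.contains()`, `str.findall()`, and `str.extract()` with two named groups. The group names `first` and `last` are made up for this example.

```python
import pandas as pd

names = pd.Series(['Walter White', 'Jesse Pinkman', 'Skyler White',
                   'Hank Shrader', 'Mike Ehrmantraut', 'Gus Fring'])

# str.contains(): True where the pattern matches somewhere in the string
ends_in_white = names.str.contains(r'White$')

# str.extract() with named groups: one DataFrame column per group
parts = names.str.extract(r'(?P<first>[A-Za-z]+)\s+(?P<last>[A-Za-z]+)')

# str.findall(): a list of all matches for each element
words = names.str.findall(r'[A-Za-z]+')

print(ends_in_white)
print(parts)
print(words)
```

Raw strings (the `r'...'` prefix) keep backslashes such as `\s` from being interpreted by Python before they reach the regular expression engine.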
There are some good introductory examples of regular expression usage in Python here.
2.3. Vectorized indexing and slicing
# get the first letter of each element in the Series
# using standard indexing syntax
names.str[0]
0 W
1 J
2 S
3 H
4 M
5 G
dtype: object
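The same bracket syntax also accepts slices. As a minimal sketch, the example below takes the first three characters of each name in two equivalent ways, and then combines `str.split()` with indexing to pick out the last names. The variable names are only illustrative.

```python
import pandas as pd

names = pd.Series(['Walter White', 'Jesse Pinkman', 'Skyler White',
                   'Hank Shrader', 'Mike Ehrmantraut', 'Gus Fring'])

# Slice syntax through the .str accessor: first three characters
first_three = names.str[0:3]

# Equivalent method form
same_slice = names.str.slice(0, 3)

# split() yields a list per element; .str[-1] then picks the last item of each list
last_names = names.str.split().str[-1]

print(first_three)
print(last_names)
```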
# get the first letter of each element in the Series
# using the str.get() method
names.str.get(0)