Filtering And Sorting In Pandas

This article covers a very basic overview of filtering and sorting techniques in Pandas

import numpy as np
import pandas as pd 

1. FILTERING

1.1. Conditional Operators

In Pandas (just like we covered in Numpy), we can create ‘filter conditions’ for DataFrame. For example, we can use conditional operators on a DataFrame column features, which return the boolean Series representing DataFrame that passes the filter condition

# reading csv and saving its content as 'df'
df = pd.read_csv('data/pedata.csv')

df.head()
Industry NameNumber of firmsCurrent PETrailing PEForward PE

0

Advertising

47

20.01

23.77

13.84

1

Aerospace/Defense

77

35.11

44.26

22.91

2

Air Transport

18

14.87

10.55

10.16

3

Apparel

51

25.76

54.57

21.97

4

Auto & Truck

13

14.77

16.76

20.52

Let’s apply a filter to column Current PE values:

df['Current PE'] > 50

Result, will be a boolean array with True returned for all cells that passes the conditional operator and False otherwise:

0     False
1     False
2     False
3     False
4     False
      ...  
91    False
92    False
93    False
94     True
95     True
Name: Current PE, Length: 96, dtype: bool

Applying filter to column, Number of firms

df['Number of firms'] > 50
0     False
1      True
2     False
3      True
4     False
      ...  
91    False
92    False
93    False
94     True
95     True
Name: Number of firms, Length: 96, dtype: bool

1.2. Filter using Functions

For columns with string values: we can use str.startswith, str.endswith and str.contains functions Using ~ before the operation, negates the filter conditions

df['Industry  Name'].str.startswith('A')
0      True
1      True
2      True
3      True
4      True
      ...  
91    False
92    False
93    False
94    False
95    False
Name: Industry  Name, Length: 96, dtype: bool
df['Industry  Name'].str.contains('Air')
0     False
1     False
2      True
3     False
4     False
      ...  
91    False
92    False
93    False
94    False
95    False
Name: Industry  Name, Length: 96, dtype: bool
# .isin function checks the value in the provided list

df['Current PE'].isin([10,20])
0     False
1     False
2     False
3     False
4     False
      ...  
91    False
92    False
93    False
94    False
95    False
Name: Current PE, Length: 96, dtype: bool

1.3. Masking (Retrieving Rows that passes the Filter condition)

  • In above two sections, we studied how to create the filter that returns a list of boolean (True or False) along rows that passes or fails the test

  • We can apply the same filter under the square brackets [] of DataFrame variable to actually retrieve the rows, passing the filter condition.

df[df['Current PE'] > 150]
Industry NameNumber of firmsCurrent PETrailing PEForward PE

14

Cable TV

14

156.56

80.57

26.37

30

Entertainment

107

150.39

47.68

40.25

38

Healthcare Products

242

159.85

84.43

71.38

43

Hotel/Gaming

65

170.91

134.20

22.75

45

Information Services

69

283.37

46.23

28.00

46

Insurance (General)

19

693.05

67.57

24.42

55

Oil/Gas Distribution

24

605.72

69.41

16.78

67

Reinsurance

2

153.21

57.40

15.90

70

Retail (Building Supply)

17

201.74

238.80

18.33

74

Retail (Online)

70

319.22

243.82

86.28

df[df['Industry  Name'].str.startswith('A')]
Industry NameNumber of firmsCurrent PETrailing PEForward PE

0

Advertising

47

20.01

23.77

13.84

1

Aerospace/Defense

77

35.11

44.26

22.91

2

Air Transport

18

14.87

10.55

10.16

3

Apparel

51

25.76

54.57

21.97

4

Auto & Truck

13

14.77

16.76

20.52

5

Auto Parts

46

16.42

17.58

15.63

Using multiple conditions. Let suppose we need to know the industries with Current PE greater than 100 and Forward PE greater than 120

df[(df['Current PE'] > 100) & (df['Forward PE'] > 120)] 
Industry NameNumber of firmsCurrent PETrailing PEForward PE

77

Semiconductor

72

109.36

97.09

248.11

87

Telecom. Services

67

115.24

742.09

121.34

As another example, let suppose we are interested to know which industries has Forward PE greater than its Current PE value

df[df['Forward PE'] > df['Current PE']] 
Industry NameNumber of firmsCurrent PETrailing PEForward PE

4

Auto & Truck

13

14.77

16.76

20.52

6

Bank (Money Center)

7

10.56

10.23

12.17

9

Beverage (Soft)

34

34.49

39.87

143.56

11

Brokerage & Investment Banking

39

14.08

18.05

16.34

15

Chemical (Basic)

43

14.40

16.11

22.42

16

Chemical (Diversified)

6

9.63

10.48

10.13

20

Computers/Peripherals

48

24.13

28.92

30.93

21

Construction Supplies

44

22.33

39.58

26.20

24

Drugs (Pharmaceutical)

267

22.48

58.18

35.43

25

Education

35

21.30

22.20

26.03

27

Electronics (Consumer & Office)

20

18.40

64.24

18.80

47

Insurance (Life)

24

15.27

21.05

66.72

53

Oil/Gas (Integrated)

4

12.73

22.67

31.99

54

Oil/Gas (Production and Exploration)

269

19.20

8.66

34.96

65

Real Estate (Operations & Services)

57

23.20

32.46

33.99

77

Semiconductor

72

109.36

97.09

248.11

78

Semiconductor Equip

39

25.57

39.73

28.46

81

Software (Entertainment)

86

60.25

33.98

82.81

82

Software (Internet)

30

90.14

66.75

100.71

84

Steel

32

10.61

14.34

24.60

85

Telecom (Wireless)

18

27.21

25.66

29.17

87

Telecom. Services

67

115.24

742.09

121.34

91

Trucking

33

17.56

18.36

23.54

2. SORTING

2.1. Sort by Feature

  • We can use sort_values function to sort DataFrame by one or more of its columns

  • we can either provide a single column label or list of column labels to sort by

  • Keyword arguments, ascending=True tells to sort in ascending order, ascending=False will sort in descending order

df.sort_values('Current PE')
Industry NameNumber of firmsCurrent PETrailing PEForward PE

18

Coal & Related Energy

22

7.06

10.30

7.04

16

Chemical (Diversified)

6

9.63

10.48

10.13

6

Bank (Money Center)

7

10.56

10.23

12.17

84

Steel

32

10.61

14.34

24.60

53

Oil/Gas (Integrated)

4

12.73

22.67

31.99

...

...

...

...

...

...

70

Retail (Building Supply)

17

201.74

238.80

18.33

45

Information Services

69

283.37

46.23

28.00

74

Retail (Online)

70

319.22

243.82

86.28

55

Oil/Gas Distribution

24

605.72

69.41

16.78

46

Insurance (General)

19

693.05

67.57

24.42

96 rows × 5 columns

df.sort_values('Number of firms', ascending=False)
Industry NameNumber of firmsCurrent PETrailing PEForward PE

94

Total Market

7053

60.52

70.85

35.79

95

Total Market (without financials)

5878

62.49

76.83

39.72

7

Banks (Regional)

611

16.99

15.41

13.70

23

Drugs (Biotechnology)

503

77.30

77.56

30.21

83

Software (System & Application)

363

144.40

110.90

76.82

...

...

...

...

...

...

6

Bank (Money Center)

7

10.56

10.23

12.17

16

Chemical (Diversified)

6

9.63

10.48

10.13

53

Oil/Gas (Integrated)

4

12.73

22.67

31.99

76

Rubber& Tires

4

15.27

21.55

8.95

67

Reinsurance

2

153.21

57.40

15.90

96 rows × 5 columns

# when using a list of column labels, any label used after first 
# acts as a tiebreaker for its preceding label
df.sort_values(['Current PE', 'Number of firms'])
Industry NameNumber of firmsCurrent PETrailing PEForward PE

18

Coal & Related Energy

22

7.06

10.30

7.04

16

Chemical (Diversified)

6

9.63

10.48

10.13

6

Bank (Money Center)

7

10.56

10.23

12.17

84

Steel

32

10.61

14.34

24.60

53

Oil/Gas (Integrated)

4

12.73

22.67

31.99

...

...

...

...

...

...

70

Retail (Building Supply)

17

201.74

238.80

18.33

45

Information Services

69

283.37

46.23

28.00

74

Retail (Online)

70

319.22

243.82

86.28

55

Oil/Gas Distribution

24

605.72

69.41

16.78

46

Insurance (General)

19

693.05

67.57

24.42

96 rows × 5 columns

3. MORE EXAMPLES

In this section, we will load IMDB dataset. It is not a complete dataset of all IMDB, but a subset of 1,000 popular movies on IMDB from 2006 to 2016

imdb = pd.read_csv('data/imdb.csv')

imdb.head()
RankTitleGenreDescriptionDirectorActorsYearRuntime (Minutes)RatingVotesRevenue (Millions)Metascore

0

1

Guardians of the Galaxy

Action,Adventure,Sci-Fi

A group of intergalactic criminals are forced ...

James Gunn

Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...

2014

121

8.1

757074

333.13

76.0

1

2

Prometheus

Adventure,Mystery,Sci-Fi

Following clues to the origin of mankind, a te...

Ridley Scott

Noomi Rapace, Logan Marshall-Green, Michael Fa...

2012

124

7.0

485820

126.46

65.0

2

3

Split

Horror,Thriller

Three girls are kidnapped by a man with a diag...

M. Night Shyamalan

James McAvoy, Anya Taylor-Joy, Haley Lu Richar...

2016

117

7.3

157606

138.12

62.0

3

4

Sing

Animation,Comedy,Family

In a city of humanoid animals, a hustling thea...

Christophe Lourdelet

Matthew McConaughey,Reese Witherspoon, Seth Ma...

2016

108

7.2

60545

270.32

59.0

4

5

Suicide Squad

Action,Adventure,Fantasy

A secret government agency recruits some of th...

David Ayer

Will Smith, Jared Leto, Margot Robbie, Viola D...

2016

123

6.2

393727

325.02

40.0

print(imdb.shape)
(1000, 12)

🤔 Show us all movies from 2016 with rating greater than 8.5

imdb[(imdb['Year'] == 2016) & (imdb['Rating'] >= 8.5)].head()
RankTitleGenreDescriptionDirectorActorsYearRuntime (Minutes)RatingVotesRevenue (Millions)Metascore

96

97

Kimi no na wa

Animation,Drama,Fantasy

Two strangers find themselves linked in a biza...

Makoto Shinkai

Ryûnosuke Kamiki, Mone Kamishiraishi, Ryô Nari...

2016

106

8.6

34110

4.68

79.0

117

118

Dangal

Action,Biography,Drama

Former wrestler Mahavir Singh Phogat and his t...

Nitesh Tiwari

Aamir Khan, Sakshi Tanwar, Fatima Sana Shaikh,...

2016

161

8.8

48969

11.15

NaN

🤔 Which movies generated revenues greater than 500 million?

imdb[imdb['Revenue (Millions)'] >= 500].head()
RankTitleGenreDescriptionDirectorActorsYearRuntime (Minutes)RatingVotesRevenue (Millions)Metascore

12

13

Rogue One

Action,Adventure,Sci-Fi

The Rebel Alliance makes a risky move to steal...

Gareth Edwards

Felicity Jones, Diego Luna, Alan Tudyk, Donnie...

2016

133

7.9

323118

532.17

65.0

50

51

Star Wars: Episode VII - The Force Awakens

Action,Adventure,Fantasy

Three decades after the defeat of the Galactic...

J.J. Abrams

Daisy Ridley, John Boyega, Oscar Isaac, Domhna...

2015

136

8.1

661608

936.63

81.0

54

55

The Dark Knight

Action,Crime,Drama

When the menace known as the Joker wreaks havo...

Christopher Nolan

Christian Bale, Heath Ledger, Aaron Eckhart,Mi...

2008

152

9.0

1791916

533.32

82.0

76

77

The Avengers

Action,Sci-Fi

Earth's mightiest heroes must come together an...

Joss Whedon

Robert Downey Jr., Chris Evans, Scarlett Johan...

2012

143

8.1

1045588

623.28

69.0

85

86

Jurassic World

Action,Adventure,Sci-Fi

A new theme park, built on the original site o...

Colin Trevorrow

Chris Pratt, Bryce Dallas Howard, Ty Simpkins,...

2015

124

7.0

455169

652.18

59.0

🤔 Sorting the DataFrame by rating

imdb.sort_values('Rating').head()
RankTitleGenreDescriptionDirectorActorsYearRuntime (Minutes)RatingVotesRevenue (Millions)Metascore

829

830

Disaster Movie

Comedy

Over the course of one evening, an unsuspectin...

Jason Friedberg

Carmen Electra, Vanessa Lachey,Nicole Parker, ...

2008

87

1.9

77207

14.17

15.0

42

43

Don't Fuck in the Woods

Horror

A group of friends are going on a camping trip...

Shawn Burkett

Brittany Blanton, Ayse Howard, Roman Jossart,N...

2016

73

2.7

496

NaN

NaN

871

872

Dragonball Evolution

Action,Adventure,Fantasy

The young warrior Son Goku sets out on a quest...

James Wong

Justin Chatwin, James Marsters, Yun-Fat Chow, ...

2009

85

2.7

59512

9.35

45.0

647

648

Tall Men

Fantasy,Horror,Thriller

A challenged man is stalked by tall phantoms i...

Jonathan Holbrook

Dan Crisafulli, Kay Whitney, Richard Garcia, P...

2016

133

3.2

173

NaN

57.0

968

969

Wrecker

Action,Horror,Thriller

Best friends Emily and Lesley go on a road tri...

Micheal Bafaro

Anna Hutchison, Andrea Whitburn, Jennifer Koen...

2015

83

3.5

1210

NaN

37.0

🤔 Apply Filtering and Sorting at once: Which movies score greater than one million votes on IMDB, sort the result by rating

imdb[imdb['Votes'] > 1000000].sort_values('Rating', ascending=False).head()
RankTitleGenreDescriptionDirectorActorsYearRuntime (Minutes)RatingVotesRevenue (Millions)Metascore

54

55

The Dark Knight

Action,Crime,Drama

When the menace known as the Joker wreaks havo...

Christopher Nolan

Christian Bale, Heath Ledger, Aaron Eckhart,Mi...

2008

152

9.0

1791916

533.32

82.0

80

81

Inception

Action,Adventure,Sci-Fi

A thief, who steals corporate secrets through ...

Christopher Nolan

Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen...

2010

148

8.8

1583625

292.57

74.0

36

37

Interstellar

Adventure,Drama,Sci-Fi

A team of explorers travel through a wormhole ...

Christopher Nolan

Matthew McConaughey, Anne Hathaway, Jessica Ch...

2014

169

8.6

1047747

187.99

74.0

124

125

The Dark Knight Rises

Action,Thriller

Eight years after the Joker's reign of anarchy...

Christopher Nolan

Christian Bale, Tom Hardy, Anne Hathaway,Gary ...

2012

164

8.5

1222645

448.13

78.0

144

145

Django Unchained

Drama,Western

With the help of a German bounty hunter , a fr...

Quentin Tarantino

Jamie Foxx, Christoph Waltz, Leonardo DiCaprio...

2012

165

8.4

1039115

162.80

81.0

🤔 Which movies did business greater than 100 million but have IMDB rating of 6 or less. Then sort results by rating:

imdb[(imdb['Revenue (Millions)'] > 100) & (imdb['Rating'] <= 6.0)].sort_values('Rating', ascending=True).head()
RankTitleGenreDescriptionDirectorActorsYearRuntime (Minutes)RatingVotesRevenue (Millions)Metascore

63

64

Fifty Shades of Grey

Drama,Romance,Thriller

Literature student Anastasia Steele's life cha...

Sam Taylor-Johnson

Dakota Johnson, Jamie Dornan, Jennifer Ehle,El...

2015

125

4.1

244474

166.15

46.0

580

581

Kickboxer: Vengeance

Action

A kick boxer is out to avenge his brother.

John Stockwell

Dave Bautista, Alain Moussi, Gina Carano, Jean...

2016

90

4.9

6809

131.56

37.0

925

926

The Twilight Saga: Breaking Dawn - Part 1

Adventure,Drama,Fantasy

The Quileutes close in on expecting parents Ed...

Bill Condon

Kristen Stewart, Robert Pattinson, Taylor Laut...

2011

117

4.9

190244

281.28

45.0

941

942

The Twilight Saga: Eclipse

Adventure,Drama,Fantasy

As a string of mysterious killings grips Seatt...

David Slade

Kristen Stewart, Robert Pattinson, Taylor Laut...

2010

124

4.9

192740

300.52

58.0

165

166

Twilight

Drama,Fantasy,Romance

A teenage girl risks everything when she falls...

Catherine Hardwicke

Kristen Stewart, Robert Pattinson, Billy Burke...

2008

122

5.2

361449

191.45

56.0

Fifty shades of grey did make money for producers but people didn’t like the movie that much, do they?

Last updated