How to Filter Pandas Dataframe By Column Values?

One of the main advantages of holding data in a pandas data frame is that pandas lets us slice and dice the data in many ways.

Often you need a subset of a data frame based on one or more values in a specific column. In other words, we want to select rows based on one or more values in a column.

Here are six examples of filtering the rows of a pandas data frame based on column values.

Let's start by loading the Gapminder data into pandas as a data frame.

This data frame has 1704 rows and 6 columns. One of the columns is year. Let's look at the first three rows of the data frame.

print(gapminder.head(3))

       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
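To follow along without the full dataset, a small stand-in can be read with pd.read_csv from in-memory text. This is only a sketch: the rows are the ones quoted in this article, and the names csv_text and gapminder_small are illustrative, not part of the original post.

```python
import io

import pandas as pd

# A handful of Gapminder rows quoted in this article, stored as CSV text.
# With the real dataset you would pass a file path or URL to pd.read_csv.
csv_text = """country,year,pop,continent,lifeExp,gdpPercap
Afghanistan,1952,8425333.0,Asia,28.801,779.445314
Afghanistan,1957,9240934.0,Asia,30.332,820.853030
Afghanistan,1962,10267083.0,Asia,31.997,853.100710
Afghanistan,2002,25268405.0,Asia,42.129,726.734055
Albania,2002,3508512.0,Europe,75.651,4604.211737
"""

gapminder_small = pd.read_csv(io.StringIO(csv_text))

print(gapminder_small.shape)    # (rows, columns)
print(gapminder_small.head(3))  # first three rows
```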

Suppose we want to filter the data frame to get a smaller data frame containing only the values for the year 2002. In other words, we want a subset of the data frame based on the values in the year column. We keep a row if its year value is 2002 and drop it otherwise.

1. How to Select Rows of a Pandas Dataframe Based on a Column Value?

One way to filter rows in pandas is to use boolean expressions. We first create a boolean variable by taking the column we are interested in and checking whether its value equals the specific value we want to select/keep.

For example, let's filter, or subset, the data frame based on the year value 2002. This condition gives a boolean variable that is True where the year value is 2002 and False otherwise.

>is_2002 = gapminder['year']==2002
>print(is_2002.head())
0    False
1    False
2    False
3    False
4    False
Name: year, dtype: bool

We can then use this boolean variable to filter the data frame. After subsetting, we can see that the new data frame is much smaller.

>gapminder_2002 = gapminder[is_2002]
>print(gapminder_2002.shape)
(142, 6)
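The two steps can be tried on a tiny hand-made frame. This is a minimal sketch, assuming a toy data frame named df with made-up rows. It also shows that filtering keeps the original index labels rather than renumbering the rows.

```python
import pandas as pd

# Toy frame for illustration only.
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "year": [1952, 2002, 1952, 2002],
})

is_2002 = df["year"] == 2002   # boolean Series, one entry per row
df_2002 = df[is_2002]          # keep only the rows where the mask is True

print(is_2002.tolist())        # [False, True, False, True]
print(df_2002.index.tolist())  # [1, 3]: original index labels are kept
```

If a fresh 0, 1, 2, ... index is preferred, df_2002.reset_index(drop=True) renumbers the rows.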

We have successfully filtered the pandas data frame based on the values of a column. Here, every row has the year value 2002.

>print(gapminder_2002.head())
        country  year         pop continent  lifeExp    gdpPercap
10  Afghanistan  2002  25268405.0      Asia   42.129   726.734055
22      Albania  2002   3508512.0    Europe   75.651  4604.211737
34      Algeria  2002  31287142.0    Africa   70.994  5288.040382
46       Angola  2002  10866106.0    Africa   41.003  2773.287312
58    Argentina  2002  38331121.0  Americas   74.340  8797.640716

In the example above we used two steps: 1) create a boolean variable that satisfies the filtering condition, and 2) use the boolean variable to filter rows. However, we don't really need to create and store a new boolean variable to do the filtering. Instead, we can pass the boolean expression directly to subset the data frame by column value, as follows:

>gapminder_2002 = gapminder[gapminder['year']==2002]
>print(gapminder_2002.shape)
(142, 6)

How to Filter Rows Using Pandas Chaining?

We can also use pandas chaining to access a data frame's column and select rows as in the previous example. Pandas chaining makes it easy to combine one pandas command with another pandas command or with user-defined functions.

Here we use the pandas eq() function, chained onto the year Series, to check element-wise equality and filter the data for the year 2002.

>gapminder_2002 = gapminder[gapminder.year.eq(2002)]
>print(gapminder_2002.shape)
(142, 6)

In the example above, we checked for equality (year==2002) and kept the rows matching a specific value. We can use any other comparison operator, such as greater than or less than, to create boolean expressions and filter the rows of a pandas data frame.
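As a rough sketch of those other comparisons, the snippet below uses the > operator and the ge() method (the method form of >=) on a toy frame, then chains further pandas calls onto the filtered result. The frame and the variable names are illustrative; the values are rows quoted in this article.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Albania", "Algeria"],
    "year": [1952, 2002, 2002, 2002],
    "lifeExp": [28.801, 42.129, 75.651, 70.994],
})

# Operator form: rows with life expectancy above 50.
above_50 = df[df.lifeExp > 50]

# Method form, chained with more pandas calls.
recent_sorted = (
    df[df.year.ge(2002)]                      # same as df.year >= 2002
    .sort_values("lifeExp", ascending=False)  # highest life expectancy first
    .reset_index(drop=True)                   # renumber rows from 0
)

print(above_50.country.tolist())       # ['Albania', 'Algeria']
print(recent_sorted.country.tolist())  # ['Albania', 'Algeria', 'Afghanistan']
```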

2. How to Select Rows of a Pandas Dataframe Whose Column Value Does NOT Equal a Specific Value?

Sometimes you may want to keep the rows of a data frame whose column value does not equal something. Let's filter the gapminder data frame for rows where the year column is not equal to 2002. Basically, we want the data for all years except 2002.

>gapminder_not_2002 = gapminder[gapminder.year != 2002]
># or, equivalently
>gapminder_not_2002 = gapminder[gapminder['year'] != 2002]
>gapminder_not_2002.shape
(1562, 6)

3. How to Select Rows of a Pandas Dataframe Whose Column Value Is NOT NA/NaN?

Often you may want to filter a pandas data frame to keep only the rows where the values of a certain column are NOT NA/NaN.

We can use the pandas notnull() method to filter based on the NA/NaN values of a column.

>gapminder_no_NA = gapminder[gapminder.year.notnull()]
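The effect is easier to see on a column that actually contains a missing value. The sketch below uses a toy frame with made-up placeholder rows (not Gapminder data) and compares notnull() with dropna(subset=...), which is another way to get the same rows.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing lifeExp value (placeholder data).
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "lifeExp": [28.801, np.nan, 31.997],
})

kept = df[df.lifeExp.notnull()]            # rows where lifeExp is present
also_kept = df.dropna(subset=["lifeExp"])  # same rows via dropna

print(kept.country.tolist())   # ['A', 'C']
print(kept.equals(also_kept))  # True
```

notna() is an alias of notnull(), so either name can be used.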

4. How to Select Rows of a Pandas Dataframe Based on a List?

In the examples above, we selected rows based on a single value, i.e. year == 2002. Often, however, we need to select rows matching multiple values that are present in a list or other iterable. For example, we may want the rows for the years [1952, 2007].

The pandas isin() function lets us select rows using a list or any iterable. Called on a column, isin() returns a boolean Series that is True where the value is in the list and False where it is not.

years = [1952, 2007]
gapminder.year.isin(years)

We can use this boolean array to select rows, just like before.

>gapminder_year = gapminder[gapminder.year.isin(years)]
>gapminder_year.shape
(284, 6)

We can check that the new data frame contains rows for only the two years given in the list. Let's use the pandas unique() function to get the unique values of the year column.

>gapminder_year.year.unique()
array([1952, 2007])
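The same pattern on a toy column makes the intermediate boolean mask visible. This is a minimal sketch with a made-up year column; the names df and mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"year": [1952, 1957, 2002, 2007, 1952]})
years = [1952, 2007]

mask = df.year.isin(years)  # True where the year is in the list

print(mask.tolist())                    # [True, False, False, True, True]
print(df[mask].year.unique().tolist())  # [1952, 2007]
```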

5. How to Select Rows of a Pandas Dataframe Based on Values NOT in a List?

We can also select rows based on column values that are not in a list or other iterable. We create a boolean variable just as before, but now we negate it by prefixing it with ~. For example, to get the rows of the gapminder data frame whose continent value is not in a list of continents, we use

>continents = ['Asia', 'Africa', 'Americas', 'Europe']
>gapminder_ocean = gapminder[~gapminder.continent.isin(continents)]
>gapminder_ocean.shape
(24, 6)

The result is a smaller data frame containing the gapminder data for just one continent, Oceania. We can verify this again with the pandas unique() function, as before. We will see only Oceania.

>gapminder_ocean.continent.unique()
array(['Oceania'], dtype=object)
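A small sketch of the negation step, assuming a toy frame with four made-up rows: the ~ operator flips every True in the mask to False and vice versa, so only the rows outside the list survive.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Australia", "Algeria", "New Zealand", "Albania"],
    "continent": ["Oceania", "Africa", "Oceania", "Europe"],
})
continents = ["Asia", "Africa", "Americas", "Europe"]

in_list = df.continent.isin(continents)  # True for Africa and Europe rows
rest = df[~in_list]                      # ~ negates the mask element-wise

print(in_list.tolist())      # [False, True, False, True]
print(rest.country.tolist()) # ['Australia', 'New Zealand']
```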

6. How to Select Rows of a Pandas Dataframe Using Multiple Conditions?

We can combine multiple conditions when selecting rows of a pandas data frame using the & operator. For example, we can combine the two conditions above to get the Oceania data for the years 1952 and 2007.

gapminder[~gapminder.continent.isin(continents) &
           gapminder.year.isin(years)]

We now have the rows corresponding to the continent Oceania for the years 1952 and 2007.

          country  year         pop continent  lifeExp    gdpPercap
60      Australia  1952   8691212.0   Oceania   69.120  10039.59564
71      Australia  2007  20434176.0   Oceania   81.235  34435.36744
1092  New Zealand  1952   1994794.0   Oceania   69.390  10556.57566
1103  New Zealand  2007   4115771.0   Oceania   80.204  25185.00911
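When the conditions are plain comparisons rather than isin() calls, each one needs its own parentheses, because & and | bind more tightly than == in Python. The sketch below shows both "and" (&) and "or" (|) on a toy frame; the data and names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Australia", "Australia", "Algeria", "New Zealand"],
    "continent": ["Oceania", "Oceania", "Africa", "Oceania"],
    "year": [1952, 2002, 1952, 2007],
})

# Both conditions must hold (element-wise "and").
both = df[(df.continent == "Oceania") & (df.year == 1952)]

# At least one condition must hold (element-wise "or").
either = df[(df.continent == "Africa") | (df.year == 2007)]

print(both.index.tolist())    # [0]
print(either.index.tolist())  # [2, 3]
```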

In this article we have seen several ways to filter the rows of a pandas data frame. There is one more way: see the post on using the pandas query() function to select rows of a pandas data frame.

  • How to Select Rows of a Pandas Dataframe with the query() Function?
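As a brief taste of that approach, here is a minimal sketch on a toy frame (illustrative data and names, not taken from the linked post). query() takes the condition as a string and returns the same rows as the equivalent boolean mask.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Australia", "Australia", "Algeria"],
    "continent": ["Oceania", "Oceania", "Africa"],
    "year": [1952, 2002, 2002],
})

via_query = df.query("year == 2002")  # condition written as a string
via_mask = df[df.year == 2002]        # boolean-mask equivalent

# Conditions can be combined inside the string with "and" / "or".
oceania_2002 = df.query("year == 2002 and continent == 'Oceania'")

print(via_query.equals(via_mask))    # True
print(oceania_2002.index.tolist())   # [1]
```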

 

 
