Pandas dropna() – filter out empty rows and columns

Summary: in this tutorial, you’ll learn how to use the Pandas dropna() function to remove rows and columns that contain NaN values.

Filter NaN from Series with dropna()

Checking for missing data manually with isnull()/notnull() and boolean indexing is tedious. A simpler way is to call dropna(), which returns an object containing only the non-null values.

On a Series, dropna() returns a new Series with the NaN values, and their index labels, removed. See how you can use dropna() in the example below.

```python
import pandas as pd
from numpy import nan as NA

s = pd.Series([1, NA, 3, NA, 7])
s.dropna()
```

The result is a new Series object that keeps the original index labels 0, 2 and 4:

```
0    1.0
2    3.0
4    7.0
dtype: float64
```

Using notnull() as a boolean mask achieves the same result:

```python
s[s.notnull()]
```
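To make the comparison with the manual approach concrete, here is a short sketch that filters the same Series three ways (notnull() mask, inverted isnull() mask, and dropna()) and checks that the results match. The variable names are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, np.nan, 7])

# Manual approach: build a boolean mask and index the Series with it
manual = s[s.notnull()]

# The same filter written with isnull() and the ~ (logical not) operator
manual_inverted = s[~s.isnull()]

# dropna() performs the filtering in a single call
dropped = s.dropna()

print(dropped.equals(manual))           # True
print(dropped.equals(manual_inverted))  # True
print(dropped.index.tolist())           # [0, 2, 4]
```

All three produce the same values and the same index labels; dropna() simply states the intent more directly.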

Drop DataFrame rows with dropna()

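With DataFrame objects, you can choose between dropping every row (or column) that contains at least one NaN value, or dropping only those whose values are all NaN.

This choice is controlled by the how argument: how='any' (the default) drops a row or column that has at least one NaN, while how='all' drops it only when every value is NaN. A separate thresh argument instead keeps only the rows or columns that have at least a given number of non-NaN values.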
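The sketch below compares the three options on a small frame whose columns a, b and c are made up for illustration. Row 0 is complete, row 1 has one NaN, row 2 has a single non-NaN value, and row 3 is entirely NaN. The thresh argument is the minimum number of non-NaN values a row needs in order to be kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row has a different number of missing values
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, np.nan],
    "b": [4.0, np.nan, np.nan, np.nan],
    "c": [7.0, 8.0, 9.0, np.nan],
})

# how="any" (the default): drop a row if it has at least one NaN
any_rows = df.dropna(how="any").index.tolist()

# how="all": drop a row only if every value in it is NaN
all_rows = df.dropna(how="all").index.tolist()

# thresh=2: keep a row only if it has at least 2 non-NaN values
thresh_rows = df.dropna(thresh=2).index.tolist()

print(any_rows)     # [0]
print(all_rows)     # [0, 1, 2]
print(thresh_rows)  # [0, 1]
```

Recent pandas versions raise an error if how and thresh are passed together, so pick one of them per call.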

Suppose we have a simple DataFrame.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "reputation": [np.nan, "Gotham", np.nan],
    "movie": [np.nan, 3, 1],
})
df
```

```
       name        toy reputation  movie
0    Alfred        NaN        NaN    NaN
1    Batman  Batmobile     Gotham    3.0
2  Catwoman   Bullwhip        NaN    1.0
```

By default, dropna() drops every row that contains at least one NaN value.

```python
df.dropna()
```

```
     name        toy reputation  movie
1  Batman  Batmobile     Gotham    3.0
```

Pass how='all' to drop only the rows in which every value is NaN. Every row here has a name, so no row is dropped:

```python
df.dropna(how='all')
```

```
       name        toy reputation  movie
0    Alfred        NaN        NaN    NaN
1    Batman  Batmobile     Gotham    3.0
2  Catwoman   Bullwhip        NaN    1.0
```

Drop DataFrame columns with dropna()

A DataFrame is a two-dimensional data structure, which means that its data spreads across two axes: the rows belong to axis 0 and the columns belong to axis 1.

To filter out DataFrame columns instead of rows, pass axis=1 to dropna(). Only the name column has no NaN values, so it is the only column kept:

```python
df.dropna(axis=1)
```

```
       name
0    Alfred
1    Batman
2  Catwoman
```
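pandas also accepts the axis by name, so axis='columns' is a more readable spelling of axis=1. A minimal sketch, rebuilding a cut-down version of the example frame so it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "movie": [np.nan, 3, 1],
})

by_number = df.dropna(axis=1)        # axis=1 refers to the columns
by_name = df.dropna(axis="columns")  # the same axis, spelled by name

print(by_number.columns.tolist())    # ['name']
print(by_number.equals(by_name))     # True
```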

You can also combine how='all' with axis=1 so that dropna() removes only the columns whose values are all NaN.

```python
df.dropna(axis=1, how='all')
```
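To see the difference between the default and how='all' on columns, the sketch below builds its own frame in which the reputation column is, as an assumption for this illustration, entirely NaN, while toy and movie are only partly NaN.

```python
import numpy as np
import pandas as pd

# "reputation" is entirely NaN; "toy" and "movie" have some values
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "reputation": [np.nan, np.nan, np.nan],
    "movie": [np.nan, 3, 1],
})

# Default how="any": every column with at least one NaN is dropped
any_cols = df.dropna(axis=1).columns.tolist()

# how="all": only the all-NaN column is dropped
all_cols = df.dropna(axis=1, how="all").columns.tolist()

print(any_cols)  # ['name']
print(all_cols)  # ['name', 'toy', 'movie']
```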

Note that, by default, dropna() returns a new DataFrame and leaves the original DataFrame unchanged.

So if dropna() seems to have no effect, either assign the result back (df = df.dropna()) or pass inplace=True, which modifies the DataFrame in place and returns None.
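A short sketch of both options on a small made-up frame, showing that the original is untouched until you reassign it, and that inplace=True returns None rather than a DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

cleaned = df.dropna()          # a new DataFrame; df itself still has 3 rows
print(len(df), len(cleaned))   # 3 2

# Option 1: reassign the result to the same name
df = df.dropna()

# Option 2: modify in place; the call itself returns None
df2 = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
result = df2.dropna(inplace=True)
print(result is None, len(df2))  # True 2
```

A common mistake is writing df = df.dropna(inplace=True), which replaces df with None.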

Summary

  • dropna() is used to filter out rows and columns that contain NaN values.
  • dropna() returns a new DataFrame. Pass inplace=True to modify the original DataFrame in place.
  • By default, dropna() removes every row that contains at least one NaN value.
  • dropna(axis=1) drops columns instead of rows.
  • dropna() without the axis argument drops rows only.
  • Pass how='all' to drop only rows or columns that are entirely NaN, or thresh to keep only those with at least a given number of non-NaN values.
