Pandas dropna() – filter out empty rows and columns

Summary: in this tutorial, you’ll learn how to use the Pandas dropna() function to remove rows and columns that contain NaN values.

Filter NaN from Series with dropna()

Checking for missing data manually with isnull()/notnull() and boolean indexing is tedious. A simpler way is to call Pandas dropna(), which returns an object containing only non-null values.

On a Series, dropna() returns a new Series with the NaN values and their index labels removed. The example below shows how to use it.

import pandas as pd
from numpy import nan as NA
df = pd.Series([1, NA, 3, NA, 7])
df.dropna()

The result is a new Series object:

0    1.0
2    3.0
4    7.0
dtype: float64

Using notnull() as a boolean mask achieves the same result.

df[df.notnull()]
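As a quick check of the claim above, the following sketch compares the two approaches on the same Series. The variable names (s, dropped, masked, renumbered) are chosen for illustration only, and the reset_index() call is an optional extra that the tutorial itself does not use.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, np.nan, 7])

dropped = s.dropna()       # built-in filter
masked = s[s.notnull()]    # manual boolean mask

# Both approaches keep the same values under the same index labels.
print(dropped.equals(masked))

# The surviving rows keep their original labels (0, 2, 4).
print(list(dropped.index))

# Optional: renumber the labels from 0 if the gaps are unwanted.
renumbered = dropped.reset_index(drop=True)
print(list(renumbered.index))
```

Note that neither call changes s itself; both return new objects.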

Drop DataFrame rows with dropna()

With DataFrame objects, you can choose between dropping every row or column that contains at least one NaN value, or dropping only those that are entirely NaN.

Which rows or columns dropna() removes from the resulting DataFrame is controlled by its how argument: 'any' (the default) or 'all'.

Suppose we have a simple DataFrame.

import pandas as pd
import numpy as np
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "reputation": [np.nan, np.nan, np.nan],
                   "movie": [np.nan, 3, 1],
                  })
df
       name        toy  reputation  movie
0    Alfred        NaN         NaN    NaN
1    Batman  Batmobile         NaN    3.0
2  Catwoman   Bullwhip         NaN    1.0

By default, dropna() drops any row that contains a NaN value.

df.dropna()
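Empty DataFrame
Columns: [name, toy, reputation, movie]
Index: []

Every row in this DataFrame has at least one NaN value (the reputation column is NaN throughout), so the default call leaves no rows.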

To drop only the rows that are entirely NaN, pass the how='all' option.

df.dropna(how='all')

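       name        toy  reputation  movie
0    Alfred        NaN         NaN    NaN
1    Batman  Batmobile         NaN    3.0
2  Catwoman   Bullwhip         NaN    1.0

No row here is entirely NaN, because each one has a name, so how='all' keeps all three rows.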
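To see how='any' and how='all' actually differ on rows, it helps to have a row that is completely empty. The sketch below uses a small hypothetical DataFrame (heroes, which is not part of the tutorial's data) where row 1 is entirely NaN and row 2 is only partly NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is entirely empty, row 2 is only partly empty.
heroes = pd.DataFrame({
    "name": ["Batman", np.nan, "Catwoman"],
    "toy": ["Batmobile", np.nan, np.nan],
})

any_rows = heroes.dropna()            # how='any' is the default
all_rows = heroes.dropna(how="all")   # drop only fully empty rows

# Index labels of the rows that survive each call.
print(list(any_rows.index))
print(list(all_rows.index))
```

With how='any' only the fully populated row 0 survives, while how='all' also keeps row 2 because it still has a name.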

Drop DataFrame columns with dropna()

A DataFrame object is a two-dimensional data type, which means that its data spreads across two axes. The rows belong to axis 0 and the columns belong to axis 1.

In order to filter out DataFrame columns, pass axis=1 to dropna().

df.dropna(axis=1)
       name
0    Alfred
1    Batman
2  Catwoman

You can also combine how='all' with axis=1 to limit dropna() to all-NaN columns.

df.dropna(axis=1, how='all')
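The difference between the two column-wise calls is easiest to see by comparing which columns remain. This sketch rebuilds the tutorial's DataFrame (spelled with np.nan) and lists the surviving column names; any_cols and all_cols are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "reputation": [np.nan, np.nan, np.nan],
                   "movie": [np.nan, 3, 1]})

any_cols = df.dropna(axis=1)             # drop columns with at least one NaN
all_cols = df.dropna(axis=1, how="all")  # drop only columns that are all NaN

print(list(any_cols.columns))
print(list(all_cols.columns))
```

Only name survives the default how='any', whereas how='all' removes just the reputation column and keeps name, toy and movie.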

Note that dropna() returns a new DataFrame and, by default, does not modify the original DataFrame in place.

So if the original DataFrame appears unchanged after calling dropna(), either assign the result to a variable or pass an additional inplace=True option.
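The following minimal sketch illustrates the three ways of calling dropna() described above, using a small two-row DataFrame made up for this example. The names before, cleaned and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman"], "movie": [np.nan, 3]})

df.dropna()              # the result is discarded, so df is unchanged
before = len(df)
print(before)

cleaned = df.dropna()    # assign the result to keep the filtered copy
print(len(cleaned))

result = df.dropna(inplace=True)   # modifies df itself and returns None
print(result, len(df))
```

Because the in-place call returns None, writing df = df.dropna(inplace=True) would replace the DataFrame with None; use either assignment or inplace=True, not both.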

Summary

  • dropna() is used to filter out rows and columns that contain NaN values.
  • dropna() returns a new DataFrame. Pass inplace=True to modify the original DataFrame in place.
  • By default, dropna() removes rows that contain at least one NaN value.
  • dropna(axis=1) drops columns.
  • dropna() without the axis option drops rows only.
  • Pass how='all' to drop only the rows or columns that are entirely NaN.
Author: Thijmen I’m currently a SysAdmin located in the Netherlands. Every day I try to keep around a hundred users happy with their network connections and overall, tech-related issues. I also spend my spare time fiddling with web-based applications.
