Pandas fillna() – Fill in missing values

Summary: in this tutorial, you’ll learn how to use the Pandas fillna() function to replace NaN values with a value of your choice.

Replace NaN with custom value

While you can filter NaN values out of Pandas data structures, doing so can also discard values that are still relevant.

Rather than getting rid of all the NaN values, you can use the fillna() function to replace them with values that make more sense.

Suppose we have a simple DataFrame.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ['Alfred', 'Batman', 'Catwoman'],
    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
    "reputation": [np.nan, "Gotham", np.nan],
    "movie": [np.nan, 3, np.nan],
})
df

       name        toy  reputation  movie
0    Alfred        NaN         NaN    NaN
1    Batman  Batmobile      Gotham    3.0
2  Catwoman   Bullwhip         NaN    NaN

You can use df.fillna() with a single argument to replace any NaNs with the value passed. In the example below, we replace NaN with 0.

df.fillna(0)

       name        toy  reputation  movie
0    Alfred          0           0    0.0
1    Batman  Batmobile      Gotham    3.0
2  Catwoman   Bullwhip           0    0.0

Please note that fillna() returns a new Series or DataFrame with the NaNs replaced.

If you want to modify the existing data structure, you have to either assign the returned result back to the same name or pass the inplace=True option.

df = df.fillna(0)  # assign the new result back to the old name
# OR
df.fillna(0, inplace=True)
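The difference between the returned copy and the original can be checked directly. The snippet below is a minimal sketch on a small, made-up Series (the values are illustrative and not part of the DataFrame above).

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

filled = s.fillna(0)  # returns a new Series; s itself is not modified

nan_in_original = int(s.isna().sum())    # NaN count in the original
nan_in_filled = int(filled.isna().sum())  # NaN count in the returned copy

print(nan_in_original, nan_in_filled)
```

The original Series still contains its NaN until the result is assigned back or inplace=True is used.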

Replace NaN with empty string

In order to replace NaN values with empty/blank strings, you can pass an empty string to fillna().

df.fillna('')
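Filling every column with an empty string also affects numeric columns, which typically end up holding a mix of numbers and strings. One possible way to avoid that, sketched below, is to fill only the text columns. The text_cols list is a hypothetical helper name introduced for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "reputation": [np.nan, "Gotham", np.nan],
    "movie": [np.nan, 3, np.nan],
})

# Hypothetical list of the columns that hold text
text_cols = ["toy", "reputation"]

# Fill only the text columns; the numeric "movie" column keeps its NaNs
df[text_cols] = df[text_cols].fillna("")

print(df)
```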

Replace NaN with different value in each column

Passing a dictionary to fillna() makes it look up each column name in the dictionary and, if the column is found, replace the NaN values in that column with the corresponding value.

Suppose we want to replace NaN in column toy with notoy and in column reputation with noreputation. We will pass {"toy":"notoy", "reputation":"noreputation"} to fillna().

df.fillna({"toy":"notoy", "reputation":"noreputation"})

       name        toy    reputation  movie
0    Alfred      notoy  noreputation    NaN
1    Batman  Batmobile        Gotham    3.0
2  Catwoman   Bullwhip  noreputation    NaN
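The lookup works in both directions: columns that are missing from the dictionary keep their NaNs, and dictionary keys that do not match any column are skipped. The sketch below illustrates this; the "sidekick" key is a hypothetical name that deliberately matches no column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "reputation": [np.nan, "Gotham", np.nan],
})

# "sidekick" is not a column of df, so fillna() skips it
values = {"toy": "notoy", "sidekick": "none"}
result = df.fillna(values)

print(result)
```

Here only toy is filled; reputation is not in the dictionary, so its NaNs remain, and no sidekick column is created.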

Replace NaN with mean value

With a little creativity, you can do much more with fillna().

Since Series.mean() returns the average of the non-NaN values in a Series, you can use that value to replace the NaNs.

import pandas as pd
from numpy import nan

s = pd.Series([1, nan, 6, 9, nan, 14])
s.fillna(s.mean())

0     1.0
1     7.5
2     6.0
3     9.0
4     7.5
5    14.0
dtype: float64
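The fill value of 7.5 comes from the four non-NaN entries only, because mean() skips NaN by default. The sketch below reuses the same Series to make that arithmetic explicit; fill_value is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 6, 9, np.nan, 14])

# NaNs are skipped by default: (1 + 6 + 9 + 14) / 4
fill_value = s.mean()
print(fill_value)  # 7.5

filled = s.fillna(fill_value)
print(filled.tolist())  # [1.0, 7.5, 6.0, 9.0, 7.5, 14.0]
```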

Similarly, we can replace NaNs in a DataFrame so that each NaN is replaced by the mean of the column it belongs to.

import pandas as pd
from numpy import nan

df = pd.DataFrame({'A': [0, 10, nan, 5],
                   'B': [2, 5, nan, 5],
                   'C': [3, 6, 10, 6]})
df.fillna(df.mean())

      A    B   C
0   0.0  2.0   3
1  10.0  5.0   6
2   5.0  4.0  10
3   5.0  5.0   6
The replaced NaN values are in row 2: column A now holds 5.0 and column B holds 4.0.
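This works because df.mean() returns a Series indexed by column name, and fillna() matches those labels to the columns. The sketch below prints that intermediate Series and, as a variation that is not part of the original example, shows that the same pattern works with df.median() if a median fill is preferred.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 10, np.nan, 5],
                   "B": [2, 5, np.nan, 5],
                   "C": [3, 6, 10, 6]})

col_means = df.mean()  # Series indexed by column name: A, B, C
print(col_means)

mean_filled = df.fillna(col_means)

# Variation: fill each column with its median instead of its mean
median_filled = df.fillna(df.median())
print(median_filled)
```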

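You can chain two mean() calls if you want every NaN to be filled with one value, the mean of the column means. Note that this equals the mean of all values in the DataFrame only when every column has the same number of non-NaN values.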

import pandas as pd
from numpy import nan

df = pd.DataFrame({'A': [0, 10, nan, 5],
                   'B': [2, 5, nan, 5],
                   'C': [3, 6, 10, 6]})
mean = df.mean().mean()
df.fillna(mean)

           A         B   C
0   0.000000  2.000000   3
1  10.000000  5.000000   6
2   5.083333  5.083333  10
3   5.000000  5.000000   6

Explanation: the first mean() returns a Series holding the mean of each column, and the second one averages those column means.
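Averaging the column means is not always the same as averaging every individual value. The sketch below compares the two on the same DataFrame; using np.nanmean() on the underlying array is one possible way to get the mean of all non-NaN values, chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 10, np.nan, 5],
                   "B": [2, 5, np.nan, 5],
                   "C": [3, 6, 10, 6]})

# Mean of the column means: (5.0 + 4.0 + 6.25) / 3
mean_of_means = df.mean().mean()

# Mean of all ten non-NaN values: 52 / 10
overall_mean = np.nanmean(df.to_numpy())

print(round(mean_of_means, 6))  # 5.083333
print(round(overall_mean, 6))   # 5.2
```

The two results differ here because column C has four non-NaN values while A and B have three each.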

Summary

  • fillna() fills NaNs with the values you specify.
  • fillna() returns a new data structure by default; pass the inplace=True option to modify the existing one instead.
  • fillna() accepts a scalar, a dict, a Series, or even a DataFrame.
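The DataFrame case is the only one in the list above that the tutorial does not demonstrate. As a minimal sketch, a second DataFrame with matching row and column labels can supply a different fill value for every cell; the defaults name and all values below are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]})

# Hypothetical per-cell fill values with the same index and columns as df
defaults = pd.DataFrame({"A": [10.0, 20.0], "B": [30.0, 40.0]})

# Each NaN is replaced by the value at the same row and column in defaults
filled = df.fillna(defaults)

print(filled)
```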
