Pandas fillna() – Fill in missing values

Summary: in this tutorial, you’ll learn how to use the Pandas fillna() function to replace NaN values with a value of your choice.

Replace NaN with custom value

While you can filter NaN values out of Pandas data structures, doing so can also discard relevant values that sit in the same rows or columns.

Rather than getting rid of all the NaN values, you can replace them with values that make more sense by using the fillna() function.

Suppose we have a simple DataFrame.

import pandas as pd
import numpy as np
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "reputation": [np.nan, "Gotham", np.nan],
                   "movie": [np.nan, 3, np.nan],
                  })
df
       name        toy reputation  movie
0    Alfred        NaN        NaN    NaN
1    Batman  Batmobile     Gotham    3.0
2  Catwoman   Bullwhip        NaN    NaN

You can use df.fillna() with a single argument to replace any NaNs with the value passed. In the example below, we replace NaN with 0.

df.fillna(0)
       name        toy reputation  movie
0    Alfred          0          0    0.0
1    Batman  Batmobile     Gotham    3.0
2  Catwoman   Bullwhip          0    0.0

Please note that fillna() returns a new Series or DataFrame with the NaNs replaced.

If you want to modify the existing data structure, either assign the returned result back to the same name or pass the inplace=True option.

df = df.fillna(0)  # rebind the old name to the new result
# OR
df.fillna(0, inplace=True)
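
To see that fillna() leaves the original untouched unless you keep the result, here is a minimal sketch. The one-column DataFrame and the variable names (filled, nans_before, nans_in_copy) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"movie": [np.nan, 3, np.nan]})

filled = df.fillna(0)                      # returns a new DataFrame
nans_before = int(df["movie"].isna().sum())       # original still holds its NaNs
nans_in_copy = int(filled["movie"].isna().sum())  # the returned copy has none
print(nans_before, nans_in_copy)

df = df.fillna(0)                          # reassign to keep the result
nans_after = int(df["movie"].isna().sum())
print(nans_after)
```

Reassigning is usually the easier of the two options to reason about, because it is obvious from the code which object holds the filled values.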

Replace NaN with empty string

In order to replace NaN values with empty/blank strings, you can pass an empty string to fillna().

df.fillna('')
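
Calling fillna('') on the whole DataFrame also puts empty strings into numeric columns such as movie. If you only want blanks in a text column, one option is to call fillna() on that column alone. This is a sketch on a cut-down copy of the example data, not the only way to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "movie": [np.nan, 3, np.nan],
})

# Fill only the text column; "movie" is not touched and stays numeric.
df["toy"] = df["toy"].fillna("")
print(df)
```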

Replace NaN with different value in each column

Passing a dictionary to fillna() makes it look up each column name in the dictionary and, if the name is found, replace the NaN values in that column with the corresponding value. Columns that are not in the dictionary are left unchanged.

Suppose we want to replace NaN in column toy with notoy and in column reputation with noreputation. We will pass {"toy":"notoy", "reputation":"noreputation"} to fillna().

df.fillna({"toy":"notoy", "reputation":"noreputation"})
       name        toy    reputation  movie
0    Alfred      notoy  noreputation    NaN
1    Batman  Batmobile        Gotham    3.0
2  Catwoman   Bullwhip  noreputation    NaN
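
A quick way to check which columns a dictionary fill actually covered is to count the remaining NaNs per column. In the sketch below, the "sidekick" key is a hypothetical name that matches no column. It is there only to show that such keys are skipped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "reputation": [np.nan, "Gotham", np.nan],
    "movie": [np.nan, 3, np.nan],
})

# "sidekick" is a made-up key with no matching column; it is ignored.
defaults = {"toy": "notoy", "reputation": "noreputation", "sidekick": "none"}
filled = df.fillna(defaults)

remaining = filled.isna().sum()  # NaN count per column after the fill
print(remaining)
```

Here movie still reports two NaNs because it has no entry in the dictionary.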

Replace NaN with mean value

With fillna you can do lots of other things with a little creativity.

Since Series.mean() returns the average of all non-NaN values in a Series, you can replace NaN with that value.

import pandas as pd
from numpy import nan
s = pd.Series([1, nan, 6, 9, nan, 14])
s.fillna(s.mean())
0     1.0
1     7.5
2     6.0
3     9.0
4     7.5
5    14.0
dtype: float64
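
The same pattern works with any other statistic, such as Series.median(). The sketch below uses a made-up Series with one large outlier (100), so that the two fill values visibly differ.

```python
import pandas as pd
from numpy import nan

s = pd.Series([1, nan, 6, 9, nan, 100])  # 100 is a made-up outlier

mean_filled = s.fillna(s.mean())      # mean of 1, 6, 9, 100 is 29.0
median_filled = s.fillna(s.median())  # median of 1, 6, 9, 100 is 7.5
print(mean_filled.tolist())
print(median_filled.tolist())
```

The median is less sensitive to outliers, so it can be the safer default when the data is skewed.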

Similarly, we can replace NaNs in a DataFrame. Each NaN value will be replaced by the mean value of the column it belongs to.

import pandas as pd
from numpy import nan
df = pd.DataFrame({'A': [0, 10, nan, 5],
                   'B': [2, 5, nan, 5],
                   'C': [3, 6, 10, 6]})
df.fillna(df.mean())
      A    B   C
0   0.0  2.0   3
1  10.0  5.0   6
2   5.0  4.0  10
3   5.0  5.0   6
The replaced NaN values are the entries in row 2 of columns A and B.

You can chain two mean() calls if you want every NaN filled with one value: the average of the column means.

import pandas as pd
from numpy import nan
df = pd.DataFrame({'A': [0, 10, nan, 5],
                   'B': [2, 5, nan, 5],
                   'C': [3, 6, 10, 6]})
mean = df.mean().mean()
df.fillna(mean)
           A         B   C
0   0.000000  2.000000   3
1  10.000000  5.000000   6
2   5.083333  5.083333  10
3   5.000000  5.000000   6

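Explanation: the first mean() returns a Series holding the mean of each column (5.0, 4.0 and 6.25), and the second one averages those three numbers. Columns A and B contain fewer non-NaN values than C, so the result is close to, but not exactly, the mean of all individual values.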
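
If the goal is the mean of every individual value, one option is to flatten the DataFrame to a NumPy array and use numpy.nanmean(), which ignores NaNs. This is a sketch of that alternative on the same data. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
from numpy import nan

df = pd.DataFrame({"A": [0, 10, nan, 5],
                   "B": [2, 5, nan, 5],
                   "C": [3, 6, 10, 6]})

mean_of_column_means = df.mean().mean()   # (5.0 + 4.0 + 6.25) / 3
overall_mean = np.nanmean(df.to_numpy())  # 52 / 10 non-NaN values

filled = df.fillna(overall_mean)
print(mean_of_column_means, overall_mean)
print(filled)
```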

Summary

  • fillna() fills NaNs with custom specified values.
  • fillna() returns a new data structure by default; pass inplace=True to modify the existing one instead.
  • fillna() accepts a scalar, a dict, a Series, or even a DataFrame as the fill value.
Author: Thijmen I’m currently a SysAdmin located in the Netherlands. Every day I try to keep around a hundred users happy with their network connections and overall, tech-related issues. I also spend my spare time fiddling with web-based applications.
