**Summary:** in this tutorial, you'll learn how Pandas handles missing data, what the NaN value is, and which built-in functions help you work with missing values.

Gathering or collecting data usually produces inconsistencies. Many potential problems can arise, including invalid, ambiguous, or missing values, and out-of-range data.

The Pandas development team has acknowledged the problem and built in measures to make working with missing data as painless as possible.

## NaN value in Pandas

**For numerical values,** Pandas uses NaN (Not a Number), a **floating-point** value, to represent missing data. This is far from perfect, but it is functional, simple, and works for most people.

Starting from pandas 1.0, some optional data types started experimenting with a native `NA` scalar using a mask-based approach.
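As a rough illustration of that native `NA` scalar, the sketch below compares a default Series with one that uses the nullable `"Int64"` dtype. The variable names are made up for this example, and the nullable dtypes have evolved since pandas 1.0, so details may differ between versions.

```python
import pandas as pd

# Nullable integer dtype (note the capital "I"): missing entries are stored
# as pd.NA and the column keeps its integer type.
nullable = pd.Series([1, None, 3], dtype="Int64")

# Default behaviour for comparison: None becomes NaN, forcing a float dtype.
default = pd.Series([1, None, 3])

print(nullable.dtype)        # Int64
print(nullable[1] is pd.NA)  # True
print(default.dtype)         # float64
```

The nullable dtype is opt-in: without `dtype="Int64"`, Pandas falls back to the NaN-based representation described above.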

NumPy's `np.nan` value in a Pandas data type will be marked as NaN and can be quickly verified using `isnull()` or `notnull()`.
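A minimal sketch of that check, using a small made-up Series (the name `scores` is only for illustration):

```python
import numpy as np
import pandas as pd

scores = pd.Series([1.5, np.nan, 3.0])

missing_mask = scores.isnull()    # True where the value is missing
present_mask = scores.notnull()   # the exact inverse of isnull()

print(missing_mask.tolist())      # [False, True, False]
print(present_mask.tolist())      # [True, False, True]
print(int(missing_mask.sum()))    # 1 (number of missing values)
```

Both methods return a boolean mask of the same shape as the input, so the result can be summed to count missing values or used to filter rows.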

## None vs NaN

Python `None` is treated as `NaN` when the other values in the column are all numeric. If the column holds strings or other Python objects (an `object` column), Pandas leaves `None` in place, so it is displayed as `None` rather than `NaN`. Either way, `isnull()` still reports it as missing.

While `None` is a native Python object, `NaN` is a floating-point value that NumPy exposes as `np.nan`.

```
import pandas as pd
import numpy as np

# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "reputation": [None, "Gotham", np.nan],  # strings present, so None stays None
    "movie": [np.nan, 3, None],              # numeric column, so None becomes NaN
})
df
```

| | name | toy | reputation | movie |
|---|---|---|---|---|
| 0 | Alfred | NaN | None | NaN |
| 1 | Batman | Batmobile | Gotham | 3.0 |
| 2 | Catwoman | Bullwhip | NaN | NaN |
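To see the difference between the two columns more directly, here is a small sketch that rebuilds them as standalone Series. The `dtype=object` argument is an assumption added for this example so the text column behaves the same way across pandas versions.

```python
import numpy as np
import pandas as pd

# Numeric data: None is converted to NaN and the dtype becomes float64.
numeric = pd.Series([np.nan, 3, None])

# Object data (dtype forced here for illustration): None is kept as-is.
text = pd.Series([None, "Gotham", np.nan], dtype=object)

print(numeric.dtype)             # float64
print(numeric.isna().tolist())   # [True, False, True]
print(text[0] is None)           # True
print(text.isna().tolist())      # [True, False, True]
```

Even though the object column stores `None` and `NaN` as different objects, `isna()` treats both as missing.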

## Different types of missing values compared

Below is an overview of the common values that should be treated as missing in Pandas.

- NaN: a NumPy built-in placeholder for missing values for any data type. NaN can be created manually using `numpy.nan`.
- NA: Most of the time, NA comes from R code, where NA is an identifier for a missing value.
- NaT (Not a Timestamp): the equivalent of NaN for timestamp data points. NaT can be created using `pandas.NaT` or `numpy.datetime64("NaT")`; there is no `numpy.nat` attribute.
- None: represents missing values of data types other than numeric.
- `null`: originates when a function doesn't return a value or when the value is undefined.
- `inf`: means infinity. It is a NumPy placeholder used when a calculation returns an extremely large or small value. Often, we need to treat `inf` as a missing value. Older pandas versions allowed this by setting `pandas.options.mode.use_inf_as_na = True`, but that option was deprecated in pandas 2.1, so converting `inf` to NaN explicitly is the safer route.
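The sketch below checks how `pd.isna()` classifies several of these values, and shows one possible way to turn `inf` into NaN without relying on a global option. The variable names and sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Values that Pandas recognises as missing out of the box.
missing_markers = [np.nan, None, pd.NA, pd.NaT, np.datetime64("NaT")]
for value in missing_markers:
    print(repr(value), pd.isna(value))   # each prints True

# Infinity is NOT treated as missing by default.
print(pd.isna(np.inf))                   # False

# Explicitly convert positive and negative infinity to NaN.
readings = pd.Series([1.0, np.inf, -np.inf, 4.0])
cleaned = readings.replace([np.inf, -np.inf], np.nan)
print(cleaned.isna().tolist())           # [False, True, True, False]
```

Replacing `inf` explicitly keeps the change local to one Series or DataFrame, rather than altering how every missing-value check in the session behaves.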

**Author: Thijmen** I'm currently a SysAdmin located in the Netherlands. Every day I try to keep around a hundred users happy with their network connections and overall tech-related issues. I also spend my spare time fiddling with web-based applications.