Pandas reindexing

Summary: in this tutorial, you’ll learn how to align an existing Series or DataFrame to a new set of index labels using reindex().

Reindex a Series

As you already know, Index objects are immutable and cannot be changed once created.

However, the reindex() method lets you conform an existing Series to a new index.

Note that reindex() does not modify the original Series. It returns a new Series whose values are rearranged to match the new labels, with NaN for any label that did not exist before.
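As a quick illustration, assuming a recent pandas version, the sketch below tries to assign to one element of an index, which raises a TypeError, and then checks that reindex() hands back a separate object. The replacement label "apricot" is only a hypothetical value.

```python
import pandas as pd

s = pd.Series([8, 3, 9, 6], index=["apple", "banana", "cherry", "mango"])

# Index objects do not support item assignment
try:
    s.index[0] = "apricot"
    index_mutated = True
except TypeError as err:
    index_mutated = False
    print("TypeError:", err)

# reindex() builds a new Series; the original keeps its index
s2 = s.reindex(["mango", "cherry", "raspberry", "apple"])
print(s2 is s)        # False
print(list(s.index))  # ['apple', 'banana', 'cherry', 'mango']
```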

```python
import pandas as pd

s = pd.Series([8, 3, 9, 6], index=["apple", "banana", "cherry", "mango"])
s2 = s.reindex(["mango", "cherry", "raspberry", "apple"])
s2
```

This outputs:

```
mango        6.0
cherry       9.0
raspberry    NaN
apple        8.0
dtype: float64
```

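The values now follow the order of the new labels, and banana has been dropped because it is not part of the new index. The label raspberry did not exist in the original Series, so it has no value and Pandas fills it with NaN. Because NaN is a floating-point value, the integers are upcast to float64.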
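If the NaN and the float upcast are unwanted, reindex() also accepts a fill_value argument that is used for labels that are new. A minimal sketch with the same data, assuming 0 is an acceptable placeholder for missing labels:

```python
import pandas as pd

s = pd.Series([8, 3, 9, 6], index=["apple", "banana", "cherry", "mango"])
labels = ["mango", "cherry", "raspberry", "apple"]

# Default: the new label gets NaN, so the integers are upcast to float64
s2 = s.reindex(labels)
print(s2.isna().tolist())  # [False, False, True, False]
print(s2.dtype)            # float64

# fill_value: use 0 (a placeholder) for labels missing from the original
s3 = s.reindex(labels, fill_value=0)
print(s3.tolist())         # [6, 9, 0, 8]
```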

Automatic filling of values when reindexing

For ordered data, such as time series, you may want to interpolate or fill in values when reindexing.

The method parameter of reindex() handles this: method='ffill' forward-fills and method='bfill' backward-fills the new labels with existing data.

Let’s see an example. Suppose we have a Series whose index skips the labels 2 and 4.

```python
import pandas as pd

s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0, 1, 3, 5])
s
```

```
0        mango
1       cherry
3    raspberry
5        apple
dtype: object
```

If we reindex that Series to the full range of labels 0 to 5 with method='ffill', each new label takes the value of the nearest preceding label.

```python
s.reindex([0, 1, 2, 3, 4, 5], method='ffill')
```

```
0        mango
1       cherry
2       cherry
3    raspberry
4    raspberry
5        apple
dtype: object
```
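To make the fill rule concrete, the sketch below compares filling during reindex() with the two-step alternative of reindexing first and then calling Series.ffill(). The two agree here because the original Series has no missing values. If it already contained NaN, only the two-step version would fill those, since the method option of reindex() compares the old and new labels and does not touch pre-existing NaN values.

```python
import pandas as pd

s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0, 1, 3, 5])
new_labels = [0, 1, 2, 3, 4, 5]

# Fill while reindexing: each new label takes the value of the
# nearest preceding label in the original index
filled_during = s.reindex(new_labels, method="ffill")

# Two-step alternative: reindex first (new labels become NaN), then forward-fill
filled_after = s.reindex(new_labels).ffill()

print(filled_during.tolist())
# ['mango', 'cherry', 'cherry', 'raspberry', 'raspberry', 'apple']
print(filled_during.equals(filled_after))  # True
```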

Similarly, method='bfill' tells Pandas to copy the value from the nearest following label.

```python
import pandas as pd

s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0, 1, 3, 5])
s.reindex([0, 1, 2, 3, 4, 5], method='bfill')
```

```
0        mango
1       cherry
2    raspberry
3    raspberry
4        apple
5        apple
dtype: object
```
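Both fill methods rely on the order of the labels, so the index must be monotonic (sorted in increasing or decreasing order). A small sketch with the same values in shuffled order shows the ValueError Pandas raises otherwise, and how sorting the index with sort_index() first avoids it:

```python
import pandas as pd

# Same data as above, but the index labels are not sorted
s = pd.Series(["raspberry", "mango", "apple", "cherry"], index=[3, 0, 5, 1])
new_labels = [0, 1, 2, 3, 4, 5]

# method="bfill" needs a monotonic index, so this raises ValueError
try:
    s.reindex(new_labels, method="bfill")
    unsorted_ok = True
except ValueError as err:
    unsorted_ok = False
    print("ValueError:", err)

# Sorting the index first makes it monotonic increasing, and bfill works
filled = s.sort_index().reindex(new_labels, method="bfill")
print(filled.tolist())
# ['mango', 'cherry', 'raspberry', 'raspberry', 'apple', 'apple']
```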

Reindex a DataFrame

On a DataFrame, reindex() can alter the rows, the columns, or both.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a', 'b', 'd', 'e'])
df
```

```
   month  year  sale
a      1  2012    55
b      4  2014    40
d      7  2013    84
e     10  2014    31
```

Passing a sequence of labels as the first argument reindexes the rows.

```python
df.reindex(['a', 'b', 'c', 'd', 'e'])
```

```
   month    year  sale
a    1.0  2012.0  55.0
b    4.0  2014.0  40.0
c    NaN     NaN   NaN
d    7.0  2013.0  84.0
e   10.0  2014.0  31.0
```
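The fill methods work on DataFrame rows as well. Because the labels 'a', 'b', 'd', 'e' are already sorted, a sketch like the following copies row b into the new row c instead of producing NaN, so the columns keep their integer dtype:

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a', 'b', 'd', 'e'])

# The string labels are sorted, so method="ffill" is allowed:
# the new row 'c' takes the values of the preceding row 'b'
filled = df.reindex(['a', 'b', 'c', 'd', 'e'], method='ffill')
print(filled.loc['c'].tolist())  # [4, 2014, 40]
```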

Alternatively, use the columns keyword to reindex the columns. Here we move the sale column to the first position.

```python
df.reindex(columns=['sale', 'month', 'year'])
```

```
   sale  month  year
a    55      1  2012
b    40      4  2014
d    84      7  2013
e    31     10  2014
```
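The same column reordering can also be written in the labels-plus-axis form, reindex(labels, axis='columns'). The sketch below checks that both spellings match and shows that a column label missing from the original is added as a column of NaN; the column name region is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a', 'b', 'd', 'e'])

# Two spellings of the same column reindex
by_axis = df.reindex(['sale', 'month', 'year'], axis='columns')
by_keyword = df.reindex(columns=['sale', 'month', 'year'])
print(by_axis.equals(by_keyword))  # True

# 'region' (hypothetical) is not in df, so it is added filled with NaN
with_new = df.reindex(columns=['sale', 'month', 'year', 'region'])
print(with_new['region'].isna().all())  # True
```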

You can also reindex both the rows and the columns of a DataFrame in a single reindex() call.

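The two calls below express the same intent, but the first one, which passes both label lists positionally, is ambiguous. It has been deprecated for that reason, and recent Pandas releases no longer accept it.

```python
# Deprecated, ambiguous positional form (rejected by recent Pandas versions):
# df.reindex(['a', 'b', 'c', 'd', 'e'], ['sale', 'month', 'year'])

# Preferred form with named arguments:
df.reindex(index=['a', 'b', 'c', 'd', 'e'], columns=['sale', 'month', 'year'])
```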

To keep your code working across Pandas versions, always pass the row and column labels as named arguments (index= and columns=).
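As a sanity check of the keyword form, the sketch below compares one call that reindexes both axes with two chained reindex() calls, one per axis. Both produce the new all-NaN row c and the reordered columns:

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a', 'b', 'd', 'e'])

rows = ['a', 'b', 'c', 'd', 'e']
cols = ['sale', 'month', 'year']

# One call, both axes given as named arguments
both = df.reindex(index=rows, columns=cols)

# Two calls: reindex the rows first, then the columns
chained = df.reindex(index=rows).reindex(columns=cols)

print(both.equals(chained))        # True
print(list(both.columns))          # ['sale', 'month', 'year']
print(both.loc['c'].isna().all())  # True
```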
