Pandas reindexing

Summary : in this tutorial, you’ll learn how to align an existing Series or DataFrame to new index labels using reindex().

Reindex a Series

As you already know, Index objects are immutable and cannot be changed once created.

But by reindexing, you can define a new index for an existing Series object using reindex() function.

Please note that reindex() returns a new Series with the values of the previous Series rearranged to the new labels.

import pandas as pd
s = pd.Series([8, 3, 9, 6], index=["apple", "banana", "cherry", "mango"])
s2 = s.reindex(["mango", "cherry", "raspberry", "apple"])
s2

will output

mango        6.0
cherry       9.0
raspberry    NaN
apple        8.0
dtype: float64

You can see that the order of the values has been changed, and banana has been removed. Also, raspberry index has been added without a value, so Pandas shows NaN.

Automatic filling of values when reindexing

For sequence-based data that needs to be ordered, users may want to do some kind of interpolation or filling of values when reindexing.

Pandas ffill and bfill allow us to forward-filling or backward-filling the values with existing data.

Let’s see an example. Suppose we have a Series missing values at index number 2 and 4.

import pandas as pd
s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0,1,3,5])
s
0        mango
1       cherry
3    raspberry
5        apple
dtype: object

If we reindex that object to a full column, with method='ffill', missing values will be taken from the last nearest index number.

s.reindex([0,1,2,3,4,5], method='ffill')
0        mango
1       cherry
2       cherry
3    raspberry
4    raspberry
5        apple
dtype: object

Similarly, bfill tells Pandas to copy the value from the nearest previous one.

import pandas as pd
s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0,1,3,5])
s.reindex([0,1,2,3,4,5], method='bfill')
0        mango
1       cherry
2    raspberry
3    raspberry
4        apple
5        apple
dtype: object

Reindex with DataFrame

On a DataFrame, reindex can either alter the rows or columns, or both.

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a','b','d','e'])
df
monthyearsale
a1201255
b4201440
d7201384
e10201431

Passing a new sequence will cause the rows to be reindexed.

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a','b','d','e'])
df.reindex(['a','b', 'c','d','e'])
monthyearsale
a1.02012.055.0
b4.02014.040.0
cNaNNaNNaN
d7.02013.084.0
e10.02014.031.0

Alternatively, columns can also be reindexed, with the columns option. We’ll get the sale column to the left side.

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a','b','d','e'])
df.reindex(columns=['sale', 'month', 'year'])
salemonthyear
a5512012
b4042014
d8472013
e31102014

Reindexing both columns and rows of a DataFrame can be done easily with the same reindex() function.

The two following syntax is equivalent, but the former will not be supported in the future due to ambiguity.

df.reindex(['a','b','c','d','e'],['sale', 'month', 'year']) # will be dropped in the next Pandas releases
# OR
df.reindex(index=['a','b','c','d','e'],columns=['sale', 'month', 'year'])

Ensure you’ve passed named arguments (include index= and columns=) for future compatibility.

Author: Thijmen I’m currently a SysAdmin located in the Netherlands. Every day I try to keep around a hundred users happy with their network connections and overall, tech-related issues. I also spend my spare time fiddling with web-based applications.

Leave a Comment