Pandas reindexing

Summary : in this tutorial, you’ll learn how to align an existing Series or DataFrame to new index labels using reindex().

Reindex a Series

As you already know, Index objects are immutable and cannot be changed once created.

But by reindexing, you can define a new index for an existing Series object using reindex() function.

Please note that reindex() returns a new Series with the values of the previous Series rearranged to the new labels.

import pandas as pd
s = pd.Series([8, 3, 9, 6], index=["apple", "banana", "cherry", "mango"])
s2 = s.reindex(["mango", "cherry", "raspberry", "apple"])
s2

will output

mango        6.0
cherry       9.0
raspberry    NaN
apple        8.0
dtype: float64

You can see that the order of the values has been changed, and banana has been removed. Also, raspberry index has been added without a value, so Pandas shows NaN.

Automatic filling of values when reindexing

For sequence-based data that needs to be ordered, users may want to do some kind of interpolation or filling of values when reindexing.

Pandas ffill and bfill allow us to forward-filling or backward-filling the values with existing data.

Let’s see an example. Suppose we have a Series missing values at index number 2 and 4.

import pandas as pd
s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0,1,3,5])
s
0        mango
1       cherry
3    raspberry
5        apple
dtype: object

If we reindex that object to a full column, with method='ffill', missing values will be taken from the last nearest index number.

s.reindex([0,1,2,3,4,5], method='ffill')
0        mango
1       cherry
2       cherry
3    raspberry
4    raspberry
5        apple
dtype: object

Similarly, bfill tells Pandas to copy the value from the nearest previous one.

import pandas as pd
s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0,1,3,5])
s.reindex([0,1,2,3,4,5], method='bfill')
0        mango
1       cherry
2    raspberry
3    raspberry
4        apple
5        apple
dtype: object

Reindex with DataFrame

On a DataFrame, reindex can either alter the rows or columns, or both.

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a','b','d','e'])
df
monthyearsale
a1201255
b4201440
d7201384
e10201431

Passing a new sequence will cause the rows to be reindexed.

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a','b','d','e'])
df.reindex(['a','b', 'c','d','e'])
monthyearsale
a1.02012.055.0
b4.02014.040.0
cNaNNaNNaN
d7.02013.084.0
e10.02014.031.0

Alternatively, columns can also be reindexed, with the columns option. We’ll get the sale column to the left side.

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a','b','d','e'])
df.reindex(columns=['sale', 'month', 'year'])
salemonthyear
a5512012
b4042014
d8472013
e31102014

Reindexing both columns and rows of a DataFrame can be done easily with the same reindex() function.

See also  Import HTML tables into DataFrame

The two following syntax is equivalent, but the former will not be supported in the future due to ambiguity.

df.reindex(['a','b','c','d','e'],['sale', 'month', 'year']) # will be dropped in the next Pandas releases
# OR
df.reindex(index=['a','b','c','d','e'],columns=['sale', 'month', 'year'])

Ensure you’ve passed named arguments (include index= and columns=) for future compatibility.

Avatar photo
Author: Thijmen I’m currently a SysAdmin located in the Netherlands. Every day I try to keep around a hundred users happy with their network connections and overall, tech-related issues. I also spend my spare time fiddling with web-based applications.

Leave a Comment