Summary: in this tutorial, you’ll learn how to align an existing Series or DataFrame to new index labels using reindex().
Reindex a Series
As you already know, Index objects are immutable and cannot be changed once created.
Reindexing does not modify the existing index. Instead, the reindex() method builds a new object that conforms to the labels you pass in.
Note that reindex() returns a new Series with the values of the original Series rearranged to match the new labels; the original Series is left unchanged.
import pandas as pd
s = pd.Series([8, 3, 9, 6], index=["apple", "banana", "cherry", "mango"])
s2 = s.reindex(["mango", "cherry", "raspberry", "apple"])
s2
will output
mango 6.0
cherry 9.0
raspberry NaN
apple 8.0
dtype: float64
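You can see that the values have been reordered to follow the new labels, and banana has been dropped because it is not in the new index. The raspberry label did not exist in the original Series, so pandas fills it with NaN. Because NaN is a floating-point value, the dtype changes from int64 to float64.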
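If NaN is not a useful placeholder, reindex() also accepts a fill_value argument. The following is a minimal sketch that reuses the fruit Series above; using 0 as the placeholder is only an assumption made for illustration.

```python
import pandas as pd

s = pd.Series([8, 3, 9, 6], index=["apple", "banana", "cherry", "mango"])

# Labels missing from the original index receive fill_value instead of NaN.
s3 = s.reindex(["mango", "cherry", "raspberry", "apple"], fill_value=0)

print(s3)        # raspberry holds 0 rather than NaN
print(s3.dtype)  # still an integer dtype, since no NaN was introduced
print(s)         # the original Series is unchanged
```

Whether a placeholder such as 0 is appropriate depends on the data; for counts it may be reasonable, while for measurements NaN is usually the more honest choice.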
Automatic filling of values when reindexing
For ordered data such as time series, you may want to interpolate or fill in values when reindexing.
The method argument of reindex() accepts ffill and bfill, which forward-fill or backward-fill the new labels from existing data.
Let’s see an example. Suppose we have a Series whose index skips the labels 2 and 4.
import pandas as pd
s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0,1,3,5])
s
0 mango
1 cherry
3 raspberry
5 apple
dtype: object
If we reindex that Series to the full range of labels with method='ffill', each missing label takes the value of the nearest preceding label.
s.reindex([0,1,2,3,4,5], method='ffill')
0 mango
1 cherry
2 cherry
3 raspberry
4 raspberry
5 apple
dtype: object
Similarly, bfill tells pandas to copy the value from the nearest following label.
import pandas as pd
s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0,1,3,5])
s.reindex([0,1,2,3,4,5], method='bfill')
0 mango
1 cherry
2 raspberry
3 raspberry
4 apple
5 apple
dtype: object
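Filling can also be capped with the limit argument, which restricts how many consecutive new labels are filled from one existing value. The sketch below reuses the same Series; extending the target labels to 0 through 7 is an assumption chosen to make the effect of the cap visible.

```python
import pandas as pd

s = pd.Series(["mango", "cherry", "raspberry", "apple"], index=[0, 1, 3, 5])

# limit=1: at most one consecutive new label is forward-filled
# from each existing value; anything further away stays NaN.
filled = s.reindex(range(8), method="ffill", limit=1)
print(filled)
```

Here label 6 is filled from label 5, while label 7 is left as NaN because it would be the second consecutive fill. Note that the method argument only works when the index is monotonically increasing or decreasing.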
Reindex with DataFrame
On a DataFrame, reindex() can alter the rows, the columns, or both.
df = pd.DataFrame({'month': [1, 4, 7, 10],
'year': [2012, 2014, 2013, 2014],
'sale': [55, 40, 84, 31]},
index=['a','b','d','e'])
df
|   | month | year | sale |
|---|---|---|---|
| a | 1 | 2012 | 55 |
| b | 4 | 2014 | 40 |
| d | 7 | 2013 | 84 |
| e | 10 | 2014 | 31 |
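Passing a sequence of labels as the first argument reindexes the rows. A label that is not in the original index, such as c here, produces a row of NaN values.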
df = pd.DataFrame({'month': [1, 4, 7, 10],
'year': [2012, 2014, 2013, 2014],
'sale': [55, 40, 84, 31]},
index=['a','b','d','e'])
df.reindex(['a','b', 'c','d','e'])
|   | month | year | sale |
|---|---|---|---|
| a | 1.0 | 2012.0 | 55.0 |
| b | 4.0 | 2014.0 | 40.0 |
| c | NaN | NaN | NaN |
| d | 7.0 | 2013.0 | 84.0 |
| e | 10.0 | 2014.0 | 31.0 |
Alternatively, columns can be reindexed with the columns argument. Here we move the sale column to the left side.
df = pd.DataFrame({'month': [1, 4, 7, 10],
'year': [2012, 2014, 2013, 2014],
'sale': [55, 40, 84, 31]},
index=['a','b','d','e'])
df.reindex(columns=['sale', 'month', 'year'])
|   | sale | month | year |
|---|---|---|---|
| a | 55 | 1 | 2012 |
| b | 40 | 4 | 2014 |
| d | 84 | 7 | 2013 |
| e | 31 | 10 | 2014 |
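The same column reordering can also be written in the labels-plus-axis style, and a column name that does not exist yet is added as an all-NaN column. This is a small sketch on the same DataFrame; the region column is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a', 'b', 'd', 'e'])

# Same reordering as columns=[...], written with labels and axis.
by_axis = df.reindex(['sale', 'month', 'year'], axis='columns')

# 'region' (hypothetical) is not in df, so it is created and filled with NaN.
with_new = df.reindex(columns=['sale', 'month', 'year', 'region'])

print(by_axis)
print(with_new)
```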
Reindexing both the rows and the columns of a DataFrame can be done in a single reindex() call.
The following two calls are intended to be equivalent, but the first, purely positional form is ambiguous and deprecated, so it should not be relied on.
df.reindex(['a','b','c','d','e'],['sale', 'month', 'year']) # positional form: deprecated, may raise an error in newer pandas releases
# OR
df.reindex(index=['a','b','c','d','e'],columns=['sale', 'month', 'year'])
To stay compatible with future releases, always pass index= and columns= as named arguments.
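As a quick check of the keyword form, the sketch below runs it on the same DataFrame and inspects the result; the variable name result is just a placeholder for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  index=['a', 'b', 'd', 'e'])

# Reindex rows and columns in one call using named arguments.
result = df.reindex(index=['a', 'b', 'c', 'd', 'e'],
                    columns=['sale', 'month', 'year'])

print(result)                     # row 'c' is all NaN, sale is the first column
print(result.loc['c'].isna())     # True for every column in the new row
```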
