Summary: in this tutorial, you’ll learn how to add rows to an existing DataFrame object.
Append a DataFrame to another
The append() method is the basic way of adding rows to a DataFrame. It returns a new DataFrame containing the rows of the original DataFrame followed by the rows of the second. Because it keeps the original index labels rather than renumbering them, the result can contain duplicate index values. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat() gives the same result.
The example below appends two DataFrame objects with identical structure, taken from Wikipedia's List of metro systems, with references and citations cleaned out.
import pandas as pd
# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]
# Create 2 DataFrames with the same structure
df1 = df.head(2)
df2 = df.tail(3)
# Append df2 to df1
result = df1.append(df2)
result
We now have the new DataFrame as follows.
| | City | Country | Name | Year opened | Year of last expansion | Stations | System length | Annual ridership(millions) |
|---|---|---|---|---|---|---|---|---|
| 0 | Algiers | Algeria | Algiers Metro | 2011 | 2018 | 19 | 18.5 km (11.5 mi) | 45.3 (2019) |
| 1 | Buenos Aires | Argentina | Buenos Aires Underground | 1927 | 2019 | 90 | 56.7 km (35.2 mi) | 321.3 (2019) |
| 191 | Washington, D.C. | United States | Washington Metro | 1976 | 2014 | 91 | 188 km (117 mi) | 68.1 (2020) |
| 192 | Tashkent | Uzbekistan | Tashkent Metro | 1977 | 2020 | 39 | 57.1 km (35.5 mi) | 71.2 (2019) |
| 193 | Caracas | Venezuela | Caracas Metro | 1983 | 2015 | 52 | 67.2 km (41.8 mi) | 358 (2017) |
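On pandas 2.0 and later, DataFrame.append() is no longer available, so the same result has to come from pd.concat(). The sketch below is only an illustration: it uses a small hand-built stand-in (three columns, values copied from the table above) instead of wiki.html, and shows that the original index labels are kept unless ignore_index=True is passed.

```python
import pandas as pd

# Small stand-in for the Wikipedia table (values copied from the output above)
df1 = pd.DataFrame(
    {
        "City": ["Algiers", "Buenos Aires"],
        "Country": ["Algeria", "Argentina"],
        "Year opened": [2011, 1927],
    },
    index=[0, 1],
)
df2 = pd.DataFrame(
    {
        "City": ["Washington, D.C.", "Tashkent", "Caracas"],
        "Country": ["United States", "Uzbekistan", "Venezuela"],
        "Year opened": [1976, 1977, 1983],
    },
    index=[191, 192, 193],
)

# Counterpart of df1.append(df2): rows of df1 first, then rows of df2,
# with each row keeping its original index label
result = pd.concat([df1, df2])

# Same rows, but renumbered from 0
renumbered = pd.concat([df1, df2], ignore_index=True)
```

Neither call modifies df1 or df2; both return a new DataFrame.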
Alternatively, if you have a list, you can create a Series from it, using the DataFrame's column labels as the Series index, and then append() it.
import pandas as pd
# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]
helsinki = ['Helsinki', 'Finland', 'Helsinki Metro', 1982, '2017',
25, '35\xa0km (22\xa0mi)', '92.6 (2019)']
# Create the Series, align to DataFrame columns
s = pd.Series(helsinki, index=df.columns.values)
# Set df to the result
# append() doesn't update existing DataFrame
df = df.append(s, ignore_index=True)
df
You can see in the output that the new row has been added to the end of the DataFrame.
| | City | Country | Name | Year opened | Year of last expansion | Stations | System length | Annual ridership(millions) |
|---|---|---|---|---|---|---|---|---|
| 0 | Algiers | Algeria | Algiers Metro | 2011 | 2018 | 19 | 18.5 km (11.5 mi) | 45.3 (2019) |
| 1 | Buenos Aires | Argentina | Buenos Aires Underground | 1927 | 2019 | 90 | 56.7 km (35.2 mi) | 321.3 (2019) |
| … | … | … | … | … | … | … | … | … |
| 193 | Caracas | Venezuela | Caracas Metro | 1983 | 2015 | 52 | 67.2 km (41.8 mi) | 358 (2017) |
| 194 | Helsinki | Finland | Helsinki Metro | 1982 | 2017 | 25 | 35 km (22 mi) | 92.6 (2019) |
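On pandas 2.0 and later, where append() has been removed, one way to get the same result is to wrap the list in a one-row DataFrame and concatenate it. This is a minimal sketch under assumptions: the frame is a small stand-in with only three of the columns, not the real Wikipedia table.

```python
import pandas as pd

# Stand-in for the table loaded from wiki.html
df = pd.DataFrame({
    "City": ["Algiers", "Buenos Aires"],
    "Country": ["Algeria", "Argentina"],
    "Year opened": [2011, 1927],
})

helsinki = ["Helsinki", "Finland", 1982]

# Wrap the list in a one-row DataFrame that reuses df's column labels
new_row = pd.DataFrame([helsinki], columns=df.columns)

# concat() returns a new DataFrame; ignore_index=True renumbers rows from 0
df = pd.concat([df, new_row], ignore_index=True)
```

As with append(), the result has to be assigned back to df, because concat() does not update the existing DataFrame.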
Add a row to the top of a DataFrame
While append() can be used to add a new row to the end of a DataFrame, what if we want the new row to be on top of the others? There is no dedicated method for this, but it can be achieved in a few ways.
- Convert the new row into a DataFrame, then use concat() to concatenate it with the existing DataFrame (only works if your data is in the correct form).
- Add a new row at position -1, then add 1 to every index value so that the index starts from 0 again (with our new row at index 0). Then sort the rows using sort_index with inplace=True so that the existing DataFrame gets updated instead of returning a new one.
Concatenate new row to existing DataFrame
If our data is in the correct form to be used to create a DataFrame, we can concatenate the two DataFrames, resetting the index.
import pandas as pd
# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]
helsinki = {'Annual ridership(millions)': ['92.6 (2019)'],
'City': ['Helsinki'],
'Country': ['Finland'],
'Name': ['Helsinki Metro'],
'Stations': [25],
'System length': ['35\xa0km (22\xa0mi)'],
'Year of last expansion': ['2017'],
'Year opened': [1982]
}
# Create DataFrame out of raw Python data
helsinki_df = pd.DataFrame(helsinki)
# Concat with existing DataFrame and ignore the index.
df = pd.concat([helsinki_df, df], ignore_index=True)
df
You can already see that the input needed to create the DataFrame is a fairly complex Python dict. To avoid building that by hand, we can use the second method described below.
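Staying with concat() for a moment, the input can also be made lighter. A list containing one plain dict (scalar values, no list wrapping) produces a one-row DataFrame, and passing columns=df.columns arranges that row's columns in the same order as the existing DataFrame. The sketch below assumes a small three-column stand-in rather than the real table.

```python
import pandas as pd

# Stand-in for the table loaded from wiki.html
df = pd.DataFrame({
    "City": ["Algiers", "Buenos Aires"],
    "Country": ["Algeria", "Argentina"],
    "Year opened": [2011, 1927],
})

# One plain dict per row; the keys may be in any order
helsinki = {"Year opened": 1982, "City": "Helsinki", "Country": "Finland"}

# columns=df.columns orders the new row's columns like df's
helsinki_df = pd.DataFrame([helsinki], columns=df.columns)

# Put the new row first, then renumber the index from 0
df = pd.concat([helsinki_df, df], ignore_index=True)
```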
Add row then sort by index in-place
What we’re going to do is add a new row from a Python list at index -1 using .loc.
Then we add 1 to every index value, which puts our new row at index 0 and keeps all the others in their original order.
After that, we sort by index with inplace=True so that the existing DataFrame gets updated instead of returning a new one.
import pandas as pd
# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]
helsinki = ['Helsinki', 'Finland', 'Helsinki Metro', 1982, '2017',
25, '35\xa0km (22\xa0mi)', '92.6 (2019)']
# Add row to the bottom, but with index -1
df.loc[-1] = helsinki
# Shift all index numbers
df.index = df.index + 1
# Sort the index to bring new row to top
df.sort_index(inplace=True)
df
The output is exactly what we wanted.
| | City | Country | Name | Year opened | Year of last expansion | Stations | System length | Annual ridership(millions) |
|---|---|---|---|---|---|---|---|---|
| 0 | Helsinki | Finland | Helsinki Metro | 1982 | 2017 | 25 | 35 km (22 mi) | 92.6 (2019) |
| 1 | Algiers | Algeria | Algiers Metro | 2011 | 2018 | 19 | 18.5 km (11.5 mi) | 45.3 (2019) |
| 2 | Buenos Aires | Argentina | Buenos Aires Underground | 1927 | 2019 | 90 | 56.7 km (35.2 mi) | 321.3 (2019) |
| … | … | … | … | … | … | … | … | … |
| 194 | Caracas | Venezuela | Caracas Metro | 1983 | 2015 | 52 | 67.2 km (41.8 mi) | 358 (2017) |
Add a row at a specific index
Combining the methods we’ve demonstrated earlier in this post, you can easily add a row anywhere in the DataFrame with a few steps (not the most efficient way though):
- Split the existing DataFrame at the index position you want into two new DataFrames, say df1 and df2.
- Add a new row to the bottom of df1 or to the top of df2.
- Shift the index of df1 or df2 accordingly, using the df.index = df.index + 1 syntax.
- Concatenate the two DataFrames back together.
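The steps above can be sketched as a small helper. The function name insert_row is hypothetical, the data is a three-column stand-in, and the sketch assumes the DataFrame has a default 0-based integer index. It also takes a shortcut: instead of shifting the index by hand, it lets concat() renumber the rows with ignore_index=True.

```python
import pandas as pd


def insert_row(df, position, row):
    """Return a new DataFrame with `row` (a list of values) at `position`.

    Hypothetical helper for illustration; assumes a default integer index.
    """
    top = df.iloc[:position]      # rows before the insertion point
    bottom = df.iloc[position:]   # rows from the insertion point onward
    new_row = pd.DataFrame([row], columns=df.columns)
    # ignore_index=True renumbers all rows, replacing the manual index shift
    return pd.concat([top, new_row, bottom], ignore_index=True)


df = pd.DataFrame({
    "City": ["Algiers", "Buenos Aires", "Caracas"],
    "Country": ["Algeria", "Argentina", "Venezuela"],
    "Year opened": [2011, 1927, 1983],
})

# Insert Helsinki between Algiers and Buenos Aires
result = insert_row(df, 1, ["Helsinki", "Finland", 1982])
```

The original df is left unchanged, since the helper builds and returns a new DataFrame.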
Summary
- Adding rows to an existing DataFrame is achievable using concat() and append().
- Avoid relying on these operations too often, because they are somewhat hacky.
- Instead, add entries to your data first, either in an intermediary format like CSV or JSON, or as native Python objects.
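As an illustration of the last point, the sketch below collects rows as plain Python dicts, reorders them with ordinary list operations, and builds the DataFrame only once at the end. The three columns are a stand-in for the full table.

```python
import pandas as pd

# Collect rows as native Python objects first
rows = [
    {"City": "Algiers", "Country": "Algeria", "Year opened": 2011},
    {"City": "Buenos Aires", "Country": "Argentina", "Year opened": 1927},
]

# List operations handle "add to top" and "add to bottom" directly
rows.insert(0, {"City": "Helsinki", "Country": "Finland", "Year opened": 1982})
rows.append({"City": "Caracas", "Country": "Venezuela", "Year opened": 1983})

# Build the DataFrame once, after all rows are in place
df = pd.DataFrame(rows)
```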
