Add DataFrame rows

Summary: in this tutorial, you’ll learn how to add rows to an existing DataFrame object.

Append a DataFrame to another

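Appending with the append() method is the basic way of adding rows to a DataFrame. The method returns a new DataFrame containing the rows of the original DataFrame followed by the rows of the second. The result can contain duplicate index values because the original row labels are kept unless you pass ignore_index=True. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the append() calls in this section need an older pandas version; pd.concat() is the replacement.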

The example below appends two DataFrame objects with identical structure. The data is taken from the Wikipedia article List of metro systems, with references and citations cleaned out.

import pandas as pd

# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]

# Create 2 DataFrames with the same structure
df1 = df.head(2)
df2 = df.tail(3)

# Append df2 to df1 (DataFrame.append() needs pandas < 2.0)
result = df1.append(df2)
result
Code language: Python (python)

We now have the new DataFrame as follows.

     City              Country        Name                      Year opened  Year of last expansion  Stations  System length      Annual ridership(millions)
0    Algiers           Algeria        Algiers Metro             2011         2018                    19        18.5 km (11.5 mi)  45.3 (2019)
1    Buenos Aires      Argentina      Buenos Aires Underground  1927         2019                    90        56.7 km (35.2 mi)  321.3 (2019)
191  Washington, D.C.  United States  Washington Metro          1976         2014                    91        188 km (117 mi)    68.1 (2020)
192  Tashkent          Uzbekistan     Tashkent Metro            1977         2020                    39        57.1 km (35.5 mi)  71.2 (2019)
193  Caracas           Venezuela      Caracas Metro             1983         2015                    52        67.2 km (41.8 mi)  358 (2017)
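On pandas 2.0 and later, where append() is no longer available, the same operation can be written with pd.concat(). The sketch below is a minimal illustration, not the tutorial's original code: it assumes a cut-down, three-column stand-in for the Wikipedia table (built from rows shown above) instead of wiki.html. It also shows how row labels are kept, how they can repeat, and how ignore_index=True renumbers them.

```python
import pandas as pd

# Assumption: a small stand-in for the table that wiki.html would provide
df = pd.DataFrame(
    {
        "City": ["Algiers", "Buenos Aires", "Tashkent", "Caracas"],
        "Country": ["Algeria", "Argentina", "Uzbekistan", "Venezuela"],
        "Stations": [19, 90, 39, 52],
    }
)

df1 = df.head(2)
df2 = df.tail(1)

# concat() stacks the rows and keeps each row's original label
result = pd.concat([df1, df2])
print(result.index.tolist())  # [0, 1, 3]

# Labels repeat when both inputs carry the same index values
dup = pd.concat([df1, df2.reset_index(drop=True)])
print(dup.index.tolist())  # [0, 1, 0]

# ignore_index=True discards the old labels and renumbers from 0
clean = pd.concat([df1, df2], ignore_index=True)
print(clean.index.tolist())  # [0, 1, 2]
```

Like append(), concat() returns a new DataFrame and leaves df1 and df2 unchanged.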

Alternatively, if you have a list, you can create a Series out of it, label it with the DataFrame's column names, then append() it with ignore_index=True.

import pandas as pd

# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]

helsinki = ['Helsinki', 'Finland', 'Helsinki Metro', 1982, '2017', 25, '35\xa0km (22\xa0mi)', '92.6 (2019)']

# Create the Series, align to DataFrame columns
s = pd.Series(helsinki, index=df.columns.values)

# Set df to the result
# append() doesn't update existing DataFrame (and needs pandas < 2.0)
df = df.append(s, ignore_index=True)
df
Code language: Python (python)

You can see in the output that the new row has been added to the end of the DataFrame.

     City              Country        Name                      Year opened  Year of last expansion  Stations  System length      Annual ridership(millions)
0    Algiers           Algeria        Algiers Metro             2011         2018                    19        18.5 km (11.5 mi)  45.3 (2019)
1    Buenos Aires      Argentina      Buenos Aires Underground  1927         2019                    90        56.7 km (35.2 mi)  321.3 (2019)
193  Caracas           Venezuela      Caracas Metro             1983         2015                    52        67.2 km (41.8 mi)  358 (2017)
194  Helsinki          Finland        Helsinki Metro            1982         2017                    25        35 km (22 mi)      92.6 (2019)
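The same single-row addition can be sketched with pd.concat(), which also works on pandas 2.0 and later. This is a minimal example under stated assumptions: it uses a cut-down, three-column stand-in for the table rather than wiki.html, and it wraps the list in a one-row DataFrame instead of a Series.

```python
import pandas as pd

# Assumption: a small stand-in for the table that wiki.html would provide
df = pd.DataFrame(
    {
        "City": ["Algiers", "Buenos Aires"],
        "Country": ["Algeria", "Argentina"],
        "Stations": [19, 90],
    }
)

helsinki = ["Helsinki", "Finland", 25]

# Wrap the list in a one-row DataFrame that reuses the existing column labels
new_row = pd.DataFrame([helsinki], columns=df.columns)

# Stack the new row under the existing rows and renumber the index
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```

If the DataFrame has a default integer index, assigning to df.loc[len(df)] is another option that adds the row in place.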

Add a row to the top of a DataFrame

While append() can be used to add a new row to the end of a DataFrame, what if we want the new row to be on top of the others? Though not natively supported, the operation can be achieved in a few ways.

  • Convert the new row into a DataFrame, then use concat() to concatenate it with the existing DataFrame (this only works if the new row is already in a form that the DataFrame constructor accepts).
  • Add the new row at index -1, then add 1 to every index value so that the labels start from 0 again (with the new row at index 0). Finally, sort the rows with sort_index(inplace=True) so that the existing DataFrame is updated instead of a new one being returned.

Concatenate new row to existing DataFrame

If our data is in the correct form to be used to create a DataFrame, we can concatenate the two DataFrames, resetting the index.

import pandas as pd

# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]

helsinki = {
    'Annual ridership(millions)': ['92.6 (2019)'],
    'City': ['Helsinki'],
    'Country': ['Finland'],
    'Name': ['Helsinki Metro'],
    'Stations': [25],
    'System length': ['35\xa0km (22\xa0mi)'],
    'Year of last expansion': ['2017'],
    'Year opened': [1982]
}

# Create DataFrame out of raw Python data
helsinki_df = pd.DataFrame(helsinki)

# Concat with existing DataFrame and ignore the index.
df = pd.concat([helsinki_df, df], ignore_index=True)
df
Code language: Python (python)

You can already see that the input has to be a fairly complex Python dict before it can be turned into a DataFrame. To avoid building that by hand, we can use the second method described below.

Add row then sort by index in-place

What we’re going to do is add a new row from a Python list at index -1 using .loc.

Then we shift every index value up by 1, so the new row gets index 0 and all the other rows keep their original order.

After that, we sort the index with inplace=True so that the existing DataFrame is updated instead of a new one being returned.

import pandas as pd

# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]

helsinki = ['Helsinki', 'Finland', 'Helsinki Metro', 1982, '2017', 25, '35\xa0km (22\xa0mi)', '92.6 (2019)']

# Add row to the bottom, but with index -1
df.loc[-1] = helsinki

# Shift all index numbers
df.index = df.index + 1

# Sort the index to bring new row to top
df.sort_index(inplace=True)
df
Code language: Python (python)

The output is exactly what we wanted.


     City              Country        Name                      Year opened  Year of last expansion  Stations  System length      Annual ridership(millions)
0    Helsinki          Finland        Helsinki Metro            1982         2017                    25        35 km (22 mi)      92.6 (2019)
1    Algiers           Algeria        Algiers Metro             2011         2018                    19        18.5 km (11.5 mi)  45.3 (2019)
2    Buenos Aires      Argentina      Buenos Aires Underground  1927         2019                    90        56.7 km (35.2 mi)  321.3 (2019)
194  Caracas           Venezuela      Caracas Metro             1983         2015                    52        67.2 km (41.8 mi)  358 (2017)

Add a row to specific index

Combining the methods we’ve demonstrated earlier in this post, you can easily add a row anywhere in the DataFrame with a few steps (not the most efficient way though):

  • Split the existing DataFrame at the position you want into two new DataFrames, say df1 and df2.
  • Add the new row to the bottom of df1 or to the top of df2.
  • Shift the index of df1 or df2 accordingly, using the df.index = df.index + 1 syntax.
  • Concatenate the two DataFrames back together.
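The steps above can be sketched roughly as follows. This is one possible illustration, not a definitive implementation: insert_row is a hypothetical helper (not part of pandas), the three-column DataFrame is a stand-in for the table from wiki.html, and the manual index shift is swapped for ignore_index=True, which renumbers the rows in one step.

```python
import pandas as pd


def insert_row(df, position, row):
    """Return a new DataFrame with `row` inserted at integer `position`.

    Hypothetical helper for illustration; it is not a pandas function.
    """
    # Split the existing rows at the requested position
    top = df.iloc[:position]
    bottom = df.iloc[position:]

    # Turn the list into a one-row DataFrame with matching column labels
    new_row = pd.DataFrame([row], columns=df.columns)

    # Put the pieces back together and renumber the index from 0
    return pd.concat([top, new_row, bottom], ignore_index=True)


# Assumption: a small stand-in for the table that wiki.html would provide
df = pd.DataFrame(
    {
        "City": ["Algiers", "Buenos Aires", "Caracas"],
        "Country": ["Algeria", "Argentina", "Venezuela"],
        "Stations": [19, 90, 52],
    }
)

result = insert_row(df, 1, ["Helsinki", "Finland", 25])
print(result["City"].tolist())
# ['Algiers', 'Helsinki', 'Buenos Aires', 'Caracas']
```

Because every call copies the whole DataFrame, this approach suits occasional inserts rather than adding many rows in a loop.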
