.loc vs .iloc in Pandas

Summary: in this tutorial, you’ll learn about .loc, .iloc and the now-removed .ix, three indexers for selecting row and column data in Pandas.

There are three basic ways of selecting data from rows and columns in Pandas.

.iloc – select by index number

.iloc selects rows and columns by their index numbers. The name is short for “integer location” or “index location”.

Each row has an integer position, counted from 0 up to the number of rows minus one. Each column has an integer position in the same way.

The syntax is: df.iloc[<row selection>, <column selection>]

# iloc usage in Pandas
# Select first five rows of dataframe
data.iloc[0:5]

# Select first two columns of data frame with all rows
data.iloc[:, 0:2]

# Select 1st, 4th, 7th, 25th rows + 1st, 6th, 7th columns. Remember index is counted from 0.
data.iloc[[0, 3, 6, 24], [0, 5, 6]]

# Select first 5 rows and 6th, 7th, 8th columns of data frame.
data.iloc[0:5, 5:8]
Code language: Python (python)

Important notes:

  • The end of a range is never included. For example, [3:9] selects rows 3 to 8, but not row 9.
  • If you pass a single integer to .iloc, the result is a Series, because it holds only one row. Selecting multiple rows returns a DataFrame. To always get a DataFrame, pass a one-element list, such as [3], instead of a bare integer.
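The two notes above can be checked on a small frame. This is a minimal sketch: the `data` frame and its column names are made up for illustration and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
data = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "value": [10, 20, 30, 40, 50],
})

# The end of an integer range is excluded: positions 1, 2 and 3 are returned
subset = data.iloc[1:4]

# A single integer returns one row as a Series
row_as_series = data.iloc[2]

# A one-element list returns the same row as a one-row DataFrame
row_as_frame = data.iloc[[2]]

print(list(subset.index))             # [1, 2, 3]
print(type(row_as_series).__name__)   # Series
print(type(row_as_frame).__name__)    # DataFrame
```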

.loc – select by name or boolean vector

.loc is a label-based indexer that selects rows and columns either by their labels or by boolean vectors.

The syntax is df.loc[<row selection>, <column selection>]. Each selection can be a single label, a list of labels, a label slice or a boolean vector.

The boolean vector syntax is the most useful one because it can quickly filter through a DataFrame to find what you need.
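As a rough illustration of boolean-vector filtering, the sketch below builds a mask from a column comparison and passes it to .loc. The frame, its column names and the thresholds are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    "name": ["Line A", "Line B", "Line C", "Line D"],
    "stations": [12, 45, 30, 8],
    "year_opened": [1975, 1990, 2005, 2012],
})

# A comparison on a column yields a boolean Series, one True/False per row
mask = df["stations"] > 20

# .loc keeps only the rows where the mask is True
busy = df.loc[mask]

# Masks combine with & (and) and | (or); each condition needs parentheses
busy_recent = df.loc[(df["stations"] > 20) & (df["year_opened"] >= 2000)]

# A column selection can follow the row mask
names = df.loc[mask, "name"]

print(list(names))   # ['Line B', 'Line C']
```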

.loc by index number

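.loc always selects by the labels stored in the DataFrame’s index. When no index has been set, Pandas uses a default integer index (0, 1, 2, ...), so the labels happen to match the row positions.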

On a DataFrame with default number-based index, you can select rows using their index number with .loc.
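The difference between a label and a position only becomes visible once the two stop matching, for example after filtering. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical toy data with the default index 0..4
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# With the default index, label 3 and position 3 point at the same row
same_row = df.loc[3, "value"] == df.iloc[3, 0]

# After filtering, the surviving rows keep their original labels 2, 3, 4
filtered = df[df["value"] > 20]

by_label = filtered.loc[3, "value"]   # row labelled 3
by_position = filtered.iloc[0, 0]     # first remaining row, labelled 2

print(by_label, by_position)          # 40 30
```

Here .loc[3] still finds the row labelled 3, while .iloc[0] returns whichever row is now first.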

The example below reads a table (citations removed) into a DataFrame and selects a few rows from it.

import pandas as pd

# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]

# .loc can select row/rows using index number
# Select one row
df.loc[6]

# Select multiple rows
df.loc[[6, 8, 10]]
Code language: Python (python)

The multi-row selection returns another DataFrame containing only the requested rows.

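|    | City | Country | Name | Year opened | Year of last expansion | Stations | System length | Annual ridership (millions) |
|----|------|---------|------|-------------|------------------------|----------|---------------|-----------------------------|
| 6  | Minsk | Belarus | Minsk Metro | 1984 | 2020 | 33 | 40.8 km (25.4 mi) | 293.7 (2019) |
| 8  | Belo Horizonte | Brazil | Belo Horizonte Metro | 1986 | 2002 | 19 | 28.1 km (17.5 mi) | 58.4 (2018) |
| 10 | Porto Alegre | Brazil | Porto Alegre Metro | 1985 | 2014 | 22 | 43.8 km (27.2 mi) | 51.7 (2018) |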

.loc by labels

On a DataFrame whose index has been set to a column, .loc selects rows directly by their index values.

In the following example, we set the City column as the index, then select all metro systems based in New York City.

import pandas as pd

# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]

# Use the City column as the index
df.set_index("City", inplace=True)

# Select rows using an index label
df.loc['New York City']
Code language: Python (python)

The result is a new DataFrame containing only the matching rows.

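| City | Country | Name | Year opened | Year of last expansion | Stations | System length | Annual ridership (millions) |
|------|---------|------|-------------|------------------------|----------|---------------|-----------------------------|
| New York City | United States | New York City Subway | 1904 | 2017 | 424 | 399 km (248 mi) | 1697.8 (2019) |
| New York City | United States | Staten Island Railway | 1925 | 2017 | 21 | 22.5 km (14.0 mi) | 2.7 (2020) |
| New York City | United States | PATH | 1908 | 1937 | 13 | 22.2 km (13.8 mi) | 29.7 (2020) |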

But be aware that if only one row is found, a Series will be returned instead of a DataFrame.

In order to avoid this behaviour, pass a one-element list instead of just a string.

# df.loc['Boston'] returns a Series
df.loc[['Boston']]  # returns a DataFrame
Code language: Python (python)
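The Series-versus-DataFrame behaviour can be reproduced without the HTML file. The sketch below uses a hypothetical three-row frame in which one index label is repeated and another is unique; the labels and values are placeholders.

```python
import pandas as pd

# Hypothetical toy data: "City X" appears twice in the index, "City Y" once
df = pd.DataFrame(
    {"name": ["System A", "System B", "System C"]},
    index=pd.Index(["City X", "City X", "City Y"], name="City"),
)

several = df.loc["City X"]          # label matches two rows
single = df.loc["City Y"]           # label matches exactly one row
single_frame = df.loc[["City Y"]]   # a list of labels keeps the table shape

print(type(several).__name__)       # DataFrame
print(type(single).__name__)        # Series
print(type(single_frame).__name__)  # DataFrame
```

Passing a list is the safer choice when later code expects a DataFrame regardless of how many rows match.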

.ix is .loc and .iloc combined, but deprecated

.ix was a combination of the two indexers above, .loc and .iloc.

It chose an operation based on the input. It was primarily label-based, like .loc, but fell back to integer positions, like .iloc, when given an integer on an axis whose index was not integer-typed. This guessing made its behaviour hard to predict.

The .ix indexer was deprecated in Pandas 0.20.0 and removed in Pandas 1.0.0. Use .loc or .iloc instead.
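Older code that still uses .ix can usually be rewritten with .loc or .iloc. The sketch below shows one possible translation for each case. The frame reuses three rows from the table above with shortened, made-up column names, and the .ix calls appear only in comments because they no longer run on current Pandas.

```python
import pandas as pd

# Three rows taken from the metro table, with shortened column names
df = pd.DataFrame(
    {"stations": [33, 19, 22], "opened": [1984, 1986, 1985]},
    index=["Minsk", "Belo Horizonte", "Porto Alegre"],
)

# Label-based selection that older code wrote as df.ix["Minsk", "stations"]
by_label = df.loc["Minsk", "stations"]

# Position-based selection that older code wrote as df.ix[0, 1]
by_position = df.iloc[0, 1]

# Mixed selection (row by position, column by label):
# translate the position to a label first, then use .loc
mixed = df.loc[df.index[0], "opened"]

print(by_label, by_position, mixed)   # 33 1984 1984
```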
