Summary: in this tutorial, you’ll learn about .loc, .iloc and .ix, the three ways of selecting row and column data in Pandas.
There are three basic ways of selecting data from rows and columns in Pandas.
.iloc – select by index number
.iloc selects rows and columns by their integer positions. The name is short for “integer location” or “index location”.
Each row has a number from 0 to the total number of rows minus one, called its position (or “index number”). Similarly, each column has its own position, also counted from 0.
The syntax is very simple: df.iloc[<row selection>, <column selection>]
# iloc usage in Pandas
# Select first five rows of dataframe
data.iloc[0:5]
# Select first two columns of data frame with all rows
data.iloc[:, 0:2]
# Select 1st, 4th, 7th, 25th rows and 1st, 6th, 7th columns. Remember that positions are counted from 0.
data.iloc[[0,3,6,24], [0,5,6]]
# Select first 5 rows and 5th, 6th, 7th columns of data frame.
data.iloc[0:5, 5:8]
Important notes:
- The last row or column in the range will never be selected. For example, [3:9] selects rows 3 to 8, but not row 9.
- If you pass a single number into .iloc, the data returned will be a Series (which makes sense because it contains only one row). Selecting multiple rows returns a DataFrame instead. To ensure the result is always a DataFrame, pass a one-element list into .iloc instead of a single number.
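Both notes can be checked on a small DataFrame. The following is a minimal sketch; the `data` frame and its column names are made up for illustration and are not the dataset used elsewhere in this tutorial.

```python
import pandas as pd

# Hypothetical toy data: 10 rows, 3 columns, default integer index
data = pd.DataFrame({
    "a": range(10),
    "b": range(10, 20),
    "c": range(20, 30),
})

# The end of a slice is excluded: positions 3 to 8, not 9
subset = data.iloc[3:9]
print(len(subset))  # 6

# A single integer returns a Series holding that one row
single = data.iloc[3]
print(type(single).__name__)  # Series

# A one-element list returns a DataFrame with one row
single_df = data.iloc[[3]]
print(type(single_df).__name__)  # DataFrame
print(single_df.shape)  # (1, 3)
```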
.loc – select by name or boolean vector
.loc is a label indexer that allows us to select rows and columns either by their names or by boolean vectors.
The .loc syntax is df.loc[]. Inside the brackets go the inputs: a row_selection, optionally followed by a column_selection, where the row selection can also be a boolean vector.
The boolean vector syntax is the most useful one because it can quickly filter through a DataFrame to find what you need.
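As a minimal sketch of boolean filtering, the example below builds a small DataFrame by hand from three rows of the metro table used in this tutorial. The shortened column set and the variable names (`mask`, `brazil`, `many_stations`) are chosen here for illustration.

```python
import pandas as pd

# Toy data: three rows copied by hand from the metro table in this tutorial
df = pd.DataFrame({
    "City": ["Minsk", "Belo Horizonte", "Porto Alegre"],
    "Country": ["Belarus", "Brazil", "Brazil"],
    "Stations": [33, 19, 22],
})

# A comparison on a column produces a boolean Series (one True/False per row)
mask = df["Country"] == "Brazil"

# Pass the mask as the row selection; only rows where it is True are kept
brazil = df.loc[mask]

# A boolean vector can be combined with a column selection
many_stations = df.loc[df["Stations"] > 20, ["City", "Stations"]]
print(many_stations)
```

Here `brazil` holds the two Brazilian systems, and `many_stations` keeps only the City and Stations columns for systems with more than 20 stations.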
.loc by index number
.loc relies on the index of the DataFrame to perform selection.
On a DataFrame with the default number-based index, the labels are the row numbers, so you can select rows by their index number with .loc.
The example below reads a table (citations removed) into a DataFrame and selects a few rows from that data.
import pandas as pd
# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]
# .loc can select row/rows using index number
# Select one row
df.loc[6]
# Select multiple rows
df.loc[[6,8,10]]
The last call results in another DataFrame which contains only our desired rows.
| | City | Country | Name | Year opened | Year of last expansion | Stations | System length | Annual ridership (millions) |
|---|---|---|---|---|---|---|---|---|
| 6 | Minsk | Belarus | Minsk Metro | 1984 | 2020 | 33 | 40.8 km (25.4 mi) | 293.7 (2019) |
| 8 | Belo Horizonte | Brazil | Belo Horizonte Metro | 1986 | 2002 | 19 | 28.1 km (17.5 mi) | 58.4 (2018) |
| 10 | Porto Alegre | Brazil | Porto Alegre Metro | 1985 | 2014 | 22 | 43.8 km (27.2 mi) | 51.7 (2018) |
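One detail worth keeping in mind when using .loc with a number-based index: the numbers are labels, not positions. A label slice includes its end label, and rows keep their labels after filtering. The sketch below shows this on made-up data; the `value` column is hypothetical.

```python
import pandas as pd

# Hypothetical toy data with the default integer index 0..4
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# .loc slices by label and includes the end label: labels 1, 2 and 3
by_label = df.loc[1:3]

# .iloc slices by position and excludes the end: positions 1 and 2
by_position = df.iloc[1:3]

# After filtering, the remaining rows keep their original labels (2, 3, 4)
filtered = df[df["value"] > 20]
print(filtered.loc[2, "value"])   # row labelled 2 -> 30
print(filtered.iloc[2, 0])        # third remaining row -> 50
```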
.loc by labels
On a DataFrame with a custom index set, .loc selects rows directly by their index values.
In the following example, we set the City column as the index, then select all metro systems based in New York City.
import pandas as pd
# Import data from cleaned HTML file
df = pd.read_html("wiki.html")[0]
# Use the City column as the index
df = df.set_index("City")
# Select rows using an index label
df.loc['New York City']
Because three rows share this label, we get a fresh DataFrame containing all of them.
| City | Country | Name | Year opened | Year of last expansion | Stations | System length | Annual ridership (millions) |
|---|---|---|---|---|---|---|---|
| New York City | United States | New York City Subway | 1904 | 2017 | 424 | 399 km (248 mi) | 1697.8 (2019) |
| New York City | United States | Staten Island Railway | 1925 | 2017 | 21 | 22.5 km (14.0 mi) | 2.7 (2020) |
| New York City | United States | PATH | 1908 | 1937 | 13 | 22.2 km (13.8 mi) | 29.7 (2020) |
But be aware that if only one row is found, a Series will be returned instead of a DataFrame.

In order to avoid this behaviour, pass a one-element list instead of just a string.
# df.loc['Boston'] returns a Series
df.loc[['Boston']] # returns a DataFrame
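The same behaviour can be reproduced without the HTML file. The sketch below builds a small DataFrame by hand from a few rows of the metro data in this tutorial; the reduced column set and the variable names are chosen here for illustration.

```python
import pandas as pd

# Toy data: three rows copied by hand from the metro data in this tutorial
df = pd.DataFrame({
    "City": ["New York City", "New York City", "Minsk"],
    "Name": ["New York City Subway", "PATH", "Minsk Metro"],
    "Year opened": [1904, 1908, 1984],
}).set_index("City")

many = df.loc["New York City"]   # label matches two rows -> DataFrame
one = df.loc["Minsk"]            # label matches one row -> Series
one_df = df.loc[["Minsk"]]       # one-element list -> DataFrame

for result in (many, one, one_df):
    print(type(result).__name__)
```

Wrapping the label in a list is a cheap way to keep downstream code from having to handle two return types.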
.ix is .loc and .iloc combined, but deprecated
.ix is a combination of the two indexers .loc and .iloc above.
Depending on the input, it will perform the appropriate operation. If the input is a non-integer label, it behaves like .loc. If it is an integer, it behaves like .iloc, unless the index itself holds integers, in which case the integer is treated as a label.
The .ix indexer was deprecated in Pandas 0.20.0 and removed in Pandas 1.0.0. Use .loc or .iloc instead.
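Code that relied on .ix to mix a row position with a column label can usually be rewritten with .loc or .iloc. The following is a minimal sketch on made-up data, using `df.index[...]` and `Index.get_loc` to translate between positions and labels; the variable names are chosen here for illustration.

```python
import pandas as pd

# Hypothetical toy data with a label index
df = pd.DataFrame(
    {"Country": ["Belarus", "Brazil"], "Stations": [33, 19]},
    index=["Minsk", "Belo Horizonte"],
)

# Goal: first row (by position), "Stations" column (by label)

# Option 1: turn the row position into a label and use .loc
value_loc = df.loc[df.index[0], "Stations"]

# Option 2: turn the column label into a position and use .iloc
value_iloc = df.iloc[0, df.columns.get_loc("Stations")]

print(value_loc, value_iloc)  # 33 33
```

Either option makes it explicit whether a given input is a label or a position, which is the ambiguity that .ix left to guesswork.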
