Select DataFrame rows

Summary: in this tutorial, you’ll learn how to select rows from a DataFrame object.

Select the first n rows

head is the built-in method to quickly select the first n rows from a DataFrame object. This is particularly useful if you’re wondering whether you have the right type of data inside the object.

By default, head returns the first 5 rows from the DataFrame.

import pandas as pd url = 'https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/' df = pd.read_html(url) df[0].head()
Code language: Python (python)

Output :

Bank NameBankCityCityStateStCertCertAcquiring InstitutionAIClosing DateClosingFundFund
0Almena State BankAlmenaKS15426Equity BankOctober 23, 202010538
1First City Bank of FloridaFort Walton BeachFL16748United Fidelity Bank, fsbOctober 16, 202010537
2The First State BankBarboursvilleWV14361MVB Bank, Inc.April 3, 202010536
3Ericson State BankEricsonNE18265Farmers and Merchants BankFebruary 14, 202010535
4City National Bank of New JerseyNewarkNJ21111Industrial BankNovember 1, 201910534

If you want more than the first 5 rows, pass a number to head.

df.head(10) # Get first 10 rows from a DataFrame
Code language: Python (python)

Interestingly, you can pass a negative number to head. In this case, the method trims the last n rows and return the remaining part of the DataFrame, equivalent to df[:-n].

import pandas as pd url = 'https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/' df = pd.read_html(url) # Trim 555 rows from the bottom of the object # and return the remaining rows df[0].head(-555)
Code language: Python (python)

We’ll get the first 7 rows as the output, 555 rows from the last rows has been trimmed out.

Bank NameBankCityCityStateStCertCertAcquiring InstitutionAIClosing DateClosingFundFund
0Almena State BankAlmenaKS15426Equity BankOctober 23, 202010538
1First City Bank of FloridaFort Walton BeachFL16748United Fidelity Bank, fsbOctober 16, 202010537
2The First State BankBarboursvilleWV14361MVB Bank, Inc.April 3, 202010536
3Ericson State BankEricsonNE18265Farmers and Merchants BankFebruary 14, 202010535
4City National Bank of New JerseyNewarkNJ21111Industrial BankNovember 1, 201910534
5Resolute BankMaumeeOH58317Buckeye State BankOctober 25, 201910533
6Louisa Community BankLouisaKY58112Kentucky Farmers Bank CorporationOctober 25, 201910532
7The Enloe State BankCooperTX10716Legend Bank, N. A.May 31, 201910531

Select the last n rows

Contrary to the head to get rows from the top, Pandas has a built-in tail method to select from the last rows upwards.

import pandas as pd url = 'https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/' df = pd.read_html(url) df[0].tail()
Code language: Python (python)
Bank NameBankCityCityStateStCertCertAcquiring InstitutionAIClosing DateClosingFundFund
558Superior Bank, FSBHinsdaleIL32646Superior Federal, FSBJuly 27, 20016004
559Malta National BankMaltaOH6629North Valley BankMay 3, 20014648
560First Alliance Bank & Trust Co.ManchesterNH34264Southern New Hampshire Bank & TrustFebruary 2, 20014647
561National State Bank of MetropolisMetropolisIL3815Banterra Bank of MarionDecember 14, 20004646
562Bank of HonoluluHonoluluHI21029Bank of the OrientOctober 13, 20004645

Similarly, passing a number n to the method gets us the last n elements from the DataFrame object. If a negative number is used, this method returns all rows except the first n rows.

import pandas as pd url = 'https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/' df = pd.read_html(url) # Get the last 10 rows df[0].tail(10)
Code language: Python (python)

Select rows by index

Just like how Python list slicing works, you can extract a range of rows from anywhere in the middle of the DataFrame using row indexes. For example.

import pandas as pd url = 'https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/' df_list = pd.read_html(url) df = df_list[0] # Get 5 rows from rows number 10 df[10:15]
Code language: Python (python)
Bank NameBankCityCityStateStCertCertAcquiring InstitutionAIClosing DateClosingFundFund
10Fayette County BankSaint ElmoIL1802United Fidelity Bank, fsbMay 26, 201710528
11Guaranty Bank, (d/b/a BestBank in Georgia & Mi…MilwaukeeWI30003First-Citizens Bank & Trust CompanyMay 5, 201710527
12First NBC BankNew OrleansLA58302Whitney BankApril 28, 201710526
13Proficio BankCottonwood HeightsUT35495Cache Valley BankMarch 3, 201710525
14Seaway Bank and Trust CompanyChicagoIL19328State Bank of TexasJanuary 27, 201710524

An important thing to keep in mind that the method is that this will include row #10 and exclude row #15.
Similarly, df[:] will returns all rows in the DataFrame.

Select rows based on one condition

In order to select all rows that meet a specific condition, we’ll make use of Pandas Boolean Index.

If reading through the documentation doesn’t make sense to you, don’t worry, an example will clarify the concept.

In the example below, we need to select all rows that have Cert lower than 500.

# Create a DataFrame from FDIC data import pandas as pd url = 'https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/' df_list = pd.read_html(url) df = df_list[0] # Select based on condition low_cert = df['CertCert'] <= 500 df[low_cert]
Code language: Python (python)

We’ll get the output we wanted:

Bank NameBankCityCityStateStCertCertAcquiring InstitutionAIClosing DateClosingFundFund
16Allied BankMulberryAR91Today’s BankSeptember 23, 201610522
97First Capital BankKingfisherOK416F & M BankJune 8, 201210443
195Habersham BankClarkesvilleGA151SCBT National AssociationFebruary 18, 201110345
248The Peoples BankWinderGA182Community & Southern BankSeptember 17, 201010292

Select rows based on multiple condition

We can combine unlimited number of boolean indexes to filter data from DataFrame quickly and easily.

Let’s suppose we want to get all failed bank that meet two criteria : has Cert higher than 5000 and based in New York.

# Create a DataFrame from FDIC data import pandas as pd url = 'https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/' df_list = pd.read_html(url) df = df_list[0] # Select based on multiple condition low_cert = df['CertCert'] > 5000 ny_based = df['CityCity'] == "New York" df[low_cert & ny_based]
Code language: Python (python)

And voila, the result comes.

Bank NameBankCityCityStateStCertCertAcquiring InstitutionAIClosing DateClosingFundFund
343The Park Avenue BankNew YorkNY27096Valley National BankMarch 12, 201010195
344LibertyPointe BankNew YorkNY58071Valley National BankMarch 11, 201010194

Summary

  • df.head(n) returns first n rows from a DataFrame.
  • df.tail(n) returns last n rows from a DataFrame.
  • df[x:y] returns rows from #x to #(y-1) (row number y is not included).
  • You can select rows based on multiple condition using boolean indexes.
  • df[criteria] with criteria being a boolean variable returns rows that met the criteria.

Leave a Comment