Read JSON files with Pandas

Summary: in this tutorial, you’ll learn how to create a DataFrame directly from JSON files.

JSON is supported out of the box by many programming languages across various platforms, which makes it one of the most popular ways to store datasets.

JSON-based data can be read with the Pandas pd.read_json() function.
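
As a quick sketch (the JSON string below is made up for illustration), read_json() accepts a file path, a URL, or any file-like object:

import io
import pandas as pd

# A minimal, hypothetical example: parse a JSON string wrapped in a file-like object.
json_str = '[{"name": "Andorra", "area": 468}, {"name": "Angola", "area": 1246700}]'
df = pd.read_json(io.StringIO(json_str))
print(df)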

Read JSON file

To properly import data from a JSON file, the first thing you have to do is examine its contents.

Data can be stored in many different JSON structures, or orients in Pandas terms.
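
To get a feel for these orients, here is a small sketch (the two-row DataFrame is made up for illustration) that serializes the same data with a few common orient values:

import pandas as pd

# A made-up two-row frame, serialized with different orient values.
df = pd.DataFrame({"name": ["Andorra", "Angola"], "area": [468, 1246700]})

print(df.to_json(orient="records"))  # [{"name":"Andorra","area":468},{"name":"Angola","area":1246700}]
print(df.to_json(orient="columns"))  # {"name":{"0":"Andorra","1":"Angola"},"area":{"0":468,"1":1246700}}
print(df.to_json(orient="index"))    # {"0":{"name":"Andorra","area":468},"1":{"name":"Angola","area":1246700}}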

Most of the time, the JSON is a list of dictionaries with the same keys.

[
  {
    "name": "Afghanistan",
    "continent": "Asia",
    "area": 652.23,
    "population": " 25.500.100 "
  },
  {
    "name": "Albania",
    "continent": "Europe",
    "area": 28.748,
    "population": " 2.831.741 "
  },
  {
    "name": "Algeria",
    "continent": "Africa",
    "area": " 2.381.741 ",
    "population": " 37.100.000 "
  },
  {
    "name": "Andorra",
    "continent": "Europe",
    "area": 468,
    "population": 78.115
  },
  {
    "name": "Angola",
    "continent": "Africa",
    "area": " 1.246.700 ",
    "population": " 20.609.294 "
  }
]

We can quickly import the above JSON with the orient="records" parameter.

import pandas as pd

dframe = pd.read_json("example.json", orient="records")
dframe

Output:

          name continent       area  population
0  Afghanistan      Asia     652.23  25.500.100
1      Albania    Europe     28.748   2.831.741
2      Algeria    Africa  2.381.741  37.100.000
3      Andorra    Europe        468      78.115
4       Angola    Africa  1.246.700  20.609.294
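
Note that several area and population values in this sample are quoted strings, so a quick dtype check (a sketch, not part of the original example) shows that those columns do not load as numbers:

# Columns that mix numbers and quoted strings come in as generic "object" columns.
print(dframe.dtypes)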

For other JSON files that have a deeply nested structure, like the New York Philharmonic Performance History data, you may want to use json_normalize() to flatten them out and quickly load the result into Pandas.

Complex, deeply nested JSON
import json
import pandas as pd  # json_normalize is available as pd.json_normalize

# Load the JSON into Python objects
with open('../input/raw_nyc_phil.json') as f:
    loaded = json.load(f)

# Get the actual array that contains the data
actual_data = loaded['programs']

# record_path is the child array whose entries become the rows;
# meta is the list of fields from the parent object that we want to keep.
works_data = pd.json_normalize(
    data=actual_data,
    record_path='works',
    meta=['id', 'orchestra', 'programID', 'season'],
)
works_data.head(3)

Output:


   ID      composerName           conductorName        interval  movement                                            soloists                                            workTitle                         id                                    orchestra              programID  season
0  52446*  Beethoven, Ludwig van  Hill, Ureli Corelli  NaN       NaN                                                 []                                                  SYMPHONY NO. 5 IN C MINOR, OP.67  38e072a7-8fc9-4f9a-8eac-3957905c0002  New York Philharmonic  3853       1842-43
1  8834*4  Weber, Carl Maria Von  Timm, Henry C.       NaN       "Ozean, du Ungeheuer" (Ocean, thou mighty mons...  [{'soloistName': 'Otto, Antoinette', 'soloistR...  OBERON                            38e072a7-8fc9-4f9a-8eac-3957905c0002  New York Philharmonic  3853       1842-43
2  3642*   Hummel, Johann         NaN                  NaN       NaN                                                 [{'soloistName': 'Scharfenberg, William', 'sol...  QUINTET, PIANO, D MINOR, OP. 74   38e072a7-8fc9-4f9a-8eac-3957905c0002  New York Philharmonic  3853       1842-43
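
record_path can also point at a deeper child array. As a sketch building on the code above, the soloists nested inside each work can be flattened the same way (the meta selection here is illustrative):

# Flatten the soloists list nested inside each work, keeping the parent program id via meta.
soloist_data = pd.json_normalize(
    data=actual_data,
    record_path=["works", "soloists"],
    meta=["id"],
)
soloist_data.head(3)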

Read JSON from URL

Pandas read_json() can read data directly from a URL, handling the request for you and raising an error if the data cannot be retrieved or parsed.

read_json() accepts a handful of URL schemes: http, ftp, s3, and file. A local file can be read through its file URL: file://localhost/path/to/file.json.

import pandas as pd

url = "https://api.exchangerate-api.com/v4/latest/USD"
df = pd.read_json(url)
print(df)

You can also read JSON files directly from Amazon S3 using s3:// URLs (you’ll have to install s3fs first).

import pandas as pd

df = pd.read_json("s3://bucket....json")
print(df)
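
If the bucket is not public, credentials can be passed to s3fs through the storage_options parameter (available since pandas 1.2); the bucket and key below are hypothetical placeholders:

import pandas as pd

# Hypothetical bucket and key; storage_options is forwarded to s3fs.
df = pd.read_json(
    "s3://my-bucket/data/example.json",
    storage_options={"key": "YOUR_ACCESS_KEY", "secret": "YOUR_SECRET_KEY"},
)
print(df)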

Summary: using the read_json() method, you can easily import data from local JSON files as well as remote ones.
