Summary: in this tutorial, you’ll learn how to create a DataFrame
directly from JSON files.
JSON is supported out of the box by many programming languages across various platforms, which makes it one of the most popular way to store datasets.
JSON-based data can be read with Pandas pd.read_json()
function.
Read JSON file
In order to properly import data from a JSON file, the first thing you have to do is examine the contents of the JSON file.
Data can be stored in many different JSON structure or orient in Pandas terms.
Most of the time, the JSON is a list of dictionaries with the same keys.
[
{
"name": "Afghanistan",
"continent": "Asia",
"area": 652.23,
"population": " 25.500.100 "
},
{
"name": "Albania",
"continent": "Europe",
"area": 28.748,
"population": " 2.831.741 "
},
{
"name": "Algeria",
"continent": "Africa",
"area": " 2.381.741 ",
"population": " 37.100.000 "
},
{
"name": "Andorra",
"continent": "Europe",
"area": 468,
"population": 78.115
},
{
"name": "Angola",
"continent": "Africa",
"area": " 1.246.700 ",
"population": " 20.609.294 "
}
]
We can quickly import the above JSON with orient=records
parameter.
import pandas as pd
dframe = pd.read_json("example.json", orient="records")
dframe
Output:
name | continent | area | population | |
---|---|---|---|---|
Afghanistan | Asia | 652.23 | 25.500.100 | |
1 | Albania | Europe | 28.748 | 2.831.741 |
2 | Algeria | Africa | 2.381.741 | 37.100.000 |
3 | Andorra | Europe | 468 | 78.115 |
4 | Angola | Africa | 1.246.700 | 20.609.294 |
For other JSON files that has a deeply nested structure, like the New York Philharmonic Performance History data, you may want to use json_normalize
to flatten it out and quickly load into Pandas.

import json
import pandas as pd
from pandas.io.json import json_normalize
# Load JSON into Python objects
with open('../input/raw_nyc_phil.json') as f:
loaded = json.load(f)
# Get the actual array that contains data
actual_data = loaded['programs']
# Specify the actual data as the main array
# record_path is the children which contains the necessary field
# meta is the list of data from parent object that we want to include
works_data = json_normalize(data=actual_data, record_path='works',
meta=['id', 'orchestra','programID', 'season'])
works_data.head(3)
Output:
ID | composerName | conductorName | interval | movement | soloists | workTitle | id | orchestra | programID | season | |
---|---|---|---|---|---|---|---|---|---|---|---|
52446* | Beethoven, Ludwig van | Hill, Ureli Corelli | NaN | NaN | [] | SYMPHONY NO. 5 IN C MINOR, OP.67 | 38e072a7-8fc9-4f9a-8eac-3957905c0002 | New York Philharmonic | 3853 | 1842-43 | |
1 | 8834*4 | Weber, Carl Maria Von | Timm, Henry C. | NaN | “Ozean, du Ungeheuer” (Ocean, thou mighty mons… | [{‘soloistName’: ‘Otto, Antoinette’, ‘soloistR… | OBERON | 38e072a7-8fc9-4f9a-8eac-3957905c0002 | New York Philharmonic | 3853 | 1842-43 |
2 | 3642* | Hummel, Johann | NaN | NaN | NaN | [{‘soloistName’: ‘Scharfenberg, William’, ‘sol… | QUINTET, PIANO, D MINOR, OP. 74 | 38e072a7-8fc9-4f9a-8eac-3957905c0002 | New York Philharmonic | 3853 | 1842-43 |
Read JSON from URL
Pandas read_json()
can read the data directly from an URL, which automatically handles the process of making the request, validating data and error catching.
read_json()
accepts a handful of URL schemes : http, ftp, s3, file. A local file could be read using its file URL : file://localhost/path/to/file.json
.
import pandas as pd
url = "https://api.exchangerate-api.com/v4/latest/USD"
df = pd.read_json(url)
print(df)
You can also read JSON files directly from Amazon S3 using s3://
URLs (you’ll have to install s3fs first).
import pandas as pd
df = pd.read_json("s3://bucket....csv")
print(df)
Summary: using read_json()
method, one can easily import the data from local JSON files as well as remotely located files.
