Read JSON files with Pandas

Summary: in this tutorial, you’ll learn how to create a DataFrame directly from JSON files.

JSON is supported out of the box by many programming languages across various platforms, which makes it one of the most popular way to store datasets.

JSON-based data can be read with Pandas pd.read_json() function.

Read JSON file

In order to properly import data from a JSON file, the first thing you have to do is examine the contents of the JSON file.

Data can be stored in many different JSON structure or orient in Pandas terms.

Most of the time, the JSON is a list of dictionaries with the same keys.

        "name": "Afghanistan",
        "continent": "Asia",
        "area": 652.23,
        "population": " 25.500.100 "
        "name": "Albania",
        "continent": "Europe",
        "area": 28.748,
        "population": " 2.831.741 "
        "name": "Algeria",
        "continent": "Africa",
        "area": " 2.381.741 ",
        "population": " 37.100.000 "
        "name": "Andorra",
        "continent": "Europe",
        "area": 468,
        "population": 78.115
        "name": "Angola",
        "continent": "Africa",
        "area": " 1.246.700 ",
        "population": " 20.609.294 "

We can quickly import the above JSON with orient=records parameter.

import pandas as pd
dframe = pd.read_json("example.json", orient="records")



For other JSON files that has a deeply nested structure, like the New York Philharmonic Performance History data, you may want to use json_normalize to flatten it out and quickly load into Pandas.

Complex, deeply nested JSON
import json 
import pandas as pd 
from import json_normalize

# Load JSON into Python objects
with open('../input/raw_nyc_phil.json') as f:
    loaded = json.load(f)

# Get the actual array that contains data
actual_data = loaded['programs']

# Specify the actual data as the main array
# record_path is the children which contains the necessary field
# meta is the list of data from parent object that we want to include
works_data = json_normalize(data=actual_data, record_path='works', 
                                                  meta=['id', 'orchestra','programID', 'season'])


See also  Google Colab Introduction

52446*Beethoven, Ludwig vanHill, Ureli CorelliNaNNaN[]SYMPHONY NO. 5 IN C MINOR, OP.6738e072a7-8fc9-4f9a-8eac-3957905c0002New York Philharmonic38531842-43
18834*4Weber, Carl Maria VonTimm, Henry C.NaN“Ozean, du Ungeheuer” (Ocean, thou mighty mons…[{‘soloistName’: ‘Otto, Antoinette’, ‘soloistR…OBERON38e072a7-8fc9-4f9a-8eac-3957905c0002New York Philharmonic38531842-43
23642*Hummel, JohannNaNNaNNaN[{‘soloistName’: ‘Scharfenberg, William’, ‘sol…QUINTET, PIANO, D MINOR, OP. 7438e072a7-8fc9-4f9a-8eac-3957905c0002New York Philharmonic38531842-43

Read JSON from URL

Pandas read_json() can read the data directly from an URL, which automatically handles the process of making the request, validating data and error catching.

read_json() accepts a handful of URL schemes : http, ftp, s3, file. A local file could be read using its file URL : file://localhost/path/to/file.json.

import pandas as pd

url = ""
df = pd.read_json(url)

You can also read JSON files directly from Amazon S3 using s3:// URLs (you’ll have to install s3fs first).

import pandas as pd

df = pd.read_json("s3://bucket....csv")

Summary: using read_json() method, one can easily import the data from local JSON files as well as remotely located files.

Avatar photo
Author: Thijmen I’m currently a SysAdmin located in the Netherlands. Every day I try to keep around a hundred users happy with their network connections and overall, tech-related issues. I also spend my spare time fiddling with web-based applications.

Leave a Comment