Using data catalogs

Catalogs provide a level of indexing that can greatly speed up data discovery, so we prioritize usability: every catalog generated by Catalog Builder is accompanied by an Intake-ESM-compatible JSON file.

Example notebooks

We are collecting examples here that use the Intake-ESM API with catalogs generated by Catalog Builder. Please open an issue to contribute one!

Community examples

How to ingest using Intake-ESM

Import the packages your Python analysis needs. Only intake and intake-esm are required to explore the catalog with Intake-ESM; the remaining imports support loading and plotting the data.

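import xarray as xr                   # loading and working with the data
import intake                         # required: catalog access
import intake_esm                     # required: Intake-ESM plugin (provides open_esm_datastore)
from matplotlib import pyplot as plt  # plotting
# Jupyter magic: render plots inline in the notebook
%matplotlib inline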
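Because only intake and intake-esm are strictly required, a notebook can check which packages are installed before importing the plotting stack. The following is a minimal sketch using the standard library's importlib; the helper name check_packages is hypothetical and not part of any of these libraries.

```python
import importlib.util

# Packages used in this walkthrough: the first two are needed to search a
# catalog, the others only for loading and plotting data.
REQUIRED = ["intake", "intake_esm"]
OPTIONAL = ["xarray", "matplotlib"]


def check_packages(names):
    """Return {package_name: True/False} depending on whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_packages(REQUIRED + OPTIONAL)
    for name, found in status.items():
        print(f"{name:12s} {'found' if found else 'missing'}")
```

If either required package is reported missing, install it before running the cells that follow.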

Set the collection file variable (col_url) to the JSON path

We must provide Intake-ESM with the path to an ESM-compatible collection file (JSON). This JSON file describes the catalog and links to the generated catalog file.
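For orientation, here is a sketch of what such a collection file can contain, built with Python's json module. The field names follow the esmcat specification that Intake-ESM reads, as we understand it; the values (id, catalog_file name, columns) are hypothetical, and the template actually used by Catalog Builder may include more or different fields.

```python
import json

# Hypothetical minimal ESM collection description. Field names follow the
# esmcat specification; the values are illustrative and are NOT taken from
# the Catalog Builder template.
collection = {
    "esmcat_version": "0.1.0",
    "id": "example_catalog",
    "description": "Example catalog (hypothetical)",
    "catalog_file": "example_catalog.csv",  # the CSV catalog this JSON links to
    "attributes": [
        {"column_name": "experiment_id", "vocabulary": ""},
        {"column_name": "variable_id", "vocabulary": ""},
    ],
    # Column holding the file location, and the file format
    "assets": {"column_name": "path", "format": "netcdf"},
    "aggregation_control": {
        "variable_column_name": "variable_id",
        "groupby_attrs": ["experiment_id"],
        "aggregations": [{"type": "union", "attribute_name": "variable_id"}],
    },
}

text = json.dumps(collection, indent=2)
print(text)
```

The "catalog_file" entry is what ties the JSON to the generated catalog table.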

col_url = "<path-to-JSON>"

# For example: col_url = "cats/gfdl_test1.json"
# The template used for current testing and for MDTF is at
# https://github.com/aradhakrishnanGFDL/CatalogBuilder/blob/main/cats/gfdl_template.json

col = intake.open_esm_datastore(col_url)
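Before choosing filter values, it helps to see what the catalog actually contains. An opened datastore exposes its catalog as a pandas DataFrame through col.df, so col.df["experiment_id"].unique() lists the available experiments. The following sketch mimics that lookup with the standard library on a small, made-up CSV; the column names mirror the search keys used in this guide and are assumptions about the generated catalog.

```python
import csv
import io

# Hypothetical excerpt of a generated catalog (CSV). Column names mirror the
# search keys used in this guide; the rows and paths are made up.
CATALOG_CSV = """experiment_id,source_id,modeling_realm,frequency,variable_id,path
ESM4_1pctCO2_D1,ESM4,atmos,monthly,evap,/archive/example/atmos/evap_0001.nc
ESM4_1pctCO2_D1,ESM4,atmos,monthly,precip,/archive/example/atmos/precip_0001.nc
ESM4_historical_D1,ESM4,ocean,monthly,tos,/archive/example/ocean/tos_0001.nc
"""


def unique_values(csv_text, column):
    """Return the sorted distinct values of one catalog column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[column] for row in rows})


print(unique_values(CATALOG_CSV, "experiment_id"))
print(unique_values(CATALOG_CSV, "variable_id"))
```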

Set search parameters

Search parameters narrow the catalog down to specific files. Here, we search using catalog keys such as the experiment name and modeling realm.

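expname_filter = ["ESM4_1pctCO2_D1"]  # experiment_id (a list may hold several values)
modeling_realm = "atmos"              # modeling_realm
model_filter = "ESM4"                 # source_id
variable_id_filter = "evap"           # variable_id
ens_filter = "r1i1p1f1"               # ensemble member; not used in the search below
frequency = "monthly"                 # frequency
chunk_freq = "5yr"                    # not used in the search below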
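One way to keep the filters tidy is to gather them into a dictionary keyed by catalog column and unpack it into the search call. The helper build_query is hypothetical (not part of Intake-ESM); it drops filters left as None, which is a convenient way to handle optional keys such as the ensemble member. The column name member_id is an assumption; check your catalog's columns first.

```python
# Hypothetical helper: keys must match catalog column names, and a value of
# None means "do not filter on this key".
def build_query(**filters):
    """Drop unset filters so the result can be passed as col.search(**query)."""
    return {key: value for key, value in filters.items() if value is not None}


query = build_query(
    experiment_id=["ESM4_1pctCO2_D1"],
    modeling_realm="atmos",
    source_id="ESM4",
    variable_id="evap",
    frequency="monthly",
    member_id=None,  # hypothetical column name for the ensemble member
)
print(query)
# With a real datastore: cat = col.search(**query)
```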

Search the catalog

Now, we execute our query:

cat = col.search(experiment_id=expname_filter, frequency=frequency,
                 modeling_realm=modeling_realm, source_id=model_filter,
                 variable_id=variable_id_filter)

cat.df["path"].iloc[0]  # path of the first matching file

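The search returns a new datastore containing only the entries that match these parameters. Its df attribute lists all of them, and the last line above returns the path of the first match.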
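Intake-ESM's search keeps rows where every given key matches; when a key is given a list, a row matches if its value equals any element of that list. The following sketch models that exact-match behavior with the standard library on a made-up catalog. Intake-ESM itself works on a pandas DataFrame and also accepts regular expressions, so treat this as an illustration of the matching rules, not its implementation.

```python
import csv
import io

# Hypothetical catalog rows; the paths are made up for illustration.
CATALOG_CSV = """experiment_id,source_id,modeling_realm,frequency,variable_id,path
ESM4_1pctCO2_D1,ESM4,atmos,monthly,evap,/archive/example/evap_0001.nc
ESM4_1pctCO2_D1,ESM4,atmos,monthly,evap,/archive/example/evap_0002.nc
ESM4_1pctCO2_D1,ESM4,atmos,monthly,precip,/archive/example/precip_0001.nc
ESM4_historical_D1,ESM4,atmos,monthly,evap,/archive/example/hist_evap_0001.nc
"""


def _as_list(value):
    """Treat a single value as a one-element list of allowed values."""
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


def search_paths(csv_text, **query):
    """Return the path of every row that matches all query keys.

    Keys are combined with AND; a list value matches any of its items (OR).
    Exact string matching only: no regular expressions.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row["path"]
        for row in rows
        if all(row[key] in _as_list(wanted) for key, wanted in query.items())
    ]


paths = search_paths(
    CATALOG_CSV,
    experiment_id=["ESM4_1pctCO2_D1"],
    modeling_realm="atmos",
    variable_id="evap",
)
print(paths[0])    # first match, analogous to cat.df["path"].iloc[0]
print(len(paths))  # number of matching files
```

Because a query can match several files (here, two chunks of the same variable), check len(cat.df) before assuming a single result.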