Using data catalogs
Catalogs provide a level of indexing that can greatly speed up data discovery, so making them easy to use is a priority. Every catalog generated by Catalog Builder is accompanied by an Intake-ESM-compatible JSON file.
Example notebooks
We are collecting examples that use the Intake-ESM API with the catalogs generated by our catalog builder here. Please open an issue and contribute!
Community examples
How to ingest using Intake-ESM
Import the packages your Python analysis needs. Only intake and intake-esm are required for data exploration with Intake-ESM; the other imports below support analysis and plotting.
import xarray as xr
import intake
import intake_esm
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
Set the collection file variable (col_url) to the JSON path
We must provide Intake-ESM with a path to an Intake-ESM-compatible collection file (JSON). This JSON file establishes the link to the generated catalog.
col_url = "<path-to-JSON>"
# E.g.: col_url = "cats/gfdl_test1.json"
# The template we use for current testing and for MDTF is here:
# https://github.com/aradhakrishnanGFDL/CatalogBuilder/blob/main/cats/gfdl_template.json
col = intake.open_esm_datastore(col_url)
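Before opening a collection file, it can help to check what it points to. The following is a minimal sketch using only the standard library. The collection contents are a hypothetical, trimmed-down example: the field names (`catalog_file`, `attributes`, `assets`) are assumed here to follow the ESM collection specification, the values are invented, and `describe_collection` is an illustrative helper rather than part of Intake-ESM.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, trimmed-down collection file; all values are made up for illustration.
example_collection = {
    "esmcat_version": "0.0.1",
    "id": "example",
    "description": "Illustrative collection file",
    "catalog_file": "example_catalog.csv",
    "attributes": [
        {"column_name": "experiment_id"},
        {"column_name": "variable_id"},
    ],
    "assets": {"column_name": "path", "format": "netcdf"},
}


def describe_collection(json_path):
    """Return the catalog file, searchable columns, and asset column named in a collection JSON."""
    with open(json_path) as f:
        spec = json.load(f)
    columns = [attr["column_name"] for attr in spec.get("attributes", [])]
    asset_column = spec.get("assets", {}).get("column_name")
    return spec.get("catalog_file"), columns, asset_column


# Write the example to a temporary file, then read it back as we would a real one.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "example.json"
    json_path.write_text(json.dumps(example_collection))
    catalog_file, columns, asset_column = describe_collection(json_path)

print(catalog_file, columns, asset_column)
```

The columns listed under `attributes` are the ones that can be used as search keys, and the asset column is where the file paths live.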
Set search parameters
Search parameters can be set to find specific files. Here, we search for a file using keys such as the experiment name and modeling realm.
expname_filter = ['ESM4_1pctCO2_D1']
modeling_realm = 'atmos'
model_filter = 'ESM4'
variable_id_filter = "evap"
ens_filter = "r1i1p1f1"
frequency = "monthly"
chunk_freq = "5yr"
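The filters can also be gathered into a single dictionary so that any value left unset is simply omitted from the query. This is a minimal sketch: `build_query` is a hypothetical helper, not part of Intake-ESM, and `member_id` is an assumed column name used only to show an unset filter being dropped.

```python
def build_query(**filters):
    """Drop filters that are unset (None) so they do not constrain the search."""
    return {column: value for column, value in filters.items() if value is not None}


query = build_query(
    experiment_id=["ESM4_1pctCO2_D1"],
    modeling_realm="atmos",
    source_id="ESM4",
    variable_id="evap",
    frequency="monthly",
    member_id=None,  # assumed column name; unset here, so it is dropped from the query
)
print(query)

# With an opened datastore, the dictionary could then be unpacked into the search:
# cat = col.search(**query)
```

Keeping the query in one dictionary makes it easy to log or reuse the exact search that produced a result.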
Search the catalog
Now, we execute our query:
cat = col.search(experiment_id=expname_filter, frequency=frequency, modeling_realm=modeling_realm,
                 source_id=model_filter, variable_id=variable_id_filter)
# Path of the first matching file; .iloc selects by position regardless of index labels
cat.df["path"].iloc[0]
Intake-ESM returns the catalog entries that match these search parameters; the last line displays the path of the first matching file.
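Conceptually, a search with plain values behaves much like filtering the rows of the catalog table: a row matches when every searched column equals the given value, or any one of the values when a list is given. The sketch below imitates that behaviour in plain Python. It illustrates the idea only and is not Intake-ESM's implementation; the rows, paths, and the `search_rows` helper are invented for this example.

```python
# Invented catalog rows for illustration; real catalogs have many more columns.
rows = [
    {"experiment_id": "ESM4_1pctCO2_D1", "modeling_realm": "atmos",
     "variable_id": "evap", "path": "/example/atmos.evap.nc"},
    {"experiment_id": "ESM4_1pctCO2_D1", "modeling_realm": "atmos",
     "variable_id": "precip", "path": "/example/atmos.precip.nc"},
    {"experiment_id": "ESM4_historical_D1", "modeling_realm": "ocean",
     "variable_id": "evap", "path": "/example/ocean.evap.nc"},
]


def search_rows(rows, **query):
    """Keep rows where every queried column matches the value (or any value in a list)."""
    matches = []
    for row in rows:
        keep = True
        for column, wanted in query.items():
            # Treat a single value as a one-element list of allowed values.
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in allowed:
                keep = False
                break
        if keep:
            matches.append(row)
    return matches


matching_paths = [
    row["path"]
    for row in search_rows(
        rows,
        experiment_id=["ESM4_1pctCO2_D1"],
        modeling_realm="atmos",
        variable_id="evap",
    )
]
print(matching_paths)
```

Adding more keys narrows the result, while passing a list for a key widens it to any of the listed values.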