Background

The catalog builder project is a “Python community package ecosystem” that allows you to generate data catalogs compatible with intake-esm. It is available as a Conda package.

See our GitHub repository here. The contributing guidelines and code of conduct are documented in the GitHub repo. We welcome your contributions.

Brief overview of data catalogs

Data catalogs enable “data discoverability” regardless of the data format (for example, Zarr or NetCDF). We acknowledge the community collaborations (Pangeo/ESGF Cloud Data working group) that led us to explore this further.

Data catalogs in this project have three components: the catalog specification, the catalog itself, and the Intake-ESM API. The Intake-ESM API makes use of the specifications and catalogs generated by the catalog builder API. Read more about Intake-ESM here.

Catalog Specification

Describes what we expect to find inside the catalog and how to open the “datasets” (objects)
Provides metadata about the catalog
Identifies how multiple files can be aggregated into a single “dataset”
Supports extensible metadata
Stored as a single JSON file
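
To make the points above concrete, here is a minimal sketch of what a catalog specification could look like when built as a Python dictionary and serialized to JSON. The field names follow the ESM collection specification that intake-esm reads, as we understand it; the identifier, file name, and column names (path, variable_id, frequency) are hypothetical and chosen only for illustration, so the output of the catalog builder may differ.

```python
import json

# All values below are hypothetical and chosen for illustration only.
catalog_spec = {
    "esmcat_version": "0.1.0",
    "id": "example-catalog",
    "description": "Illustrative catalog specification",  # metadata about the catalog
    "catalog_file": "example_catalog.csv",                # the CSV catalog it describes
    # Columns of the CSV that can be used as search attributes.
    "attributes": [
        {"column_name": "variable_id"},
        {"column_name": "frequency"},
    ],
    # Which column holds the file path, and how to open the files.
    "assets": {"column_name": "path", "format": "netcdf"},
    # How multiple files are aggregated into a single "dataset".
    "aggregation_control": {
        "variable_column_name": "variable_id",
        "groupby_attrs": ["frequency"],
        "aggregations": [
            {"type": "union", "attribute_name": "variable_id"},
        ],
    },
}

spec_text = json.dumps(catalog_spec, indent=2)
print(spec_text)
```

In this sketch, rows that share the same frequency would be grouped into one dataset, with their variables merged by the union aggregation.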

Catalogs

Tells us more about the data collection
Lists the path to each file (object) and its associated metadata
CSV file
User-defined granularity
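
As a rough illustration of the catalog itself, the sketch below writes a tiny CSV with one row per file. The column names and file paths are hypothetical; a real catalog would carry whatever columns and granularity its author chooses.

```python
import csv
import io

# Hypothetical rows: one per file, with the path and its associated metadata.
rows = [
    {"path": "/data/atmos/tas_200001-200412.nc", "variable_id": "tas", "frequency": "mon"},
    {"path": "/data/atmos/pr_200001-200412.nc", "variable_id": "pr", "frequency": "mon"},
]

# Write to an in-memory buffer; a real catalog would be written to a .csv file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["path", "variable_id", "frequency"])
writer.writeheader()
writer.writerows(rows)

catalog_csv = buffer.getvalue()
print(catalog_csv)
```

Here the granularity is one row per file; a coarser catalog could instead list one row per directory or per Zarr store.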

Intake-ESM API

Opens possibilities to QUERY and ANALYZE
Provides a Pythonic way to “query” for information in the catalogs
Loads the results into xarray dataset objects
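
The query-and-load workflow can be sketched as a small helper. This is a sketch under assumptions rather than the project's own code: it assumes intake-esm is installed, and the helper name load_datasets, the specification path, and the query column are all hypothetical.

```python
def load_datasets(spec_path, **query):
    """Open a catalog, filter it, and load the matches as xarray datasets.

    load_datasets is a hypothetical helper written for this example.
    """
    # Imported lazily so that defining this helper does not require
    # intake-esm; calling it does.
    import intake

    catalog = intake.open_esm_datastore(spec_path)  # read the JSON spec and its CSV
    subset = catalog.search(**query)                # keep rows matching the query
    return subset.to_dataset_dict()                 # dict of xarray.Dataset objects
```

A call such as load_datasets("catalog_spec.json", variable_id="tas") would then return a dictionary of datasets, with keys built from the grouping attributes named in the catalog specification.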