Introduction to netCDF

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • What is the NetCDF format?

  • Why use Xarray for NetCDF files in Python?

Objectives
  • Understand how NetCDF data is stored.

  • Describe the components of a NetCDF file.

What is NetCDF?

These lessons work with raster or “gridded” data that are stored as a uniform grid of values using the netCDF file format. This is the most common data format and file type in the atmosphere and ocean sciences; essentially all output from weather, climate and ocean models is gridded data stored as a series of netCDF files. Satellite data is also often provided in NetCDF format.

Network Common Data Form (NetCDF) files are binary files that are platform independent and self-describing (each file contains a header and file metadata in the form of name/value attributes). This file format was developed by the Unidata project at the University Corporation for Atmospheric Research (UCAR).

Advantages

Storage of NetCDF data

The data in a netCDF file is stored in the form of arrays. The data stored in an array needs to be of the same type (homogeneous).

Temperature varying over time at a location is stored as a one-dimensional array. You can think of it as a list containing elements of the same data type (e.g. integers or floats).

An example of a 2-dimensional array is temperature over an area for a given time. A Pandas DataFrame is also a 2-dimensional data structure, but it differs from an array: a DataFrame can store heterogeneous data elements, and you can access it like a spreadsheet (using the column names and rows).

Figure: one-dimensional and two-dimensional arrays
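
The difference between one- and two-dimensional arrays can be sketched with NumPy. This is a minimal illustration, not part of the lesson data: the temperature values and variable names are made up.

```python
import numpy as np

# 1D: temperature (degrees Celsius) at one location for five time steps.
# The values are invented for illustration.
temperature_1d = np.array([14.2, 15.1, 15.8, 16.4, 15.9])

# 2D: temperature over a small 2 x 3 (latitude x longitude) area at one time.
temperature_2d = np.array([
    [14.2, 14.8, 15.3],
    [13.9, 14.5, 15.0],
])

print(temperature_1d.shape, temperature_1d.dtype)  # (5,) float64
print(temperature_2d.shape, temperature_2d.dtype)  # (2, 3) float64

# Arrays are homogeneous: mixing integers and floats converts every
# element to one common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```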

An example of three-dimensional (3D) data is temperature over an area varying with time. Think of this as a Pandas DataFrame where the “columns” (variables) have more than one dimension.
Figure: three-dimensional array

Four-dimensional (4D) data, like temperature over an area varying with time and altitude, is stored as a series of two-dimensional arrays.
Figure: four-dimensional array
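
As a rough sketch of how higher-dimensional data breaks down into two-dimensional maps, the NumPy example below builds empty 3D and 4D arrays. The grid sizes and the dimension order (time, level, lat, lon) are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical grid sizes, chosen only for illustration.
n_time, n_level, n_lat, n_lon = 4, 3, 2, 5

# 3D: temperature(time, lat, lon)
temperature_3d = np.zeros((n_time, n_lat, n_lon))

# 4D: temperature(time, level, lat, lon)
temperature_4d = np.zeros((n_time, n_level, n_lat, n_lon))

# Fixing the time index of the 3D array leaves one 2D (lat, lon) map.
first_map = temperature_3d[0]
print(first_map.shape)  # (2, 5)

# Fixing both time and level in the 4D array also leaves a 2D map,
# so the 4D array holds n_time * n_level such maps.
one_slice = temperature_4d[0, 0]
print(one_slice.shape)   # (2, 5)
print(n_time * n_level)  # 12
```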

Basic components of a NetCDF file

A netCDF file contains dimensions, variables, and attributes. These components are used together to capture the meaning of data and relations among data fields in an array-oriented dataset. The following figure shows the structure of a netCDF file using the CDL (network Common Data form Language) notation. CDL is the ASCII format used to describe the content of a NetCDF file.

Figure: structure of a netCDF file in CDL notation
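
To give a feel for what CDL text looks like, here is a small pure-Python sketch that renders a CDL-style header from plain dictionaries. The function `to_cdl` and the example names are hypothetical helpers written for this illustration, and the sketch handles only string attributes. For a real file, the `ncdump -h` utility from the netCDF tools prints the header in CDL.

```python
def to_cdl(name, dimensions, variables, global_attrs):
    """Render a simplified CDL-style header (no data section).

    Hypothetical helper for illustration; only string attributes are handled.
    """
    lines = [f"netcdf {name} {{", "dimensions:"]
    for dim_name, size in dimensions.items():
        lines.append(f"    {dim_name} = {size} ;")
    lines.append("variables:")
    for var_name, var in variables.items():
        dims = ", ".join(var["dims"])
        lines.append(f"    {var['type']} {var_name}({dims}) ;")
        for att_name, value in var.get("attrs", {}).items():
            lines.append(f'        {var_name}:{att_name} = "{value}" ;')
    lines.append("")
    lines.append("// global attributes:")
    for att_name, value in global_attrs.items():
        # Global attributes have a blank variable name before the colon.
        lines.append(f'        :{att_name} = "{value}" ;')
    lines.append("}")
    return "\n".join(lines)


cdl = to_cdl(
    name="example",
    dimensions={"time": 12, "lat": 2, "lon": 3},
    variables={
        "lat": {"type": "double", "dims": ["lat"],
                "attrs": {"units": "degrees_north"}},
        "pr": {"type": "float", "dims": ["time", "lat", "lon"],
               "attrs": {"units": "kg m-2 s-1"}},
    },
    global_attrs={"title": "Illustrative file layout"},
)
print(cdl)
```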

Dimensions

A NetCDF dimension is a named integer used to specify the shape of one or more of the multi-dimensional variables contained in a netCDF file. A dimension may be used to represent a real physical dimension, for example, time, latitude, longitude, or height; or more abstract quantities like station or model-run ID.

Every NetCDF dimension has both a name and a size.
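
The name/size pairing can be seen on an in-memory xarray dataset. This is a minimal sketch: the variable name `pr` and the grid sizes are assumptions, and no file is read.

```python
import numpy as np
import xarray as xr

# A small synthetic dataset; names and sizes are assumptions for illustration.
ds = xr.Dataset(
    data_vars={"pr": (("time", "lat", "lon"), np.zeros((12, 2, 3)))},
)

# Each dimension is a name paired with an integer size.
for dim_name in ("time", "lat", "lon"):
    print(dim_name, ds.sizes[dim_name])

# The shape of a variable follows from the sizes of its dimensions.
print(ds["pr"].dims)   # ('time', 'lat', 'lon')
print(ds["pr"].shape)  # (12, 2, 3)
```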

Variables

A variable represents an array of values of the same type. Variables are used to store the bulk of the data in a netCDF file. A variable has a name, data type, and shape described by its list of dimensions specified when the variable is created. The number of dimensions is the rank (also known as dimensionality). A scalar variable has a rank of 0, a vector has a rank of 1, and a matrix has a rank of 2. A variable can also have associated attributes that can be added, deleted, or changed after the variable is created.
Examples of variables are: temperature, salinity, oxygen, etc.
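
The properties of a variable listed above map onto an xarray `DataArray` roughly as follows. The values, names, and attributes here are invented for illustration.

```python
import numpy as np
import xarray as xr

# A 2D variable with a name, a data type, and a list of dimensions.
temperature = xr.DataArray(
    data=np.full((2, 3), 288.0, dtype="float32"),
    dims=("lat", "lon"),
    name="temperature",
    attrs={"units": "K"},
)

print(temperature.name)   # temperature
print(temperature.dtype)  # float32
print(temperature.dims)   # ('lat', 'lon')
print(temperature.shape)  # (2, 3)
print(temperature.ndim)   # 2, the rank

# Attributes can be added, changed, or deleted after creation.
temperature.attrs["long_name"] = "air temperature"
temperature.attrs["long_name"] = "near-surface air temperature"
del temperature.attrs["long_name"]

# A scalar variable has no dimensions, so its rank is 0.
scalar = xr.DataArray(3.5)
print(scalar.ndim)  # 0
```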

Coordinate variables

A one-dimensional variable with the same name as a dimension is a coordinate variable. It is associated with a dimension of one or more data variables and typically defines a physical coordinate corresponding to that dimension. 2D coordinate fields are not defined as dimensions.

Coordinate variables have no special meaning to the netCDF library. However, the software using this library should handle coordinate variables in a specialized way.
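
xarray is one example of software that treats coordinate variables specially: it uses them for label-based selection. The sketch below uses made-up latitude and longitude values on a tiny grid.

```python
import numpy as np
import xarray as xr

# Made-up coordinate values for a 3 x 4 grid.
lat = [-45.0, 0.0, 45.0]
lon = [0.0, 90.0, 180.0, 270.0]
values = np.arange(12.0).reshape(3, 4)

temperature = xr.DataArray(
    values,
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="temperature",
)

# "lat" is one-dimensional and shares its name with the dimension it describes.
print(temperature["lat"].dims)  # ('lat',)

# Positional selection uses integer indices...
by_position = temperature.isel(lat=1, lon=2)
# ...while label-based selection uses the coordinate values.
by_label = temperature.sel(lat=0.0, lon=180.0)

print(float(by_position), float(by_label))  # 6.0 6.0
```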

Attributes

NetCDF attributes are used to store ancillary data or metadata. Most attributes provide information about a specific variable. These attributes are identified by the name of the variable together with the name of the attribute.

Attributes that provide information about the entire netCDF file are global attributes. These attributes are identified by the attribute name together with a blank variable name (in CDL) or a special null variable ID (in C or Fortran).
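
In xarray, variable attributes and global attributes live in separate `attrs` dictionaries. The following sketch uses a synthetic dataset; the attribute values are placeholders, not taken from a real file.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={
        "pr": (
            ("time", "lat", "lon"),
            np.zeros((12, 2, 3)),
            # Variable attributes (placeholder values).
            {"units": "kg m-2 s-1", "long_name": "precipitation flux"},
        )
    },
    # Global attributes describe the dataset as a whole (placeholder values).
    attrs={"title": "Synthetic example"},
)

# Variable attribute: identified by variable name plus attribute name.
print(ds["pr"].attrs["units"])  # kg m-2 s-1

# Global attribute: identified by attribute name alone.
print(ds.attrs["title"])  # Synthetic example
```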

Tools for working with NetCDF

Setting up a Notebook and loading NetCDF data using Python libraries is not the only way of accessing these data. Other tools are:

Command Line Interfaces

GUI Interfaces

CMIP data

This is the dataset that we will use in this class; it is well known and widely used by oceanographers. The dataset is regular, linear, gridded data. It represents the mean monthly precipitation flux (kg m-2 s-1) on a global scale.
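
The unit kg m-2 s-1 can be hard to read at a glance. As a worked example, the sketch below converts a flux to millimetres per day, assuming liquid water with a density of 1000 kg m-3 (so that 1 kg m-2 corresponds to a 1 mm layer). The function name and the sample value are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def flux_to_mm_per_day(flux_kg_m2_s):
    """Convert a precipitation flux in kg m-2 s-1 to mm per day.

    Assumes liquid water with density 1000 kg m-3, so 1 kg m-2 is a 1 mm layer.
    """
    return flux_kg_m2_s * SECONDS_PER_DAY


example_flux = 3.0e-5  # made-up value in kg m-2 s-1
print(f"{flux_to_mm_per_day(example_flux):.3f} mm/day")  # 2.592 mm/day
```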

CMIP (Coupled Model Intercomparison Project) provides a community-based infrastructure in support of climate model diagnosis, validation, intercomparison, documentation and data access. This framework enables a diverse community of scientists to analyze GCMs in a systematic fashion, a process which serves to facilitate model improvement.

The acronym GCM originally stood for General Circulation Model. Recently, a second meaning came into use, namely Global Climate Model. While these do not refer to the same thing, General Circulation Models are typically the tools used for modelling climate, and hence the two terms are sometimes used interchangeably. However, the term “global climate model” is ambiguous and may refer to an integrated framework that incorporates multiple components including a general circulation model, or may refer to the general class of climate models that use a variety of means to represent the climate mathematically.

Python Libraries

There are two main libraries used in Python to work with NetCDF data: xarray and iris. In this course we will use xarray, which took the pandas concept and extended it to gridded data. Working with xarray relies on concepts and ideas similar to those of the pandas library for tabular data. The Iris library has a more distinctive syntax.
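
The pandas-like feel of xarray can be sketched on a small synthetic array: selection by label, reduction over a named dimension, and conversion to a DataFrame. The data, the `month` dimension, and the coordinate values are assumptions made for this example; no NetCDF file is read.

```python
import numpy as np
import xarray as xr

# Synthetic data: every grid cell holds the month number (1 to 12).
months = np.arange(1, 13)
data = months[:, None, None] * np.ones((12, 2, 3))

pr = xr.DataArray(
    data,
    dims=("month", "lat", "lon"),
    coords={"month": months, "lat": [-10.0, 10.0], "lon": [0.0, 120.0, 240.0]},
    name="pr",
)

# Label-based selection, comparable to .loc in pandas.
march = pr.sel(month=3)
print(march.shape)  # (2, 3)

# Reduce over a named dimension instead of a numeric axis.
annual_mean = pr.mean(dim="month")
print(float(annual_mean[0, 0]))  # 6.5

# Convert to a pandas DataFrame indexed by (month, lat, lon).
df = pr.to_dataframe()
print(df.shape)  # (72, 1)
```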

The cartopy library is the package used to plot our data. It is designed for geospatial data processing in order to produce maps and other geospatial data analyses.

Sources:

CMIP5 Database: https://esgf-node.llnl.gov/projects/cmip5/
CMIP5 datastructure: https://portal.enes.org/data/enes-model-data/cmip5/datastructure
Tools: https://nsidc.org/data/netcdf/tools.html
Explanation of file names: https://scienceofdoom.com/2020/05/16/extracting-rainfall-data-from-cmip5-models/
Terminology: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#terminology

Key Points

  • NetCDF is a format for storing gridded data and is widely used in climate science.

  • A netCDF file contains dimensions, variables, and attributes.

  • Xarray is a library to work with NetCDF data in Python.

  • CMIP data is used in climate modelling.