NetCDF I/O Handling in Iris#
This document provides a basic account of how Iris loads and saves NetCDF files.
Under Construction
This document is still a work in progress, so might include blank or unfinished sections, watch this space!
Chunk Control#
Default Chunking#
Chunks are, by default, optimised by Iris on load. This will automatically decide the best chunksize for your data without any user input. This is calculated based on a number of factors, including:
File Variable Chunking
Full Variable Shape
Dask Default Chunksize
Dimension Order: Earlier (outer) dimensions will be prioritised to be split over later (inner) dimensions.
>>> cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.shape)
(240, 37, 49)
>>> print(cube.core_data().chunksize)
(60, 37, 49)
For more user control, functionality was updated in PR #5588, with the
creation of the iris.fileformats.netcdf.loader.CHUNK_CONTROL
class.
Custom Chunking: Set#
There are three context manangers within CHUNK_CONTROL
. The most basic is
set()
. This allows you to specify the chunksize for each dimension,
and to specify a var_name
specifically to change.
Using -1
in place of a chunksize will ensure the chunksize stays the same
as the shape, i.e. no optimisation occurs on that dimension.
>>> with CHUNK_CONTROL.set("air_temperature", time=180, latitude=-1, longitude=25):
... cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(180, 37, 25)
Note that var_name
is optional, and that you don’t need to specify every dimension. If you
specify only one dimension, the rest will be optimised using Iris’ default behaviour.
>>> with CHUNK_CONTROL.set(longitude=25):
... cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(120, 37, 25)
Custom Chunking: From File#
The second context manager is from_file()
.
This takes chunksizes as defined in the NetCDF file. Any dimensions without specified chunks
will default to Iris optimisation.
>>> with CHUNK_CONTROL.from_file():
... cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(120, 37, 49)
Custom Chunking: As Dask#
The final context manager, as_dask()
, bypasses
Iris’ optimisation all together, and will take its chunksizes from Dask’s behaviour.
>>> with CHUNK_CONTROL.as_dask():
... cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(70, 37, 49)
Split Attributes#
TBC
Deferred Saving#
TBC
Guessing Coordinate Axes#
Iris will attempt to add an axis
attribute when saving any coordinate
variable in a NetCDF file. E.g:
float longitude(longitude) ;
longitude:axis = "X" ;
This is achieved by calling iris.util.guess_coord_axis()
on each
coordinate being saved.
Disabling Axis-Guessing#
For some coordinates, guess_coord_axis()
will derive an
axis that is not appropriate. If you have such a coordinate, you can disable
axis-guessing by setting the coordinate’s
ignore_axis
property to True
.
One example (from SciTools/iris#5003) is a coordinate describing pressure thresholds, measured in hecto-pascals. Iris interprets pressure units as indicating a Z-dimension coordinate, since pressure is most commonly used to describe altitude/depth. But a pressure threshold coordinate is instead describing alternate scenarios - not a spatial dimension at all - and it is therefore inappropriate to assign an axis to it.
Worked example:
>>> from iris.coords import DimCoord
>>> from iris.util import guess_coord_axis
>>> my_coord = DimCoord(
... points=[1000, 1010, 1020],
... long_name="pressure_threshold",
... units="hPa",
... )
>>> print(guess_coord_axis(my_coord))
Z
>>> my_coord.ignore_axis = True
>>> print(guess_coord_axis(my_coord))
None