Data object module

The Data object is the backbone of the library: it makes it easy to compute the various quantities needed to post-process forecasts.

In essence, it is a NumPy ndarray whose first 5 dimensions have a dedicated and fixed meaning. These first axes represent:

  • Axis 0th: predictor number (\(p\))

  • Axis 1st: observation/realization number (\(n\))

  • Axis 2nd: ensemble member number (\(m\))

  • Axis 3rd: variable or label number (\(v\)) [Not used/implemented for the moment!]

  • Axis 4th: lead time (\(t\))

These first 5 dimensions are called the data index (see Data.index_shape). As such, they represent the data as a multi-dimensional array \(\mathcal{D}_{p,n,m,v} (t)\) where \(t\) is the lead time.

Any extra dimensions trailing in the array are the intrinsic dimensions of the data itself. For instance, an array with 7 dimensions in total represents 2D data (e.g. fields). If only 5 dimensions are present in total, then the data are scalars. The main operations of the Data object are broadcast over these extra dimensions, making the object directly usable with multi-dimensional forecast data.
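As a rough illustration of this convention, the plain-NumPy sketch below splits an array into its 5-axis data index and its trailing intrinsic shape, and classifies the stored data. The helpers split_shape and data_kind are hypothetical and are not part of the library; treating a single trailing axis as vector data is an assumption based on the existence of is_vector().

```python
import numpy as np

# Number of leading axes forming the data index: (p, n, m, v, t).
INDEX_NDIM = 5


def split_shape(array):
    """Return (index_shape, data_shape) for a Data-like ndarray (hypothetical helper)."""
    if array.ndim < INDEX_NDIM:
        raise ValueError(f"expected at least {INDEX_NDIM} dimensions, got {array.ndim}")
    return array.shape[:INDEX_NDIM], array.shape[INDEX_NDIM:]


def data_kind(array):
    """Classify the stored data by its number of trailing (intrinsic) axes.

    Assumption: one trailing axis is treated as vector data.
    """
    extra = array.ndim - INDEX_NDIM
    return {0: "scalar", 1: "vector", 2: "field"}.get(extra, "other")


a = np.zeros((2, 3, 10, 1, 60, 20, 20))
index_shape, data_shape = split_shape(a)
print(index_shape, data_shape, data_kind(a))  # (2, 3, 10, 1, 60) (20, 20) field
```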

Examples

Here is an example showing how the Data object works:

>>> import numpy as np
>>> from core.data import Data
>>> a = np.random.randn(2, 3, 10, 1, 60, 20, 20)
>>> data = Data(a)
>>> data.number_of_predictors
2
>>> data.number_of_observations
3
>>> data.number_of_members
10
>>> data.number_of_variables
1
>>> data.number_of_time_steps
60
>>> data.shape
(20, 20)
>>> data.index_shape
(2, 3, 10, 1, 60)

Notes

  • The methods of the Data object return another Data object whenever possible. If the output cannot be formatted according to the shape described above, an ndarray is returned instead. For example, matrices derived from the data are returned as NumPy arrays.

  • By convention, if a method or operation reduces the data along one of the index axes, or returns a Data object with one of the indices missing, the corresponding axis is kept with length one (so that its only index is zero) to preserve the index shape of the object. For example, for a Data object \(\mathcal{D}_{p,n,m,v} (t)\) of index_shape (P, N, M, V, T), the Data.ensemble_max property returns \(\max_m \mathcal{D}_{p,n,m,v} (t)\) as a Data object of index_shape (P, N, 1, V, T). E.g.:

    >>> import numpy as np
    >>> from core.data import Data
    >>> a = np.random.randn(2, 3, 10, 1, 60, 20, 20)
    >>> data = Data(a)
    >>> data.index_shape
    (2, 3, 10, 1, 60)
    >>> maxi = data.ensemble_max
    >>> maxi.index_shape
    (2, 3, 1, 1, 60)
    

    In the following, we use the notation \(\mathcal{D}_{p,n,v} (t) \equiv \mathcal{D}_{p,n,0,v} (t)\) in such cases.

  • Missing values in Data objects can be marked as numpy.nan. The various averages, summations and other methods automatically ignore missing values. As a consequence, these averages and summations include fewer terms. For example, if an ensemble member value is missing at one lead time, the ensemble mean at that lead time is computed over the rest of the ensemble and does not include this member.
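The two conventions above can be mimicked in plain NumPy, as in the sketch below: the member axis is reduced with keepdims=True so that the index shape keeps 5 axes, and NaN-aware reductions skip missing members. This only illustrates the documented behaviour; it assumes, without confirming it, that the library relies on comparable nan* reductions internally.

```python
import numpy as np

rng = np.random.default_rng(0)
# Index shape (P, N, M, V, T) = (1, 2, 4, 1, 3): scalar data with 4 members.
a = rng.standard_normal((1, 2, 4, 1, 3))
a[0, 0, 3, 0, 1] = np.nan  # member 3 missing at lead time 1 for observation 0

# Reduce over the member axis (axis 2) while keeping it with length one,
# so that the result still has a 5-axis index shape (P, N, 1, V, T).
ens_mean = np.nanmean(a, axis=2, keepdims=True)
ens_max = np.nanmax(a, axis=2, keepdims=True)
print(ens_mean.shape, ens_max.shape)  # (1, 2, 1, 1, 3) (1, 2, 1, 1, 3)

# At the affected lead time, the mean only uses the 3 available members.
print(np.isclose(ens_mean[0, 0, 0, 0, 1], a[0, 0, :3, 0, 1].mean()))  # True
```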

References

DATA-GR07

Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007. URL: https://doi.org/10.1198/016214506000001437.

DATA-GRWIG05

Tilmann Gneiting, Adrian E Raftery, Anton H Westveld III, and Tom Goldman. Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133(5):1098–1118, 2005. URL: https://doi.org/10.1175/MWR2904.1.

DATA-Her00

Hans Hersbach. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15(5):559–570, 2000. URL: https://doi.org/10.1175/1520-0434(2000)015%3C0559:DOTCRP%3E2.0.CO;2.

DATA-VSV15

Bert Van Schaeybroeck and Stéphane Vannitsem. Ensemble post-processing using member-by-member approaches: theoretical aspects. Quarterly Journal of the Royal Meteorological Society, 141(688):807–818, 2015. URL: https://doi.org/10.1002/qj.2397.

Warning

Several properties and definitions inside the Data object are not yet fixed or well-defined. Usages and standards might still evolve.

class core.data.Data(data=None, metadata=None, timestamps=None, dtype=numpy.float64)

Bases: object

Main data structure of the library.

Parameters
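  • data (None or ndarray) – The data array. If not None, it should be an array with at least 5 dimensions, whose first 5 axes follow the data index described above (predictor, observation, member, variable, lead time). Defaults to None.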

  • timestamps (None or ndarray(datetime) or list(ndarray(datetime))) – The timestamps of the forecast data. Can be a 1D ndarray of datetime timestamps (one per lead time), in which case the same timestamps vector is attributed to all the predictors and observations provided by data. Can also be a list of 1D ndarrays of datetime timestamps (one list entry per observation), which allows one to set a different timestamps vector for each observation/realization. If None, no timestamp is set. Defaults to None.

  • metadata (object or ndarray(object) or list(ndarray(object))) – Object(s) describing the metadata of the data (not implemented yet). Can be an object, an ndarray of objects (one per observation/realization), or a list of 1D ndarrays of objects (one list entry per predictor, one array component per observation/realization). If a single array is provided, it can be of shape (number_of_predictors, number_of_observations) to specify the metadata of each predictor and observation/realization separately. It can also be a 1D array whose components each correspond to an observation/realization; in this case, the same metadata object is used for each predictor. Defaults to None.

  • dtype (dtype) – The data type of the data being stored. Defaults to numpy.float64.

data

The data array.

Type

ndarray

timestamps

The timestamps of the data, stored as an ndarray of datetime ndarrays with shape (number_of_predictors, number_of_observations).

Type

ndarray(ndarray(datetime))

metadata

Object describing the metadata of the data (not specified yet).

Type

ndarray(object)

Abs_CRPS(other)

Return the absolute norm CRPS score with respect to another Data object \(\mathcal{O}\) (typically containing observations). This score is computed with the analytical formula:

\({\rm CRPS}^{\rm Abs}_{p,v} (t) = \left\langle d^{\rm ens}_{p,n,v} [\mathcal{O}] (t) - \delta_{p,n,v} (t) /2 \right\rangle_n\)

where \(d^{\rm{ens}}_{p,n,v} [\mathcal{O}]\) is the ensemble_distance() with the observations object \(\mathcal{O}\), and \(\delta_{p,n,v}\) is delta, obtained by taking the average of the ensemble_members_distance over the ensemble members. See [DATA-VSV15] and [DATA-GR07] for more details.

Parameters

other (Data) – Another Data object with observations in it.

Returns

The Absolute norm CRPS score.

Return type

Data
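The analytical formula above can be sketched directly in NumPy on raw arrays, as below. This illustrates the formula and is not the library's implementation: abs_crps is a hypothetical helper, the observations are assumed to carry a length-one member axis (following the reduction convention in the Notes), and the pairwise member distances use memory proportional to M^2 per index.

```python
import numpy as np


def abs_crps(forecast, obs):
    """Sketch of CRPS^Abs = < d^ens - delta / 2 >_n.

    forecast: ndarray of shape (P, N, M, V, T, ...)
    obs:      ndarray of shape (P, N, 1, V, T, ...)
    Returns an ndarray of shape (P, 1, 1, V, T, ...).
    """
    # d^ens: mean absolute distance between each member and the observation.
    d_ens = np.nanmean(np.abs(forecast - obs), axis=2, keepdims=True)
    # d^MBM: absolute distance between every pair of members (m1, m2),
    # obtained by broadcasting the member axis against a new axis.
    d_mbm = np.abs(forecast[:, :, :, np.newaxis] - forecast[:, :, np.newaxis, :])
    # delta: average of d^MBM over both member axes, member axis restored.
    delta = np.nanmean(d_mbm, axis=(2, 3))[:, :, np.newaxis]
    # Average over the observation/realization axis n.
    return np.nanmean(d_ens - delta / 2.0, axis=1, keepdims=True)


# Two members {0, 2} and an observation of 1: d^ens = 1, delta = 1.
fc = np.array([0.0, 2.0]).reshape(1, 1, 2, 1, 1)
ob = np.array([1.0]).reshape(1, 1, 1, 1, 1)
print(abs_crps(fc, ob).ravel())  # [0.5]
```

For this two-member case, 0.5 is also the integral of the squared difference between the ensemble's empirical CDF and the step function of the observation, as expected for this form of the CRPS.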

CRPS(other)

Return the CRPS score with respect to another Data object \(\mathcal{O}\) (typically containing observations). This score is computed according to [DATA-Her00] (see pp. 563-564).

Parameters

other (Data) – Another Data object with observations in it.

Returns

The CRPS score.

Return type

Data

CRPS_decomposition(other)

Return the decomposition of the CRPS score with respect to another Data object \(\mathcal{O}\) (typically containing observations) according to the formula:

\({\rm CRPS}_{p,v} (t) = {\rm Reli}_{p,v} (t) - {\rm Resol}_{p,v} (t) + {\rm Unc}_{p,v} (t)\)

where \({\rm Reli}_{p,v} (t)\), \({\rm Resol}_{p,v} (t)\) and \({\rm Unc}_{p,v} (t)\) are respectively the reliability, the resolution and the uncertainty contributions to the CRPS. See [DATA-Her00], p. 565 for more details.

Parameters

other (Data) – Another Data object with observations in it.

Returns

The decomposition of the CRPS score into the reliability, the resolution and the uncertainty contribution.

Return type

tuple(Data)

CRPS_relipot(other)

Return the decomposition of the CRPS score with respect to another Data object \(\mathcal{O}\) (typically containing observations) according to the formula:

\({\rm CRPS}_{p,v} (t) = {\rm Reli}_{p,v} (t) + {\rm CRPS}^{\rm pot}_{p,v} (t)\)

where \({\rm Reli}_{p,v} (t)\) and \({\rm CRPS}^{\rm pot}_{p,v} (t)\) are respectively the reliability and the potential CRPS, i.e. the CRPS one would obtain with a perfectly reliable ensemble. See [DATA-Her00], p. 564 for more details.

Parameters

other (Data) – Another Data object with observations in it.

Returns

The decomposition of the CRPS score into the reliability and the potential CRPS.

Return type

tuple(Data)

Ngr_CRPS(other)

Return the Non-homogeneous Gaussian Regression (NGR) CRPS score with respect to another Data object \(\mathcal{O}\) (typically containing observations). This score is computed with the analytical formula:

\({\rm CRPS}^{\rm Ngr}_{p,v} (t) = \left\langle\sigma^{\rm ens}_{p,n,v} (t) \left(z_{p,n,v}(t)(2\Phi(z_{p,n,v}(t)) -1) + 2\phi(z_{p,n,v}(t)) - \pi^{-1/2}\right)\right\rangle_n\)

where \(\phi\) is the probability density function of the standard normal distribution, \(\Phi\) is its cumulative distribution function and \(z_{p,n,v}(t) = \left(\mathcal{O}_{p,n,v} (t) -\mu^{\rm{ens}}_{p,n,v} (t)\right)/\sigma^{\rm{ens}}_{p,n,v} (t)\) is the standardized error with respect to the other data (where \(\mu^{\rm{ens}}\) and \(\sigma^{\rm{ens}}\) are respectively the ensemble_mean and the ensemble_std). See [DATA-VSV15] and [DATA-GRWIG05] for more details.

Parameters

other (Data) – Another Data object with observations in it. Must have the same shape as the Data object, except along the members \(m\) dimension (2nd axis).

Returns

The NGR CRPS score.

Return type

Data
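A minimal NumPy sketch of the closed-form NGR CRPS is given below. It computes \(\phi\) and \(\Phi\) from math.erf to avoid a SciPy dependency, and uses the population standard deviation (ddof=0), consistent with the ensemble_var definition. ngr_crps is a hypothetical helper, and the length-one member axis of the observations is an assumption.

```python
import math

import numpy as np

# Element-wise error function built from the standard library (no SciPy needed).
_erf = np.vectorize(math.erf)


def ngr_crps(forecast, obs):
    """Sketch of the closed-form Gaussian CRPS, averaged over observations.

    forecast: ndarray of shape (P, N, M, V, T, ...)
    obs:      ndarray of shape (P, N, 1, V, T, ...)
    """
    mu = np.mean(forecast, axis=2, keepdims=True)     # mu^ens
    sigma = np.std(forecast, axis=2, keepdims=True)   # sigma^ens (ddof=0)
    z = (obs - mu) / sigma                            # standardized error
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)  # phi(z)
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))          # Phi(z)
    crps = sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
    return np.mean(crps, axis=1, keepdims=True)


fc = np.array([-1.0, 1.0]).reshape(1, 1, 2, 1, 1)  # mu = 0, sigma = 1
ob = np.zeros((1, 1, 1, 1, 1))                     # z = 0
print(ngr_crps(fc, ob).item())  # about 0.2337
```

With z = 0 the formula reduces to \(\sigma(\sqrt{2/\pi} - \pi^{-1/2})\), which gives a quick sanity check of the implementation.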

append_members(data)

Append the ensemble members of another Data object to the current one (i.e. along the 2nd axis).

Parameters

data (Data) – The Data object holding the members to append. Must be compatible/broadcastable. If the current Data object is empty, the provided data object is simply copied.

append_observations(data)

Append the observations of another Data object to the current one (i.e. along the 1st axis). Alias for append_realizations().

Parameters

data (Data) – The Data object holding the observations to append. Must be compatible/broadcastable. If the current Data object is empty, the provided data object is simply copied.

append_predictors(data)

Append the predictors of another Data object to the current one (i.e. along the 0th axis).

Parameters

data (Data) – The Data object holding the predictors to append. Must be compatible/broadcastable. If the current Data object is empty, the provided data object is simply copied.

append_realizations(data)

Append the realizations of another Data object to the current one (i.e. along the 1st axis).

Parameters

data (Data) – The Data object holding the realizations to append. Must be compatible/broadcastable. If the current Data object is empty, the provided data object is simply copied.
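In terms of the underlying arrays, appending amounts to a concatenation along one index axis. The sketch below uses np.concatenate; append_along is a hypothetical helper, and unlike the library (which accepts compatible/broadcastable data), it requires the other axes to match exactly.

```python
import numpy as np

# Index axes used by the append_* methods, following the convention above.
PREDICTOR_AXIS, OBSERVATION_AXIS, MEMBER_AXIS = 0, 1, 2


def append_along(current, new, axis):
    """Concatenate `new` to `current` along one index axis (hypothetical helper).

    An empty `current` (None or size 0) is replaced by a copy of `new`,
    mirroring the documented behaviour for an empty Data object.
    """
    if current is None or current.size == 0:
        return np.array(new, copy=True)
    # np.concatenate requires all other axes to match exactly (no broadcasting).
    return np.concatenate((current, new), axis=axis)


forecasts = np.zeros((1, 3, 10, 1, 60))
extra_members = np.ones((1, 3, 5, 1, 60))
print(append_along(forecasts, extra_members, MEMBER_AXIS).shape)  # (1, 3, 15, 1, 60)
```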

bias(other)

Return the bias \(\left\langle\mu^{\rm{ens}}_{p,n,v} (t) - \mathcal{O}_{p,n,v} (t)\right\rangle_n\) with respect to another Data object \(\mathcal{O}\) (typically containing observations).

Parameters

other (Data) – Another Data object with observations in it. Must have the same shape as the Data object, except along the members \(m\) dimension (2nd axis).

Returns

The bias.

Return type

Data

property centered_ensemble

Returns an ensemble centered on its ensemble_mean: \(\bar{\mathcal{D}}^{\rm ens}_{p,n,m,v} (t) = \mathcal{D}_{p,n,m,v} (t) - \mu^{\rm ens}_{p,n,v} (t)\).

Type

Data

property centered_observation

Returns the data centered on its observational_mean: \(\bar{\mathcal{D}}^{\rm obs}_{p,n,m,v} (t) = \mathcal{D}_{p,n,m,v} (t) - \mu^{\rm obs}_{p,m,v} (t)\).

Type

Data

clear_data()

Reset the Data object.

copy()

Return a (shallow) copy of the Data object.

Returns

A copy of the Data object.

Return type

Data

property delta

Average over the ensemble members of the ensemble_members_distance: \(\delta_{p,n,v} (t) = \left\langle d^{\rm{MBM}}_{p,n,m_1,m_2,v} (t) \right\rangle_{m_1, m_2}\)

Type

Data

property dtype

The data type.

Type

dtype

ensemble_distance(other)

Data: Averaged distance between the ensemble members and another Data object: \(d^{\rm{ens}}_{p,n,v} [\mathcal{O}] (t) = \langle|\mathcal{D}_{p,n,m,v} (t)- \mathcal{O}_{p,n,m,v} (t)|\rangle_m\) where \(\mathcal{O}\) is the other Data object.

property ensemble_max

Ensemble maximum over the ensemble index \(m\): \(\max_m \mathcal{D}_{p,n,m,v} (t)\).

Type

Data

property ensemble_mean

Ensemble mean. Mean over the ensemble index \(m\): \(\mu^{\rm{ens}}_{p,n,v} (t) = \langle \mathcal{D}_{p,n,m,v} (t) \rangle_m\).

Type

Data

ensemble_mean_MSE(other)

Return the Mean Square Error of the ensemble mean \(\left\langle\left(\mu^{\rm{ens}}_{p,n,v} (t) - \mathcal{O}_{p,n,v} (t)\right)^2\right\rangle_n\) with respect to another Data object \(\mathcal{O}\) (typically containing observations).

Parameters

other (Data) – Another Data object with observations in it. Must have the same shape as the Data object, except along the members \(m\) dimension (2nd axis).

Returns

The ensemble mean Mean Square Error.

Return type

Data

ensemble_mean_RMSE(other)

Return the Root Mean Square Error of the ensemble mean \(\sqrt{\left\langle\left(\mu^{\rm{ens}}_{p,n,v} (t) - \mathcal{O}_{p,n,v} (t)\right)^2\right\rangle_n}\) with respect to another Data object \(\mathcal{O}\) (typically containing observations).

Parameters

other (Data) – Another Data object with observations in it. Must have the same shape as the Data object, except along the members \(m\) dimension (2nd axis).

Returns

The ensemble mean Root Mean Square Error.

Return type

Data
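The bias, MSE and RMSE of the ensemble mean can be sketched as follows on raw arrays. ensemble_mean_scores is a hypothetical helper; NaN handling follows the missing-value convention in the Notes, and the observations are assumed to have a length-one member axis.

```python
import numpy as np


def ensemble_mean_scores(forecast, obs):
    """Return (bias, MSE, RMSE) of the ensemble mean, averaged over n.

    forecast: ndarray of shape (P, N, M, V, T, ...)
    obs:      ndarray of shape (P, N, 1, V, T, ...)
    NaNs are skipped, following the missing-value convention.
    """
    mu = np.nanmean(forecast, axis=2, keepdims=True)  # mu^ens
    err = mu - obs
    bias = np.nanmean(err, axis=1, keepdims=True)
    mse = np.nanmean(err**2, axis=1, keepdims=True)
    return bias, mse, np.sqrt(mse)  # RMSE is the square root of the averaged MSE


# Two observations with two members each; both ensemble means equal 2.
forecast = np.array([[1.0, 3.0], [0.0, 4.0]]).reshape(1, 2, 2, 1, 1)
obs = np.array([1.0, 3.0]).reshape(1, 2, 1, 1, 1)
bias, mse, rmse = ensemble_mean_scores(forecast, obs)
print(bias.item(), mse.item(), rmse.item())  # 0.0 1.0 1.0
```

The example shows why the bias alone is not enough: errors of +1 and -1 cancel in the bias but not in the MSE.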

ensemble_mean_observational_covariance(other)

Observational covariance matrix of the ensemble mean with another Data object \(\mathcal{O}\): \({\rm Cov}^{\rm obs}_{p_1, p_2, v} [\bar{\mathcal{O}}^{\rm obs}, \mu^{\rm ens}] (t)= \left\langle \left\langle\bar{\mathcal{O}}^{\rm obs}_{p_1,n,m,v} (t)\right\rangle_m \, \, \bar{\mu}^{\rm ens}_{p_2,n,v} (t) \right\rangle_n\)

where \(\bar{\mu}^{\rm ens}_{p,n,v} (t) = \mu^{\rm ens}_{p,n,v}(t) - \langle \mu^{\rm ens}_{p,n',v}(t) \rangle_{n'}\) and where \(\mu^{\rm ens}_{p,n,v}(t)\) is the ensemble_mean. \(\bar{\mathcal{O}}^{\rm obs}_{p,n,m,v} (t)\) is the centered_observation of the other Data object.

Parameters

other (Data) – Another Data object with observations in it.

Returns

The observational covariance matrix.

Return type

Data

property ensemble_mean_observational_self_covariance

Ensemble mean observational covariance matrix: \({\rm Cov}^{\rm obs}_{p_1, p_2, v} [\mu^{\rm ens}, \mu^{\rm ens}] (t)= \left\langle \bar{\mu}^{\rm ens}_{p_1,n,v} (t) \, \bar{\mu}^{\rm ens}_{p_2,n,v} (t) \right\rangle_n\)

where \(\bar{\mu}^{\rm ens}_{p,n,v} (t) = \mu^{\rm ens}_{p,n,v}(t) - \langle \mu^{\rm ens}_{p,n',v}(t) \rangle_{n'}\) and where \(\mu^{\rm ens}_{p,n,v}(t)\) is the ensemble_mean.

Type

numpy.ndarray
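For scalar data, the ensemble mean observational self-covariance can be sketched with np.einsum. The (P, P, V, T) layout of the result is an assumption, since the layout of the array returned by the library is not specified here; for each (v, t), the result should match np.cov(..., bias=True), as both normalize by N.

```python
import numpy as np


def ensemble_mean_self_covariance(forecast):
    """Sketch of <mu_bar[p1] * mu_bar[p2]>_n for scalar data.

    forecast: ndarray of shape (P, N, M, V, T).
    Returns an ndarray of shape (P, P, V, T) (assumed layout).
    """
    mu = forecast.mean(axis=2)                      # mu^ens, shape (P, N, V, T)
    mu_bar = mu - mu.mean(axis=1, keepdims=True)    # centred over n
    n_obs = mu.shape[1]
    # Product of centred ensemble means for every predictor pair, averaged over n.
    return np.einsum("anvt,bnvt->abvt", mu_bar, mu_bar) / n_obs


rng = np.random.default_rng(1)
fc = rng.standard_normal((3, 50, 4, 1, 2))
cov = ensemble_mean_self_covariance(fc)
print(cov.shape)  # (3, 3, 1, 2)
```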

property ensemble_median

Ensemble median. Median over the ensemble index \(m\).

Type

Data

property ensemble_members_distance

Distance between ensemble members: \(d^{\rm{MBM}}_{p,n,m_1,m_2,v} (t) = |\mathcal{D}_{p,n,m_1,v} (t)- \mathcal{D}_{p,n,m_2,v} (t) |\)

Type

ndarray

property ensemble_min

Ensemble minimum over the ensemble index \(m\): \(\min_m \mathcal{D}_{p,n,m,v} (t)\).

Type

Data

ensemble_quantiles(q, interpolation='linear')

Return the ensemble quantiles of the data.

Parameters
  • q (array_like(float)) – Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive.

  • interpolation (str, optional) – This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points. See numpy.quantile() for more information.

Returns

The ensemble quantiles, stored along the ensemble member number axis (2nd axis).

Return type

Data
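The sketch below reproduces this layout with np.quantile: the quantiles are computed over the member axis and then moved back to axis 2. It uses NumPy's default linear method; for data containing NaNs, np.nanquantile would be the NaN-aware counterpart. The stand-alone ensemble_quantiles function here is hypothetical and is not the method itself.

```python
import numpy as np


def ensemble_quantiles(forecast, q):
    """Quantiles over the member axis, stored back along that axis.

    forecast: ndarray of shape (P, N, M, V, T, ...)
    Returns an ndarray of shape (P, N, len(q), V, T, ...).
    """
    q = np.atleast_1d(q)
    # np.quantile puts the quantile axis first: (Q, P, N, V, T, ...).
    qs = np.quantile(forecast, q, axis=2)
    # Move it to position 2, where the member axis used to be.
    return np.moveaxis(qs, 0, 2)


fc = np.arange(5.0).reshape(1, 1, 5, 1, 1)  # members 0, 1, 2, 3, 4
print(ensemble_quantiles(fc, [0.25, 0.5, 0.75]).ravel())  # [1. 2. 3.]
```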

property ensemble_std

Ensemble standard deviation over the ensemble index \(m\): \(\sigma^{\rm{ens}}_{p,n,v} (t)\).

Type

Data

property ensemble_var

Ensemble variance over the ensemble index \(m\): \(\sigma^{\rm{ens}}_{p,n,v} (t)^2 = \left\langle \left( \mathcal{D}_{p,n,m,v} (t) - \mu^{\rm{ens}}_{p,n,v} (t) \right)^2 \right\rangle_m\).

Type

Data

full_like(value, **kwargs)

Like numpy.full_like(), return a Data object filled with value, with the same index_shape, shape and data type as the current one.

Parameters
Returns

The full Data object

Return type

Data

get_data()

Return the whole data array.

Returns

The whole data array.

Return type

ndarray

get_metadata()

Return the metadata.

get_value(index)

Get the value(s) of a particular data index.

Parameters

index (tuple(int)) – The data index of the value.

Returns

values – The values corresponding to the index.

Return type

ndarray

property index_shape

The shape of the data index.

Type

tuple(int)

is_empty()

bool: Return True if there is no data stored.

is_field()

bool: Return True if the stored data are fields.

is_scalar()

bool: Return True if the stored data are scalars.

is_vector()

bool: Return True if the stored data are vectors.

load_from_file(filename, **kwargs)

Load a Data object previously saved with the save_to_file() method.

Parameters
  • filename (str) – The file name where the Data object was saved.

  • kwargs (dict) – Keyword arguments to pass to the pickle module method.

load_scalars(data, metadata=None, timestamps=None, load_axis=1, concat_axis=1, columns=0, replace_timestamps=False)

Load scalar data into the Data object. For the moment, only pandas DataFrames and NumPy arrays are accepted.

Parameters
  • data (DataFrame or ndarray or list(DataFrame) or list(ndarray)) – The data to load into the object, packed along load_axis. If numpy.ndarray objects are provided, they can be at most 2-dimensional and their last axis is always identified with the lead time; the remaining axis is identified with the axis of the Data object given by load_axis. If pandas.DataFrame objects are provided, their row axis is expected to correspond to the lead time, while their column axis is identified with the axis of the Data object given by load_axis. In both cases, a list of data can be provided instead. The list items are then loaded along the axis given by the first element of load_axis, which must thus be a 2-tuple. If the list elements are 2D, their first axis is loaded along the axis of the Data object given by the second element of load_axis, and their last axis along the lead time axis. If the list elements are 1D, they are loaded along the lead time axis. Finally, if the object already contains data, the provided data are appended to them along concat_axis.

  • metadata (object or list(object)) – The metadata of the provided data. If a list of data is provided, a list of metadata objects can be provided. Otherwise, the same metadata object is used for all the data items in the list.

  • timestamps (ndarray(datetime) or list(ndarray(datetime))) – The timestamps array(s) of the provided data, as datetime.datetime objects. If a list of data is provided, a list of timestamps arrays can be provided. Otherwise, the same timestamps array is used for all the data items in the list.

  • load_axis (int or str or tuple(int) or tuple(str)) – Axis along which the provided data are loaded into the Data object. Defaults to 1, i.e. the observation index. Can be an integer if data is a numpy.ndarray or a pandas.DataFrame, and must be a 2-tuple if data is a list (see above). Can also be a string such as ‘obs’ to load along the observation index, or ‘members’ to load along the ensemble member index.

  • concat_axis (int or str) – Axis along which the data are concatenated. Can be an integer or a string such as ‘obs’ to concatenate along the observation index, or ‘members’ to concatenate along the ensemble member index.

  • columns (int or str or list(int) or list(str), optional) – Specifies the column(s) of the pandas.DataFrame to load along load_axis. Only works with pandas.DataFrame input.

  • replace_timestamps (bool) – Replace the timestamps possibly already present in the Data object if concat_axis is not 1. Defaults to False.
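Conceptually, loading a 2D NumPy array with load_axis=1 places its rows on the observation axis and its columns on the lead time axis. The sketch below shows this mapping for a single array; to_index_array is a hypothetical helper, and it does not handle DataFrames, lists, string axis names, timestamps, metadata or concatenation.

```python
import numpy as np


def to_index_array(values, load_axis=1):
    """Embed 1D or 2D scalar data into the 5-axis data index (hypothetical helper).

    The last axis of `values` is the lead time (axis 4). For 2D input, the
    first axis is placed on `load_axis` (0 to 3); other axes get length one.
    """
    values = np.asarray(values, dtype=np.float64)
    if values.ndim == 1:
        values = values[np.newaxis, :]
    if values.ndim != 2:
        raise ValueError("expected a 1D or 2D array")
    if load_axis not in (0, 1, 2, 3):
        raise ValueError("load_axis must be 0, 1, 2 or 3")
    shape = [1, 1, 1, 1, values.shape[1]]
    shape[load_axis] = values.shape[0]
    # Only length-one axes are inserted, so reshape keeps the element order.
    return values.reshape(shape)


obs_series = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 series, 3 lead times
print(to_index_array(obs_series).shape)               # (1, 2, 1, 1, 3)
print(to_index_array(obs_series, load_axis=2).shape)  # (1, 1, 2, 1, 3)
```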

load_timestamps(timestamps)

Load timestamps data.

Parameters

timestamps (ndarray(datetime) or list(ndarray(datetime))) – The timestamps of the forecast data. Can be a 1D ndarray of datetime timestamps (one per lead time), in which case the same timestamps vector is attributed to all the predictors and observations. Can also be a list of 1D ndarrays of datetime timestamps (one list entry per observation), which allows one to set a different timestamps vector for each observation/realization.

property ndim

The dimension of the data.

Type

int

property number_of_members

The number of ensemble members stored in the data object.

Type

int

property number_of_observations

The number of observations stored in the data object.

Type

int

property number_of_predictors

The number of predictors stored in the data object.

Type

int

property number_of_time_steps

The number of time steps stored in the data object.

Type

int

property number_of_variables

The number of variables stored in the data object.

Type

int

property observational_distance

Distance between observations: \(d^{\rm{obs}}_{p,n_1,n_2,m,v} (t) = |\mathcal{D}_{p,n_1,m,v} (t)- \mathcal{D}_{p,n_2,m,v} (t)|\)

Type

ndarray

property observational_max

Observational maximum over the observation index \(n\): \(\max_n \mathcal{D}_{p,n,m,v} (t)\).

Type

Data

property observational_mean

Mean over the observation index \(n\): \(\mu^{\rm obs}_{p,m,v} (t) = \langle \mathcal{D}_{p,n,m,v} (t) \rangle_n\).

Type

Data

property observational_median

Median over the observation index \(n\).

Type

Data

property observational_min

Observational minimum over the observation index \(n\): \(\min_n \mathcal{D}_{p,n,m,v} (t)\).

Type

Data

observational_quantiles(q, interpolation='linear')

Return the observational quantiles of the data.

Parameters
  • q (array_like(float)) – Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive.

  • interpolation (str, optional) – This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points. See numpy.quantile() for more information.

Returns

The observational quantiles, stored along the observation axis (1st axis).

Return type

Data

property observational_std

Standard deviation over the observation index \(n\): \(\sigma^{\rm obs}_{p,m,v} (t)\).

Type

Data

property observational_var

Variance over the observation index \(n\): \(\sigma^{\rm obs}_{p,m,v} (t)^2 = \left\langle (\mathcal{D}_{p,n,m,v} (t) - \mu^{\rm obs}_{p,m,v} (t))^2 \right\rangle_n\).

Type

Data

plot(predictor=0, variable=0, ax=None, timestamps=None, global_label=None, grid_point=None, **kwargs)

Plot the data as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • global_label (None or str or list(str), optional) – Label representing all the data (str), or the data of each observation (list of str), in the legend.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the data were plotted.

Return type

Axes

plot_Abs_CRPS(other, predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the absolute norm CRPS score Abs_CRPS() of the data with respect to the observation data (other) as a function of time.

Parameters
  • other (Data) – Another data structure holding the observations.

  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the CRPS data were plotted.

Return type

Axes

plot_CRPS(other, predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the CRPS score CRPS() of the data with respect to the observation data (other) as a function of time.

Parameters
  • other (Data) – Another data structure holding the observations.

  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the CRPS data were plotted.

Return type

Axes

plot_Ngr_CRPS(other, predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the Non-homogeneous Gaussian Regression (NGR) CRPS score Ngr_CRPS() of the data with respect to the observation data (other) as a function of time.

Parameters
  • other (Data) – Another data structure holding the observations.

  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the CRPS data were plotted.

Return type

Axes

plot_ensemble_mean(predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the ensemble mean (ensemble_mean) of the data as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the data were plotted.

Return type

Axes

plot_ensemble_median(predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the ensemble median (ensemble_median) of the data as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the data were plotted.

Return type

Axes

plot_ensemble_minmax(predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the ensemble minimum (ensemble_min) and maximum (ensemble_max) of the data as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the data were plotted.

Return type

Axes

plot_ensemble_quantiles(q, low_interpolation='linear', high_interpolation='linear', predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, alpha=0.1, **kwargs)

Plot the ensemble quantiles (ensemble_quantiles) of the data as a function of time.

Parameters
  • q (array_like(float)) – Quantile or sequence of quantiles to compute, which must lie strictly between 0 and 0.5. For each q, the quantile symmetric with respect to 0.5 (i.e. 1 - q) is also computed.

  • low_interpolation (str, optional) – This optional parameter specifies the interpolation method to use when the desired lower quantile (q<0.5) lies between two data points. See numpy.quantile() for more information.

  • high_interpolation (str, optional) – This optional parameter specifies the interpolation method to use when the desired higher quantile (q>0.5) lies between two data points. See numpy.quantile() for more information.

  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • alpha (float, optional) – Base level of transparency for the highest and lowest quantiles. Default is 0.1.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the data were plotted.

Return type

Axes

plot_ensemble_std(predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the ensemble standard deviation (ensemble_std) of the data as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the data were plotted.

Return type

Axes

plot_observational_mean(predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the observational mean (observational_mean) of the data as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, and a numbered time index otherwise. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, specifies which grid point to plot.

  • kwargs (dict) – Keyword arguments to be passed to the plotting routine.

Returns

ax – An axes where the data were plotted.

Return type

Axes

plot_observational_median(predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the observational median of the data (observational_median) as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, otherwise a numbered time index. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, the grid point to plot. Default is None.

  • kwargs (dict) – Arguments passed to the plotting routine.

Returns

ax – The axes on which the data were plotted.

Return type

Axes

plot_observational_minmax(predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the observational minimum (observational_min) and maximum (observational_max) of the data as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, otherwise a numbered time index. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, the grid point to plot. Default is None.

  • kwargs (dict) – Arguments passed to the plotting routine.

Returns

ax – The axes on which the data were plotted.

Return type

Axes
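On a raw ndarray, the observational minimum and maximum amount to reductions over the observation axis (axis 1). The sketch below assumes, following the module's convention for reduced indices, that the reduced axis is kept with length 1; the member and grid point selected for the envelope are purely illustrative choices, not what the library necessarily plots.

```python
import numpy as np

# Raw array following the Data index convention (p, n, m, v, t) plus a 20x20 field.
a = np.random.randn(2, 3, 10, 1, 60, 20, 20)

# Reduce over the observation axis (axis 1), keeping it with length 1 so the
# results retain a five-dimensional index shape (P, 1, M, V, T, ...).
obs_min = a.min(axis=1, keepdims=True)
obs_max = a.max(axis=1, keepdims=True)
print(obs_min.shape)  # (2, 1, 10, 1, 60, 20, 20)

# Envelope time series for predictor 0, member 0 (an illustrative choice),
# variable 0, at grid point (5, 7).
low = obs_min[0, 0, 0, 0, :, 5, 7]
high = obs_max[0, 0, 0, 0, :, 5, 7]
```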

plot_observational_quantiles(q, low_interpolation='linear', high_interpolation='linear', predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, alpha=0.1, **kwargs)

Plot the observational quantiles of the data (observational_quantiles) as a function of time.

Parameters
  • q (array_like(float)) – Quantile or sequence of quantiles to compute, which must be strictly between 0 and 0.5. For each q, the symmetric quantile with respect to 0.5 (i.e. 1 - q) is also computed.

  • low_interpolation (str, optional) – The interpolation method to use when the desired lower quantile (q < 0.5) lies between two data points. See numpy.quantile() for more information. Default is 'linear'.

  • high_interpolation (str, optional) – The interpolation method to use when the desired higher quantile (q > 0.5) lies between two data points. See numpy.quantile() for more information. Default is 'linear'.

  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, otherwise a numbered time index. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, the grid point to plot. Default is None.

  • alpha (float, optional) – Base level of transparency for the highest and lowest quantiles. Default is 0.1.

  • kwargs (dict) – Arguments passed to the plotting routine.

Returns

ax – The axes on which the data were plotted.

Return type

Axes
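The pairing of each quantile q with its symmetric counterpart 1 - q, each with its own interpolation method, can be sketched with numpy.quantile() on a raw array, assuming axis 1 is the observation axis as in the Data index convention. The helper name observational_quantile_pair is hypothetical, a scalar q is assumed for brevity, and the method keyword requires NumPy 1.22 or later (older versions named it interpolation).

```python
import numpy as np


def observational_quantile_pair(a, q, low_method="linear", high_method="linear"):
    """Hypothetical helper: lower (q) and symmetric upper (1 - q) quantiles
    over the observation axis (axis 1), keeping that axis with length 1."""
    if not 0.0 < q < 0.5:
        raise ValueError("q must be strictly between 0 and 0.5")
    low = np.quantile(a, q, axis=1, method=low_method, keepdims=True)
    high = np.quantile(a, 1.0 - q, axis=1, method=high_method, keepdims=True)
    return low, high


# 11 observations 0, 1, ..., 10 for a single predictor, member, variable and time step.
a = np.arange(11, dtype=float).reshape(1, 11, 1, 1, 1)
low, high = observational_quantile_pair(a, 0.1)
print(low.ravel(), high.ravel())
```

With linear interpolation, the 0.1 and 0.9 quantiles of 0, ..., 10 are 1 and 9 (up to floating-point rounding). Passing "lower" and "higher" instead snaps each quantile to an actual data point, which keeps the plotted band within observed values.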

plot_observational_std(predictor=0, variable=0, ax=None, timestamps=None, grid_point=None, **kwargs)

Plot the observational standard deviation of the data (observational_std) as a function of time.

Parameters
  • predictor (int, optional) – The predictor index to use. Default is 0.

  • variable (int, optional) – The variable index to use. Default is 0.

  • ax (Axes, optional) – An axes on which to plot.

  • timestamps (None or ndarray(datetime), optional) – An array containing the timestamps of the data. If None, the data timestamps are used if available, otherwise a numbered time index. Defaults to None.

  • grid_point (tuple(int, int), optional) – If the data are fields, the grid point to plot. Default is None.

  • kwargs (dict) – Arguments passed to the plotting routine.

Returns

ax – The axes on which the data were plotted.

Return type

Axes

save_to_file(filename, **kwargs)

Save the data to a file using the pickle module.

Parameters
  • filename (str) – The name of the file in which to save the Data object.

  • kwargs (dict) – Keyword arguments passed to the underlying pickle module function.
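A minimal round-trip sketch, assuming the keyword arguments end up in pickle.dump() (for instance protocol). Both save_to_file and load_from_file below are hypothetical stand-ins operating on a plain ndarray, not the library's implementation; as with any pickle file, only load files from trusted sources.

```python
import os
import pickle
import tempfile

import numpy as np


def save_to_file(obj, filename, **kwargs):
    """Hypothetical stand-in: pickle ``obj`` to ``filename``, forwarding
    keyword arguments (e.g. ``protocol``) to pickle.dump()."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f, **kwargs)


def load_from_file(filename):
    """Hypothetical counterpart reading the object back with pickle.load()."""
    with open(filename, "rb") as f:
        return pickle.load(f)


a = np.random.randn(2, 3, 10, 1, 60)
path = os.path.join(tempfile.mkdtemp(), "data.pkl")
save_to_file(a, path, protocol=pickle.HIGHEST_PROTOCOL)
b = load_from_file(path)
```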

set_dtype(dtype)

Set the data type.

Parameters

dtype (dtype) – The NumPy data type to use for the data.

property shape

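The shape of the intrinsic dimensions of the data, i.e. of the trailing dimensions beyond the five index dimensions (see index_shape).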

Type

tuple(int)
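On the underlying ndarray, the split between index_shape and shape amounts to slicing the array shape at position 5, as in the module example where a (2, 3, 10, 1, 60, 20, 20) array has index_shape (2, 3, 10, 1, 60) and shape (20, 20). The sketch below assumes that slicing is how the two are related; it is not the library's code.

```python
import numpy as np

a = np.zeros((2, 3, 10, 1, 60, 20, 20))

# The first five axes form the data index (p, n, m, v, t) ...
index_shape = a.shape[:5]
# ... and any trailing axes are the intrinsic dimensions of the data.
shape = a.shape[5:]

print(index_shape)  # (2, 3, 10, 1, 60)
print(shape)        # (20, 20)
```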

property uncertainty

Average over all pairs of observations of the observational_distance, divided by 2: \(\langle d^{\rm{obs}}_{p,n_1,n_2,m,v} (t)\rangle_{n_1,n_2} / 2\). This is sometimes called the uncertainty contribution to the CRPS. See [DATA-Her00] for more details.

Type

Data
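A NumPy sketch of \(\langle d^{\rm{obs}}_{p,n_1,n_2,m,v} (t)\rangle_{n_1,n_2} / 2\) on a raw array, under two assumptions not stated in this section: that the observational distance is the absolute difference \(|\mathcal{D}_{p,n_1,m,v}(t) - \mathcal{D}_{p,n_2,m,v}(t)|\), and that the average includes the zero diagonal terms \(n_1 = n_2\). The helper name is hypothetical; the reduced observation axis is kept with length 1 as per the module's convention.

```python
import numpy as np


def uncertainty(a):
    """Hypothetical sketch of <d_obs>_{n1,n2} / 2 on a raw ndarray.

    Assumes d_obs[p, n1, n2, m, v, t] = |a[p, n1, m, v, t] - a[p, n2, m, v, t]|.
    """
    # Pairwise absolute differences between observations n1 and n2,
    # shape (P, N, N, M, V, T, ...). Memory grows as N**2.
    d_obs = np.abs(a[:, :, np.newaxis] - a[:, np.newaxis, :])
    # Average over both observation axes, then restore one observation slot.
    return d_obs.mean(axis=(1, 2))[:, np.newaxis] / 2.0


# Two observations, 0 and 2: pairwise distances 0, 2, 2, 0 -> mean 1 -> halved 0.5.
a = np.array([0.0, 2.0]).reshape(1, 2, 1, 1, 1)
print(uncertainty(a).ravel())  # [0.5]
```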

zeros_like(**kwargs)

Like numpy.zeros_like(), return a Data object with the same index_shape, shape and data type as the original one, but filled with zeros.

Parameters

kwargs (dict) – Keyword arguments passed to numpy.zeros_like().

Returns

A Data object filled with zeros.

Return type

Data
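The behaviour this method mirrors can be seen with numpy.zeros_like() on a raw array: the result keeps the shape and dtype of the input unless a keyword such as dtype overrides it. This sketch covers only the NumPy part; wrapping the result in a Data object is left out.

```python
import numpy as np

a = np.random.randn(2, 3, 10, 1, 60, 20, 20).astype(np.float32)

# Same shape and dtype as the input, filled with zeros.
z = np.zeros_like(a)
# Keyword arguments such as dtype change the result's properties.
z64 = np.zeros_like(a, dtype=np.float64)
```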