DataSource#

class metacatalog.models.datasource.DataSource(**kwargs)#

Model to represent a datasource of a specific Entry. The datasource further specifies a DataSourceType by setting a path and args.

id#

Unique id of the record. If not specified, the database will assign it.

Type:

int

path#

Path to the actual data. Depending on type, this can be a filepath, SQL tablename or URL.

Type:

str

encoding#

The encoding of the file or database representation of the actual data. Defaults to 'utf-8'. Only change this if necessary.

Type:

str

args#

Optional. If the I/O classes need further arguments, these can be stored as a JSON-serializable str. Will be parsed into a dict and passed to the I/O functions as **kwargs.

Type:

str

type_id#

Foreign key referencing the DataSourceType.

Type:

int

type#

The referenced DataSourceType. Can be used instead of setting type_id.

Type:

metacatalog.models.DataSourceType

data_names#

New in version 0.3.0.

Deprecated since version 0.9.1.

List of column names that will be displayed when exporting the data. The columns are named in the same order as they appear in the list.

Type:

list

variable_names#

New in version 0.9.1.

List of variable names under which the data of the entry is stored in the datasource. For tabular data, this is usually the column name(s) of the variable referenced by the Entry; for a netCDF file, the name(s) of the referenced variable(s). More generally, variable_names describes how a datasource would be indexed to retrieve the data of the entry.

Type:

list[str]

Example

There is a DataSourceType of name='internal', which handles I/O operations on tables in the same database. The datasource itself will then store the tablename as path. It can be linked to Entry in a 1:n relationship. This way, the admin has full control over data tables, while still using common I/O classes.
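A minimal sketch of this scenario, assuming a SQLAlchemy session named session, that the 'internal' type is already seeded in the database, and that both classes are importable from metacatalog.models (the tablename is illustrative):

from metacatalog.models import DataSource, DataSourceType

# look up the 'internal' type, whose I/O classes read from tables
# in the same database
internal = session.query(DataSourceType).filter(DataSourceType.name == 'internal').one()

# the datasource stores only the tablename as path
datasource = DataSource(path='timeseries_data', type=internal, encoding='utf-8')
session.add(datasource)
session.commit()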

create_scale(resolution, extent, support, scale_dimension, dimension_names: Optional[List[str]] = None, commit: bool = False) None#

Create a new scale for the dataset.
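A hedged sketch of a call creating a temporal scale; the value accepted for scale_dimension and the exact extent format are assumptions here, not confirmed by this reference:

from datetime import datetime

# attach a temporal scale: 15 min resolution, one year of observations,
# full support (temporally exhaustive observations)
datasource.create_scale(
    resolution='15min',
    extent=(datetime(2020, 1, 1), datetime(2021, 1, 1)),
    support=1.0,
    scale_dimension='temporal',   # assumed keyword value
    commit=True,
)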

property dimension_names: List[str]#

New in version 0.9.1.

Returns a flat list of all dimensions needed to identify a datapoint in the dataset. The order is [temporal, spatial, variable].

Returns:

dimension_names – List of dimension names

Return type:

List[str]
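An illustration of the documented ordering; the concrete names are assumptions and depend on the attached scales and on variable_names:

# with a temporal scale named 'time', a spatial scale named ['lon', 'lat'],
# and variable_names=['discharge'], the flat list would read:
datasource.dimension_names
# ['time', 'lon', 'lat', 'discharge']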

load_args() dict#

Load the stored arguments from the 'args' column. The column holds a JSON string, which is parsed into a dict before being returned. This dict is usually used for I/O operations and passed as keyword arguments. Therefore, this is only useful for a DB admin and should not be exposed to the end-user.

New in version 0.1.11.

save_args_from_dict(args_dict: dict, commit: bool = False) None#

Save all given keyword arguments to the database. These are passed to the importer/adder functions as **kwargs.

Parameters:

args_dict (dict) – Dictionary of JSON-serializable keyword arguments that will be stored as a JSON string in the database.

Note

All kwargs need to be JSON-encodable. This function is only useful for a DB admin and should not be exposed to the end-user.

See also

load_args
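A sketch of the round trip between both methods; the keyword arguments and the reader function are hypothetical:

# store reader keyword arguments as a JSON string in the 'args' column
datasource.save_args_from_dict({'sep': ';', 'decimal': ','}, commit=True)

# parse them back into a dict, e.g. right before an I/O call
kwargs = datasource.load_args()
read_data(datasource.path, **kwargs)   # read_data is a hypothetical reader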

to_dict(deep: bool = False) dict#

Return the model as a python dictionary.

Parameters:

deep (bool) – If True, all related objects will be included as dictionaries. Defaults to False.

Returns:

obj – The Model as dict

Return type:

dict
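A usage sketch; the nested key layout in the last line is an assumption about the serialized structure:

flat = datasource.to_dict()             # related objects referenced by id only
nested = datasource.to_dict(deep=True)  # related objects serialized as dicts
nested['type']['name']                  # e.g. 'internal' (assumed key layout)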

class metacatalog.models.datasource.DataSourceType(**kwargs)#

Model to represent a type of datasource.

id#

Unique id of the record. If not specified, the database will assign it.

Type:

int

name#

A short (max. 64 characters) name for the Type. Should not contain any whitespace.

Type:

str

title#

The full title of this Type.

Type:

str

description#

Optional description of this type.

Type:

str

Note

While it is possible to add more records to the table, this is the only class that needs actual Python functions to handle the database input. Usually, each type of datasource relies on a specific importer and reader that can use the information saved in a DataSource to perform I/O operations.

to_dict(deep: bool = False) dict#

Return the model as a python dictionary.

Parameters:

deep (bool) – If True, all related objects will be included as dictionaries as well and deep will be passed down. Defaults to False.

Returns:

obj – The Model as dict

Return type:

dict

class metacatalog.models.datasource.DataType(**kwargs)#

DataType describes the type of the actual data. The metacatalog documentation includes several default abstract types. Each combination of DataType and DataSourceType can be assigned custom reader and writer functions.

id#

Unique id of the record. If not specified, the database will assign it.

Type:

int

name#

A short (max. 64 characters) name for the DataType. Should not contain any whitespace.

Type:

str

title#

The full title of this DataType.

Type:

str

description#

Optional description of this DataType.

Type:

str

children_list() List[DataType]#

Returns a dependency tree for the current datatype. If the list is empty, there are no child (inheriting) datatypes for the current datatype. Otherwise, the list contains all child datatypes that inherit from the current datatype.

parent_list() List[DataType]#

Returns an inheritance tree for the current datatype. If the list is empty, the current datatype is a top-level datatype. Otherwise, the list contains all parent datatypes that the current one inherits from.
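A short traversal sketch covering both methods; the datatype name 'timeseries', the import path, and the session variable are assumptions:

from metacatalog.models import DataType

dtype = session.query(DataType).filter(DataType.name == 'timeseries').one()

# walk the inheritance chain upwards
for parent in dtype.parent_list():
    print(parent.name)

# and check for datatypes inheriting from this one
if not dtype.children_list():
    print(f'{dtype.name} is a leaf datatype')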

to_dict(deep: bool = False) dict#

Return the model as a python dictionary.

Parameters:

deep (bool) – If True, all related objects will be included as dictionaries as well and deep will be passed down. Defaults to False.

Returns:

obj – The Model as dict

Return type:

dict

class metacatalog.models.datasource.SpatialScale(**kwargs)#

The SpatialScale is used to describe the spatial scale at which the described data is valid. metacatalog uses the scale triplet (spacing, extent, support), but renames 'spacing' to 'resolution'.

id#

Unique id of the record. If not specified, the database will assign it.

Type:

int

resolution#

Spatial resolution in meters. The resolution usually describes a grid cell size, which only applies to gridded datasets. Use the resolution_str property for a string representation.

Type:

int

extent#

The spatial extent of the dataset is given as a 'POLYGON'. To specify a point location here, use the same value for easting and westing and the same value for northing and southing.

Changed in version 0.6.1: From this POLYGON, a bounding box and the centroid are calculated internally.

Type:

geoalchemy2.Geometry

support#

The support gives the spatial validity for a single observation. It specifies the spatial extent at which an observed value is valid. It is given as a fraction of resolution. For gridded datasets, it is common to set support to 1, as the observations are valid for the whole grid cell. In case ground-truthing data is available, the actual footprint fraction of observations can be given here. Defaults to support=1.0.

Type:

float

dimension_names#

New in version 0.9.1.

Names of the spatial dimensions in x, y and optionally z direction. Put the names in a list in the order x, y(, z). For tabular data, this is usually the name(s) of the column(s) that store the spatial information of the dataset; for a netCDF file, the name(s) of the dimension(s) that store it. More generally, dimension_names describes how a datasource would be indexed to retrieve the spatial axes of the entry (e.g. ['x', 'y', 'z'], ['lon', 'lat'], ['longitude', 'latitude']).

Type:

List[str]
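A hedged illustration of how these names could be used to index a netCDF datasource; xarray serves as an example consumer here and is not part of metacatalog:

import xarray as xr

# datasource points to a netCDF file; spatial_scale stands for an
# attached SpatialScale instance (both names are illustrative)
ds = xr.open_dataset(datasource.path)
x_name, y_name = spatial_scale.dimension_names[:2]

# select the variable referenced by the Entry and slice its spatial axes
da = ds[datasource.variable_names[0]]
subset = da.isel({x_name: slice(0, 10), y_name: slice(0, 10)})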

to_dict(deep: bool = False) dict#

Return the model as a python dictionary.

Parameters:

deep (bool) – If True, all related objects will be included as dictionaries. Defaults to False.

Returns:

obj – The Model as dict

Return type:

dict

class metacatalog.models.datasource.TemporalScale(*args, **kwargs)#

The TemporalScale is used to describe the temporal scale at which the described data is valid. metacatalog uses the scale triplet (spacing, extent, support), but renames 'spacing' to 'resolution'.

id#

Unique id of the record. If not specified, the database will assign it.

Type:

int

resolution#

Temporal resolution. The resolution has to be given as an ISO 8601 duration, or a fraction of it. Standalone minutes can be identified by the non-ISO 'min':

resolution = '15min'

defines a temporal resolution of 15 minutes. An ISO 8601 duration is built like:

'P[n]Y[n]M[n]DT[n]H[n]M[n]S'

Type:

str
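Both notations can be parsed with standard tooling; a small sketch using pandas, which understands the ISO duration as well as the non-ISO 'min' shorthand:

import pandas as pd

# both spellings resolve to the same 15-minute duration
assert pd.Timedelta('15min') == pd.Timedelta('PT15M')
print(pd.Timedelta('PT15M'))   # 0 days 00:15:00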

observation_start#

Point in time when the first observation was made. Forms the temporal extent together with observation_end.

Type:

datetime.datetime

observation_end#

Point in time when the last available observation was made. Forms the temporal extent together with observation_start.

Type:

datetime.datetime

support#

The support gives the temporal validity for a single observation. It specifies the time before an observation that is still represented by the observation. It is given as a fraction of resolution. I.e. if support=0.5 at resolution='10min', the observation supports 5min (5min before the timestamp) and the resulting dataset would not be exhaustive. Defaults to support=1.0, which would make the dataset temporally exhaustive, but may not apply to every dataset.

Type:

float
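The documented arithmetic spelled out: the supported window is support times resolution, counted backwards from the timestamp:

import pandas as pd

resolution = pd.Timedelta('10min')
support = 0.5

# an observation at timestamp ts represents the window (ts - 5min, ts]
window = support * resolution
print(window)   # 0 days 00:05:00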

dimension_names#

New in version 0.9.1.

Name of the temporal dimension. For tabular data, this is usually the name of the column that stores the temporal information of the dataset; for a netCDF file, the name of the dimension that stores it. More generally, dimension_names describes how a datasource would be indexed to retrieve the temporal axis of the entry (e.g. 'time', 'date', 'datetime').

Type:

List[str]

to_dict(deep: bool = False) dict#

Return the model as a python dictionary.

Parameters:

deep (bool) – If True, all related objects will be included as dictionaries. Defaults to False.

Returns:

obj – The Model as dict

Return type:

dict