DataSource#
- class metacatalog.models.datasource.DataSource(**kwargs)#
Model to represent a datasource of a specific Entry. The datasource further specifies a DataSourceType by setting a path and args.
- id#
Unique id of the record. If not specified, the database will assign it.
- Type:
int
- path#
Path to the actual data. Depending on type, this can be a filepath, SQL tablename or URL.
- Type:
str
- encoding#
The encoding of the file or database representation of the actual data. Defaults to 'utf-8'. Only change this if necessary.
- Type:
str
- args#
Optional. If the I/O classes need further arguments, these can be stored as a JSON-serializable str. The string will be parsed into a dict and passed to the I/O functions as **kwargs.
- Type:
str
- type_id#
Foreign key referencing the DataSourceType.
- Type:
int
- type#
The referenced DataSourceType. Can be used instead of setting type_id.
- Type:
metacatalog.models.DataSourceType
- data_names#
New in version 0.3.0.
Deprecated since version 0.9.1.
List of column names that will be displayed when exporting the data. The columns are named in the same order as they appear in the list.
- Type:
list
- variable_names#
New in version 0.9.1.
List of variable names that store the data of the datasource of the entry. For tabular data, this is usually the column name(s) of the variable referenced by the Entry. For a netCDF file, it is the name(s) of the referenced variable(s). More generally, variable_names describes how a datasource would be indexed to retrieve the data of the entry.
- Type:
list[str]
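A short sketch of how the attribute might be filled; the variable names are hypothetical:

datasource.variable_names = ['discharge']                     # tabular: column name(s)
datasource.variable_names = ['air_temperature', 'dew_point']  # netCDF: variable name(s)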
Example
There is a DataSourceType of name='internal', which handles I/O operations on tables in the same database. The datasource itself will then store the table name as path. It can be linked to an Entry in a 1:n relationship. This way, the admin has full control over the data tables, while still using the common I/O classes.
- create_scale(resolution, extent, support, scale_dimension, dimension_names: Optional[List[str]] = None, commit: bool = False) None #
Create a new scale for the dataset
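A minimal usage sketch of the 'internal' pattern from the example above. api.connect_database() and api.find_entry() are part of metacatalog's API; the exact create_datasource() keywords, 'temporal' as a scale_dimension value, and all concrete values are assumptions based on the signatures on this page:

from metacatalog import api

session = api.connect_database()                       # connection from env/config
entry = api.find_entry(session, title='my entry')[0]   # hypothetical lookup

datasource = entry.create_datasource(
    path='timeseries_table',   # table name in the same database
    type='internal',           # DataSourceType name
    datatype='timeseries',     # DataType name
    commit=True
)

datasource.create_scale(
    resolution='15min',
    extent=('2020-01-01', '2020-12-31'),
    support=1.0,
    scale_dimension='temporal',
    commit=True
)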
- property dimension_names: List[str]#
New in version 0.9.1.
Returns a flat list of all dimensions needed to identify a datapoint in the dataset. The order is [temporal, spatial, variable].
- Returns:
dimension_names – List of dimension names
- Return type:
List[str]
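A short sketch of what the property might return for a gridded time series; the names are hypothetical:

print(datasource.dimension_names)
# e.g. ['time', 'lon', 'lat', 'air_temperature'] -- order is [temporal, spatial, variable]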
- load_args() dict #
Load the stored arguments from the 'args' column. The column holds a JSON string, which is parsed into a dict before it is returned. This dict is usually used for I/O operations and passed as keyword arguments. Therefore, this is only useful for a DB admin and should not be exposed to the end-user.
New in version 0.1.11.
- save_args_from_dict(args_dict: dict, commit: bool = False) None #
Save all given keyword arguments to the database. These are passed to the importer/adder functions as **kwargs.
- Parameters:
args_dict (dict) – Dictionary of JSON-serializable keyword arguments that will be stored as a JSON string in the database.
Note
All kwargs need to be JSON-encodable. This function is only useful for a DB admin and should not be exposed to the end-user.
See also: load_args()
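A minimal round-trip sketch using the two methods documented here; the argument values are hypothetical:

datasource.save_args_from_dict({'sep': ';', 'decimal': ','}, commit=True)

kwargs = datasource.load_args()   # -> {'sep': ';', 'decimal': ','}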
- to_dict(deep: bool = False) dict #
Return the model as a python dictionary.
- Parameters:
deep (bool) – If True, all related objects will be included as dictionaries. Defaults to False
- Returns:
obj – The Model as dict
- Return type:
dict
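A minimal sketch, assuming the returned keys mirror the attribute names documented above:

meta = datasource.to_dict()
print(meta['path'], meta['encoding'])

deep = datasource.to_dict(deep=True)   # related objects become nested dicts
print(deep['type'])                    # e.g. the full DataSourceType as dict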
- class metacatalog.models.datasource.DataSourceType(**kwargs)#
Model to represent a type of datasource.
- id#
Unique id of the record. If not specified, the database will assign it.
- Type:
int
- name#
A short name (max. 64 characters) for the Type. Should not contain any whitespace.
- Type:
str
- title#
The full title of this Type.
- Type:
str
- description#
Optional description about this type
- Type:
str
Note
While it is possible to add more records to the table, this is the only class that needs actual Python functions to handle the database input. Usually, each type of datasource relies on a specific importer and reader that can use the information saved in a DataSource to perform I/O operations.
- to_dict(deep: bool = False) dict #
Return the model as a python dictionary.
- Parameters:
deep (bool) – If True, all related objects will be included as dictionaries as well and deep will be passed down. Defaults to False
- Returns:
obj – The Model as dict
- Return type:
dict
- class metacatalog.models.datasource.DataType(**kwargs)#
DataType describes the type of the actual data. The metacatalog documentation includes several default abstract types. Each combination of DataType and DataSourceType can be assigned custom reader and writer functions.
- id#
Unique id of the record. If not specified, the database will assign it.
- Type:
int
- name#
A short name (max. 64 characters) for the DataType. Should not contain any whitespace.
- Type:
str
- title#
The full title of this DataType.
- Type:
str
- description#
Optional description about this DataType.
- Type:
str
- children_list() List[DataType] #
Returns a dependency tree for the current datatype. If the list is empty, there are no child (inheriting) datatypes for the current datatype. Otherwise, the list contains all child datatypes that inherit from the current datatype.
- parent_list() List[DataType] #
Returns an inheritance tree for the current datatype. If the list is empty, the current datatype is a top-level datatype. Otherwise, the list contains all parent datatypes that the current one inherits from.
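A short sketch walking the inheritance tree; the open session and the 'timeseries' type name are assumptions:

from metacatalog.models import DataType

dt = session.query(DataType).filter(DataType.name == 'timeseries').one()

print([p.name for p in dt.parent_list()])     # types this one inherits from
print([c.name for c in dt.children_list()])   # types inheriting from this one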
- to_dict(deep: bool = False) dict #
Return the model as a python dictionary.
- Parameters:
deep (bool) – If True, all related objects will be included as dictionaries as well and deep will be passed down. Defaults to False
- Returns:
obj – The Model as dict
- Return type:
dict
- class metacatalog.models.datasource.SpatialScale(**kwargs)#
The SpatialScale is used to describe, in a common way, the spatial scale at which the described data is valid. metacatalog uses the scale triplet (spacing, extent, support), but renames 'spacing' to 'resolution'.
- id#
Unique id of the record. If not specified, the database will assign it.
- Type:
int
- resolution#
Spatial resolution in meters. The resolution usually describes a grid cell size, which only applies to gridded datasets. Use the resolution_str property for a string representation.
- Type:
int
- extent#
The spatial extent of the dataset is given as a 'POLYGON'.
Changed in version 0.6.1: From this POLYGON, a bounding box and the centroid are internally calculated. To specify a point location here, use the same value for easting and westing and the same value for northing and southing.
- Type:
geoalchemy2.Geometry
- support#
The support gives the spatial validity for a single observation. It specifies the spatial extent at which an observed value is valid, given as a fraction of resolution. For gridded datasets, it is common to set support to 1, as the observations are assumed to represent the whole grid cell. In case ground-truthing data is available, the actual footprint fraction of observations can be given here. Defaults to support=1.0.
- Type:
float
- dimension_names#
New in version 0.9.1.
Names of the spatial dimension in x, y and optionally z-direction. Put the names in a list in the order x, y(, z). In case of tabular data, this is usually the column name of the column that stores the spatial information of the dataset. In case of a netCDF file, this is the dimension name of the dimension that stores the spatial information of the dataset. More generally, dimension_names describes how a datasource would be indexed to retrieve the spatial axis of the entry in x-direction (e.g. [‘x’, ‘y’, ‘z’], [‘lon’, ‘lat’], [‘longitude’, ‘latitude’]).
- Type:
List[str]
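A minimal construction sketch; passing these keywords directly and the raw WKT extent are assumptions, and the values are illustrative:

from metacatalog.models import SpatialScale

scale = SpatialScale(
    resolution=30,        # 30 m grid cells
    extent='POLYGON ((8.0 47.0, 8.0 48.0, 9.0 48.0, 9.0 47.0, 8.0 47.0))',
    support=1.0,          # each value represents the full cell
    dimension_names=['lon', 'lat']
)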
- to_dict(deep: bool = False) dict #
Return the model as a python dictionary.
- Parameters:
deep (bool) – If True, all related objects will be included as dictionaries. Defaults to False
- Returns:
obj – The Model as dict
- Return type:
dict
- class metacatalog.models.datasource.TemporalScale(*args, **kwargs)#
The TemporalScale is used to describe, in a common way, the temporal scale at which the described data is valid. metacatalog uses the scale triplet (spacing, extent, support), but renames 'spacing' to 'resolution'.
- id#
Unique id of the record. If not specified, the database will assign it.
- Type:
int
- resolution#
Temporal resolution. The resolution has to be given as an ISO 8601 Duration, or a fraction of it. Standalone minutes can be identified by the non-ISO 'min', i.e. resolution = '15min' defines a temporal resolution of 15 minutes. An ISO 8601 Duration is built like:
'P[n]Y[n]M[n]DT[n]H[n]M[n]S'
- Type:
str
- observation_start#
Point in time when the first observation was made. Forms the temporal extent together with observation_end.
- Type:
datetime.datetime
- observation_end#
Point in time when the last available observation was made. Forms the temporal extent together with observation_start.
- Type:
datetime.datetime
- support#
The support gives the temporal validity for a single observation. It specifies the time before an observation that is still represented by the observation, given as a fraction of resolution. I.e., if support=0.5 at resolution='10min', the observation supports 5min (the 5 minutes before the timestamp) and the resulting dataset would not be exhaustive. Defaults to support=1.0, which would make a temporally exhaustive dataset, but may not apply to every dataset.
- Type:
float
- dimension_names#
New in version 0.9.1.
Name of the temporal dimension. In case of tabular data, this is usually the column name of the column that stores the temporal information of the dataset. In case of a netCDF file, this is the dimension name of the dimension that stores the temporal information of the dataset. More generally, dimension_names describes how a datasource would be indexed to retrieve the temporal axis of the entry (e.g. ‘time’, ‘date’, ‘datetime’).
- Type:
List[str]
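A minimal construction sketch; passing these keywords directly is an assumption, and the values are illustrative:

from datetime import datetime
from metacatalog.models import TemporalScale

scale = TemporalScale(
    resolution='15min',                     # non-ISO shorthand for PT15M
    observation_start=datetime(2020, 1, 1),
    observation_end=datetime(2020, 12, 31),
    support=1.0,                            # each value covers the full 15 min
    dimension_names=['time']
)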
- to_dict(deep: bool = False) dict #
Return the model as a python dictionary.
- Parameters:
deep (bool) – If True, all related objects will be included as dictionaries. Defaults to False
- Returns:
obj – The Model as dict
- Return type:
dict