WPS Toolbox
Overview
The tools in V-FOR-WaTer are implemented as a standalone server. The server uses the WPS protocol to receive inputs, data and configuration, and returns the processing output. Most data, including input data from the database as well as processing results and intermediate results, is saved to a temporary folder on the processing server; the WPS only exchanges configuration and file locations. For this to work, a set of data types is defined. Each tool can handle only one specific data type or a set of specified data types, and has defined outputs. These data types are hierarchical: some tools can handle any type of timeseries data, while others expect a timeseries of discharge data.
Data Types
type name | file type | reader function | description |
---|---|---|---|
array | | | 1D array, without any index information |
iarray | | | 1D array, indexed by any kind of pandas-supported index, except |
varray | | | 1D array, indexed by any kind of pandas-supported index, except |
timeseries | | | 1D array, indexed by a |
vtimeseries | | | 1D array, indexed by a |
2darray | | | 2D array, without any index information |
ndarray | | | N-dimensional array, without any index information |
idataframe | | | Multiple arrays, indexed by any kind of pandas-supported index, except |
vdataframe | | | Multiple arrays, indexed by any kind of pandas-supported index, except |
time-dataframe | | | Multiple arrays, indexed by a |
vtime-dataframe | | | Multiple arrays, indexed by a |
raster | | | Georeferenced raster image. Any file format supported by GDAL is supported by |
vraster | ? | ? | Variable-bound georeferenced raster image. How to implement this is still an open question. |
html | | | Generic HTML output. Tools that output text or HTML tables can use this data type. |
plot | | | Bokeh components |
Hierarchical Order
array
- iarray
- varray
- timeseries
- vtimeseries
ndarray
- raster
- vraster
- 2darray
- idataframe
- vdataframe
- time-dataframe
- vtime-dataframe
html
- plot
The hierarchy is downward compatible. That means data of type vtimeseries is also a valid timeseries and a valid array, but not a varray. Tools might accept data types from different branches: a tool that accepts both array and ndarray will literally take any input.
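To make the downward compatibility concrete, here is a minimal Python sketch of a type check. The indentation of the list above was lost, so the parent map is an assumption: it encodes only the relation stated in the text (vtimeseries is a timeseries, which is an array) and otherwise places each type directly under the root it is listed with. The names PARENT, lineage and accepts are hypothetical and not part of the Toolbox.

```python
# Hypothetical parent map: each type points to the more general type it specializes.
# Only "vtimeseries -> timeseries -> array" is stated in the text; every other type
# is placed directly under the root it is listed with, which may be too shallow.
PARENT = {
    "iarray": "array",
    "varray": "array",
    "timeseries": "array",
    "vtimeseries": "timeseries",
    "raster": "ndarray",
    "vraster": "ndarray",
    "2darray": "ndarray",
    "idataframe": "ndarray",
    "vdataframe": "ndarray",
    "time-dataframe": "ndarray",
    "vtime-dataframe": "ndarray",
    "plot": "html",
}


def lineage(dtype):
    """Return dtype followed by all of its ancestors, most specific first."""
    chain = [dtype]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def accepts(accepted_types, given_type):
    """True if a tool declaring `accepted_types` can take data of `given_type`."""
    return any(t in accepted_types for t in lineage(given_type))
```

With this map, accepts({"timeseries"}, "vtimeseries") is True, while accepts({"varray"}, "vtimeseries") is False, matching the example in the text.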
Defining data types
The definition of input and output data is done using a LiteralInput and LiteralOutput of type string.
The actual data is stored in temporary files as specified in the table above.
All files of one input/output use the same file name and differ only in the file extension. The file name is a UUID version 4.
Each input/output always consists of at least two files: one for the data and a metadata file in .json format.
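A minimal sketch of the naming scheme, assuming a tool picks the data file extension itself (the examples further down use .csv for timeseries). The function name new_output_paths and its parameters are hypothetical; only the UUID4 base name and the shared .json sibling come from the text.

```python
import os
import uuid


def new_output_paths(workdir, data_ext):
    """Create a fresh UUID4 base name and the matching data and metadata paths.

    `workdir` and `data_ext` are parameters of this sketch, not Toolbox API.
    """
    name = str(uuid.uuid4())
    data_path = os.path.join(workdir, f"{name}.{data_ext}")
    meta_path = os.path.join(workdir, f"{name}.json")
    return name, data_path, meta_path
```

Only `name` would be returned through the WPS; the two paths stay on the processing server.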
WPS Input
If a tool needs any of the file-based data types from above, the WPS process only needs to define an input of type string, which expects the UUID of the given data source.
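from pywps import Process, LiteralInput

class MyTool(Process):
    def __init__(self):
        inputs = [
            LiteralInput(
                'timeseries',
                'UUID of the timeseries to be used',
                data_type='string',
                min_occurs=2,  # at least two UUIDs are required
                max_occurs=5   # at most five UUIDs are accepted
            )
        ]
        # 'my_tool' and 'My Tool' are placeholder identifier and title
        super(MyTool, self).__init__(
            self._handler, identifier='my_tool', title='My Tool',
            inputs=inputs, outputs=[])

    def _handler(self, request, response):
        # request.inputs['timeseries'] holds one entry per selected UUID
        return response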
The definition of inputs follows the signature LiteralInput(identifier, title=None, data_type=None, abstract='', uoms=None, min_occurs=1, max_occurs=1, allowed_values=None, default=None), of which only the relevant parameters are listed here:
identifier - Name of the input; it can also define a more specific data type, e.g. input1__timeseries__discharge.
title - Title of the input field exposed to the user.
data_type - General data type of the input. The standard input is a text field; adaptations for special cases are in preparation. For example, string exposes a text field, while string in combination with a specific data type defined in the identifier exposes a dropdown menu. Allowed are the basic data types 'float', 'boolean', 'integer', 'string', 'positiveInteger', 'anyURI', 'time', 'date', 'dateTime', 'scale', 'angle', 'nonNegativeInteger'.
abstract - Information for the user about the data.
uoms - Units (not used yet).
min_occurs - Minimum occurrence of the input. Default = 1.
max_occurs - Maximum occurrence of the input. Default = 1.
min_occurs | max_occurs | implication for wps input | element |
---|---|---|---|
1 | 1 | single value & required | dropdown |
0 | 1 | single value & not required | dropdown |
0 | > 1 | one or more values & not required | multi select dropdown |
≥ 1 | > 1 | one or more values & required | multi select dropdown |
allowed_values - defined values exposed to the user as radio buttons.
default - value used when the user selects nothing for this input.
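The double-underscore identifier can be split back into its parts. This is a sketch under the assumption that the parts are always ordered as input name, data type and optional qualifiers, as in input1__timeseries__discharge; parse_identifier is a hypothetical helper, not a pywps function.

```python
def parse_identifier(identifier):
    """Split an identifier such as 'input1__timeseries__discharge'.

    Assumes parts are separated by double underscores and ordered as
    input name, data type, and any number of more specific qualifiers.
    Returns (name, data type or None, list of qualifiers).
    """
    parts = identifier.split("__")
    name = parts[0]
    dtype = parts[1] if len(parts) > 1 else None
    qualifiers = parts[2:]
    return name, dtype, qualifiers
```

An identifier without a data type part, such as a plain numeric option, simply yields None as its data type.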
The definition of the file-based data types is supposed to be done in keywords once these are implemented in pywps and owslib.
The example above results in a dropdown that shows 'UUID of the timeseries to be used' when the user hovers over the element. The user can select two to five timeseries datasets from the dropdown (min_occurs=2 adds a required flag to the input field, and max_occurs=5 enables multiple selection).
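The occurrence table can be read as two independent rules: an input is required when min_occurs is at least 1, and it becomes a multi select dropdown when max_occurs exceeds 1. A hypothetical helper expressing this for file-based inputs (which are shown as dropdowns) might look like the following; input_element is an illustrative name, not part of pywps or the Toolbox.

```python
def input_element(min_occurs=1, max_occurs=1):
    """Map occurrence limits of a file-based input to (element, required)."""
    required = min_occurs >= 1  # at least one value must be chosen
    element = "multi select dropdown" if max_occurs > 1 else "dropdown"
    return element, required
```

For the MyTool example, input_element(2, 5) gives a required multi select dropdown.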
More information on how to describe the inputs of a WPS in general is given in the pywps documentation.
WPS Output
The output of a WPS process is always a string, named after the output type from the table above, that contains the UUID without file extension.
A single WPS might have more than one output.
An additional output called 'error' is appended to each WPS process. It contains a JSON-serialized string of the form:
{
"error": true | false,
"message": "error message if any",
"type": "error type if any"
}
The error object is returned even if there is no error; in that case error['error'] == False. The error type can be one of:
bug - unexpected exceptions that are not handled in the Toolbox. These should be reported to the developer.
userWarning - mainly caused by wrong options being passed.
processError - expected errors that are specific to the tool. These errors need to be reported to the user.
An example of a processError would be a numpy.linalg.LinAlgError that is raised during Kriging if the kriging matrix is ill-conditioned.
This is an expected error that should be reported to the user, as it could indicate that the Kriging results are incorrect.
These kinds of errors cannot be handled in the Toolbox.
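A minimal sketch of how a tool wrapper could produce the error output. ProcessError and run_tool are hypothetical names; mapping ValueError to userWarning is an assumption, and a real tool would raise or re-raise its own expected failures (such as numpy.linalg.LinAlgError during Kriging) as processError.

```python
import json


class ProcessError(Exception):
    """Hypothetical wrapper for expected, tool-specific failures."""


def error_object(error_type="", message=""):
    """Serialize the error output; an empty type means no error occurred."""
    return json.dumps({"error": bool(error_type), "message": message, "type": error_type})


def run_tool(func, *args, **kwargs):
    """Run a tool function and return (result, serialized error object)."""
    try:
        return func(*args, **kwargs), error_object()
    except ProcessError as exc:
        # expected and tool-specific, e.g. an ill-conditioned kriging matrix
        return None, error_object("processError", str(exc))
    except ValueError as exc:
        # assumption: invalid user options surface as ValueError
        return None, error_object("userWarning", str(exc))
    except Exception as exc:
        # anything else is unexpected and should be reported to the developer
        return None, error_object("bug", repr(exc))
```

Note that json.dumps writes the boolean as lowercase false, so the error string stays valid JSON.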
Metadata file
Each tool also writes a metadata file in .json format. This metadata file contains metadata about the initial datasets, which some tools might require, and a collection of all tools that were already applied.
That means that, for a specific tool, the corresponding .json file contains the UUIDs of other tool runs.
In this way, a toolchain can be traced. If you re-run a tool, a new .json file is written.
The .json file contains all information about the processed tool. This includes metadata like the tool name and the given inputs, but also parameters like the processing time. For any input that was the result of another tool, its UUID is recorded as well, so the toolchain can be reconstructed.
If a tool loaded data into the workspace, either from the database or by producing modelling output, the necessary metadata is contained in a special key 'meta' or 'entry_id' of the file.
This way, if a tool needs the geolocation of a dataset but is not the first tool in line, it can follow the toolchain back until one of the upstream .json files contains one of these special keys.
'meta' is expected to have the same structure as the metadata in the V-FOR-WaTer database, while 'entry_id' can be used to query the needed information and is thus the preferred way.
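The backward search through the toolchain can be sketched as a breadth-first walk over the .json files. This assumes all files live in one working directory and follow the key layout ('inputs', 'uuid', 'entry_id', 'meta') given in the description of the .json file in this section; find_origin is a hypothetical helper. Within each input, 'entry_id' is checked before 'meta' because it is the preferred key.

```python
import json
import os


def find_origin(workdir, start_uuid):
    """Follow the toolchain back until an input carries 'entry_id' or 'meta'.

    Returns {'entry_id': ...} or {'meta': ...} for the nearest such input,
    or None if the chain ends without one.
    """
    pending, seen = [start_uuid], set()
    while pending:
        current = pending.pop(0)  # breadth-first: nearest ancestors first
        if current in seen:  # guard against cycles
            continue
        seen.add(current)
        path = os.path.join(workdir, f"{current}.json")
        if not os.path.exists(path):
            continue
        with open(path) as f:
            record = json.load(f)
        for item in record.get("inputs", []):
            if "entry_id" in item:
                return {"entry_id": item["entry_id"]}
            if "meta" in item:
                return {"meta": item["meta"]}
            if "uuid" in item:
                pending.append(item["uuid"])  # output of an earlier tool run
    return None
```

A tool that only needs the geolocation would then query the database with the returned entry_id, or read it from 'meta' directly.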
A description of the .json file is shown below:
{
    "identifier": "Identifier of the WPS process",
    "title": "Title of the WPS process",
    "version": "Version of the WPS process",
    "inputs": [
        {
            "uuid": "UUID if the input was a tool output | optional",
            "entry_id": "ID in the database if data was loaded | optional",
            "meta" : {} # the metadata itself if the two others are not applicable
        }
    ],
    "args": {}, # any other LiteralInput that is not in inputs
    "startUTC": "UTC timestamp when the tool was started",
    "endUTC": "UTC timestamp when the tool ended",
    "error": {} # error object as described above
}
Todo
Fill in the example above.
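As an illustration of the layout above, a tool could assemble its metadata record as sketched below. The ISO 8601 format for startUTC and endUTC is an assumption, since the text only asks for UTC timestamps; build_metadata, 'my_tool' and the 'window' argument are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_metadata(identifier, title, version, inputs, args, started, ended, error):
    """Assemble a metadata record with the keys of the .json description."""
    return {
        "identifier": identifier,
        "title": title,
        "version": version,
        "inputs": inputs,
        "args": args,
        "startUTC": started.isoformat(),  # assumption: ISO 8601 with UTC offset
        "endUTC": ended.isoformat(),
        "error": error,
    }


start = datetime.now(timezone.utc)
# ... the tool itself would run here ...
record = build_metadata(
    "my_tool", "My Tool", "0.1",  # hypothetical process description
    inputs=[{"uuid": "d5e98fdb-f212-42ce-8011-50e64a0a3c16"}],
    args={"window": 7},  # hypothetical extra LiteralInput
    started=start,
    ended=datetime.now(timezone.utc),
    error={"error": False, "message": "", "type": ""},
)
print(json.dumps(record, indent=2))
```

Keeping the UUID of each upstream input in "inputs" is what later allows the toolchain to be walked backwards.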
Example
If you run a tool that produces a timeseries without error, the WPS output will look similar to:
OUTPUTS['timeseries']: 'd5e98fdb-f212-42ce-8011-50e64a0a3c16'
OUTPUTS['error']: '{"error": false, "message": "", "type": ""}'
On the drive, you will find the following files:
d5e98fdb-f212-42ce-8011-50e64a0a3c16.csv
d5e98fdb-f212-42ce-8011-50e64a0a3c16.json
If you run a tool without error that produces a plot as well as the vtimeseries visualized in that plot, the WPS output will look similar to:
OUTPUTS['vtimeseries']: 'ec97a39e-933c-411f-b4f5-7716871bb64f'
OUTPUTS['plot']: '6b23c7b6-4457-4ff2-82a3-e3c5136abfe0'
OUTPUTS['error']: '{"error": false, "message": "", "type": ""}'
On the drive, you will find the following files:
ec97a39e-933c-411f-b4f5-7716871bb64f.csv
ec97a39e-933c-411f-b4f5-7716871bb64f.json
6b23c7b6-4457-4ff2-82a3-e3c5136abfe0.script.html
6b23c7b6-4457-4ff2-82a3-e3c5136abfe0.div.html
6b23c7b6-4457-4ff2-82a3-e3c5136abfe0.json
Note
At this point the two .json files contain essentially the same information; both are kept because the vtimeseries might be processed further.
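To locate the files that belong to a finished run, one can check the error output first and then glob for each UUID in the working directory. This sketch assumes the outputs are available as a plain dict of strings and that the error string is valid JSON (lowercase false); output_files is a hypothetical helper.

```python
import glob
import json
import os


def output_files(workdir, outputs):
    """Map each data output name to the files on disk that share its UUID.

    Raises RuntimeError if the 'error' output reports a failure.
    """
    error = json.loads(outputs["error"])
    if error["error"]:
        raise RuntimeError(f"{error['type']}: {error['message']}")
    files = {}
    for name, value in outputs.items():
        if name == "error":
            continue
        # '*' also matches multi-part extensions such as '.script.html'
        pattern = os.path.join(workdir, f"{value}.*")
        files[name] = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return files
```

For the plot example, the 'plot' entry would list the .div.html, .json and .script.html files of that UUID.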