Input: Parameter and data definitions

Input of a tool consists of optional Parameters and optional Data.

We define a Parameter to be a structured argument, which is passed to a Tool at runtime. The sum of all passed Parameters makes up the parameterization (or parametrization) of a tool execution.

All tools define their Parameters and Data in the tool.yml. This file is the blueprint for the parameter values and input data that are acceptable or required, along with specifications such as value ranges, default values and data types. The actual input (parameterization and input data) for a tool run is file-based and defaults to /in/input.json. The JSON format is mandatory.

Missing parameterization and input data

If a Tool accepts only optional parameters and no input data, or if no parameters and no data are defined at all, the /in/input.json can contain just an empty JSON object:

{}

This is the only case in which this file is optional and can be omitted entirely. Libraries that parse the parameters, and tools that read the file directly, need to reflect this behavior and return an empty container. The exact data type of an empty parameterization container depends on the implementation language. In Python this would be an empty dict, in R an empty list.
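
As an illustration of the behavior described above, here is a minimal Python sketch of a reader that returns an empty dict when the input file is missing. The function name load_input is hypothetical and not part of any library named in this specification; treating a completely blank file like a missing one is an additional assumption of this sketch.

```python
import json
from pathlib import Path


def load_input(path="/in/input.json"):
    """Return the parsed input, or an empty dict if there is nothing to parse.

    Hypothetical helper: the name and the tolerance for blank files are
    assumptions of this sketch, not requirements of the specification.
    """
    file = Path(path)
    # The file may be omitted entirely if the tool needs no input.
    if not file.exists():
        return {}
    text = file.read_text(encoding="utf-8").strip()
    # Assumption: a file without any content is treated like a missing file.
    if not text:
        return {}
    return json.loads(text)
```

A tool can then call load_input() unconditionally and work with the returned dict, whether or not a client mounted an input file.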

Parameterization vs. Data

In the semantics of Tools, there is a difference between data, which is processed by a tool, and parameters, which configure a tool. On the one hand, this distinction is important to reflect the meaning of arguments passed to generic tools; on the other hand, it has implications for reproducible workflows.

Changing the parameters of the tool results in a different analysis workflow, as a change in parameters might in principle change the logic. Hence, a different parameterization describes a different analysis. Changing data does not change the tool logic. By definition, a tool is reproducible if its parameterization can be applied to other data. That means the same analysis is run on different data.

From a practical perspective, if you build a tool around these tool specifications, the tool name and content of the sections parameters and data of /in/input.json can be used to create checksums and therefore help to establish reproducible workflows.
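
One possible way to derive such a checksum is sketched here in Python. The function name input_checksum, the choice of SHA-256 and the canonical JSON serialization are all assumptions made for illustration; the specification does not prescribe a hashing scheme.

```python
import hashlib
import json


def input_checksum(tool_name, parameters, data):
    """Return a hex digest over the tool name, parameters and data sections.

    Hypothetical helper: SHA-256 and the serialization format are choices
    of this sketch, not part of the tool specification.
    """
    payload = {"tool": tool_name, "parameters": parameters, "data": data}
    # Sorting keys makes the digest independent of the key order in the file.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two runs that share the same tool name and parameters but differ in the data section yield different digests. Hashing the tool name and parameters alone would instead identify the analysis independently of the data it is applied to.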

Parameters: File specification

Each Parameter is described in a parameter block in the /src/tool.yml file. All parameters are collected as the mandatory tools.<tool_name>.parameters block:

tools:
  foobar:
    parameters:
      foo_parameter:
        [...]

Refer to the section below to learn about mandatory and optional fields for a Parameter.

Fields

The following section defines all mandatory and optional fields of a Parameter entity.

type

The type field is the only mandatory field. Each parameter needs a data-type. Allowed data-types include:

  • string
  • integer
  • float
  • boolean
  • enum
  • asset

enum

A parameter of type=enum has an additional mandatory values field, which lists all allowed enum values. Note that enums should be validated by a parsing library or a library calling the tools. For the tools, enum parameters are treated like strings as soon as they are read from an input.json file.

Example

tools:
  foobar:
    parameters:
      my_enum:
        type: enum
        values:
          - option 1
          - option 2
          - option 3
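
Since enum validation is left to the parsing or calling library, a check might look like the following Python sketch. The function validate_enum is hypothetical, and the spec argument is assumed to be the parameter block from tool.yml, already loaded into a dict.

```python
def validate_enum(name, value, spec):
    """Return value if it is one of the allowed enum values, else raise.

    Hypothetical helper: `spec` is assumed to be the parsed parameter
    block, e.g. {"type": "enum", "values": ["option 1", "option 2"]}.
    """
    allowed = spec["values"]
    if value not in allowed:
        raise ValueError(f"{name!r} must be one of {allowed}, got {value!r}")
    # After validation the tool can treat the value as a plain string.
    return value
```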

asset

The type=asset can be used to specify paths to files or entire folders that are copied unchanged to the /in/ path of the tool container and thus made available to the tool for further processing. The parsing library never attempts to load and process these files, therefore assets are available as-is in the container. Assets are neither Data nor parameters, but their dynamic nature might influence the tool execution. Hence, they are added as input to the tool.

description

The description is a multiline comment to describe the purpose of the parameter. For the description, Markdown is allowed, although tool-frameworks are not required to parse it. Descriptions are optional and can be omitted.

A multiline comment in YAML can be specified like:

description: | 
    This is the first line
    This is the second line

array

The array field takes a single boolean value and defaults to array=false. If set to array=true the Parameter is an array of the specified type. The array field cannot be combined with the type=enum field.
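
To illustrate how type and array interact, here is a hedged Python sketch of a type check. The function names are hypothetical. The mapping from specification types to Python types is an assumption of this sketch, in particular accepting whole numbers as floats and representing asset values as path strings.

```python
def _matches(value, type_name):
    """Check a single value against one of the specification's type names."""
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "float":
        # Assumption: whole numbers such as 3 are acceptable floats.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name in ("string", "enum", "asset"):
        # Enums are treated like strings; assets are assumed to be paths.
        return isinstance(value, str)
    raise ValueError(f"unknown type: {type_name!r}")


def check_type(value, spec):
    """Return True if value fits the type (and array flag) of a parameter."""
    is_array = spec.get("array", False)  # array defaults to false
    if is_array and spec["type"] == "enum":
        raise ValueError("array cannot be combined with type=enum")
    if is_array:
        return isinstance(value, list) and all(
            _matches(item, spec["type"]) for item in value
        )
    return _matches(value, spec["type"])
```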

min

Minimum value for constraining the value range. The min field is only valid for type=integer and type=float. Setting a minimum value is optional and can be omitted.
Note that if a max value is additionally specified for the parameter, min must be lower than max.

max

Maximum value for constraining the value range. The max field is only valid for type=integer and type=float. Setting a maximum value is optional and can be omitted.
Note that if a min value is additionally specified for the parameter, max must be higher than min.
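
A range check based on min and max could be sketched in Python as follows. The function check_range is hypothetical, and treating both bounds as inclusive is an assumption of this sketch; the text above does not state whether the bounds themselves are allowed values.

```python
def check_range(value, spec):
    """Return True if value lies within the optional min/max of a parameter.

    Hypothetical helper. Assumption: both bounds are inclusive.
    """
    lower, upper = spec.get("min"), spec.get("max")
    if (lower is not None or upper is not None) and spec["type"] not in (
        "integer",
        "float",
    ):
        raise ValueError("min/max are only valid for type=integer and type=float")
    if lower is not None and upper is not None and not lower < upper:
        raise ValueError("min must be lower than max")
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```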

optional

Boolean field which defaults to false. If set to optional=true, the parameter is not required by the tool. This implies that the tool implementation can handle an input.json in which the Parameter is entirely missing.
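
A client could use the optional flag to detect incomplete parameterizations before starting a tool. The following Python sketch assumes that spec is the parsed parameters block of tool.yml and that params holds the parameters as the tool will receive them; missing_parameters is a hypothetical name.

```python
def missing_parameters(spec, params):
    """List all non-optional parameters that are absent from params.

    Hypothetical helper: `spec` maps parameter names to their parsed
    tool.yml blocks, `params` is the parameterization passed to the tool.
    """
    return [
        name
        for name, field in spec.items()
        # optional defaults to false, so a missing flag means "required".
        if not field.get("optional", False) and name not in params
    ]
```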

default

The default field is of the same data type as the Parameter itself. If a default value is set, the tool-framework is required to inject this parameter into the input.json, as the tool will treat the default like any other non-optional parameter.
Note that default values are only parsed if the parameter is not set to optional=true.
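
The injection of defaults by a tool-framework might look like this Python sketch. The function apply_defaults is hypothetical; it assumes that spec is the parsed parameters block and that user-supplied values always take precedence over defaults.

```python
def apply_defaults(spec, params):
    """Return a copy of params with missing defaults filled in.

    Hypothetical helper. Defaults of parameters marked optional=true
    are skipped, following the note above.
    """
    result = dict(params)
    for name, field in spec.items():
        if name in result:
            # Assumption: an explicitly passed value is never overwritten.
            continue
        if "default" in field and not field.get("optional", False):
            result[name] = field["default"]
    return result
```

The returned dict is what the framework would write into the parameters passed to the tool, so the tool itself never needs to know which values came from defaults.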

Data: File specification

All input Data is described in a data block in the /src/tool.yml file. All sets of input data are collected as the optional tools.<tool_name>.data block. The simplest declaration of input data is to list all available data files in a single, top-level list:

tools:
  foobar:
    parameters:
      [...]
    data:
      - foo_data
      - foo_data2

If any of the dataset sources requires a more detailed configuration, objects can be specified as well:

tools:
  foobar:
    parameters:
      [...]
    data:
      foo_data:
        description: Our first dataset with foo properties
      foo_data2:
        description: Our second dataset with foo2 properties
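
Because the data block can be either a plain list or a mapping of objects, a parsing library may want to normalize both forms into one structure. The following Python sketch shows one way to do so; normalize_data_block is a hypothetical name, and representing an entry without configuration as an empty dict is an assumption.

```python
def normalize_data_block(data):
    """Convert the list form or the object form into a name -> config dict.

    Hypothetical helper: `data` is the parsed tools.<tool_name>.data block,
    or None if the optional block is absent.
    """
    if data is None:
        return {}
    if isinstance(data, list):
        # List form: only names are given, so each config is empty.
        return {name: {} for name in data}
    if isinstance(data, dict):
        # Object form: copy each config, tolerating entries without fields.
        return {name: dict(config or {}) for name, config in data.items()}
    raise TypeError("data block must be a list or a mapping")
```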

Refer to the section below to learn about the fields for Data.

Fields

The following section defines all fields of a Data entity.

description

The description is a single- or multiline comment to describe the input data. For the description, Markdown is allowed, although tool-frameworks are not required to parse it. Descriptions are optional and can be omitted, but it is highly recommended to add descriptions to all required data inputs.

A multiline comment in YAML can be specified like:

description: | 
    This is the first line
    This is the second line

example

The example field is optional and can be used to reference a sample dataset for the given input within the container. Data examples are a prime source for your users to understand what inputs should look like and how they should be formatted.

example: /in/input_name.csv

It is considered good practice to add example data and example parameterizations to the /in/ folder. At inspection time, when a client application reads the tool.yml, this client can also access the examples in the /in/ folder. At runtime, as the client application mounts data and parameterizations into the container at /in/, the examples are non-existent in the container and cannot accidentally pollute the runtime container.

extension

The extension field is optional and can be used to limit the permitted file extensions for a data input. Either a single string or a list of strings is allowed. By convention, the leading dot . should be included in the extension as well.

# a single extension
extension: .csv

# or a list of extensions
extension:
  - .dat
  - .TXT

Note that the extension field is case insensitive.
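
A case-insensitive extension check could be sketched in Python like this. The function extension_allowed is hypothetical. It compares only the last suffix of the file name, so a compound extension such as .tar.gz would be matched as .gz; adding a missing leading dot is a further assumption of this sketch.

```python
from pathlib import Path


def extension_allowed(filename, extension):
    """Return True if the file's extension matches the extension field.

    Hypothetical helper: `extension` is either a single string or a list
    of strings, as allowed by the extension field.
    """
    candidates = [extension] if isinstance(extension, str) else list(extension)
    allowed = set()
    for ext in candidates:
        # Assumption: tolerate entries that omit the conventional dot.
        if not ext.startswith("."):
            ext = "." + ext
        allowed.add(ext.lower())
    # Lowercasing both sides makes the comparison case insensitive.
    return Path(filename).suffix.lower() in allowed
```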

Example

tools:
  foobar:
    title: Dummy Tools
    parameters:
      foo_int:
        type: integer
        min: 0
        max: 10
        description: An integer between 0 and 10
      foo_str:
        type: string
        default: My default string
      foo_option:
        type: enum
        values:
          - option 1
          - option 2
          - option 3
      foo_array:
        type: float
        array: true
        optional: true
        description: An optional array of floats
    data:
      foo_csv_data:
        description: |
          This is a CSV file that should contain valid input. We do currently
          not specify, what that exactly means.
      foo_nc_data:
        description: CF-netCDF 1.8 conform climate model output.