Add command#

Help#

The help text for the add subcommand can be shown by passing the -h flag.

[1]:
%%bash
metacatalog add -h
usage: metacatalog add [-h] [--version] [--connection CONNECTION] [--verbose]
                       [--quiet] [--dev] [--logfile LOGFILE] [--csv CSV]
                       [--txt TXT] [--json JSON]
                       entity

positional arguments:
  entity                Name of the record entity to be added.

optional arguments:
  -h, --help            show this help message and exit
  --version, -v         Returns the module version
  --connection CONNECTION, -C CONNECTION
                        Connection string to the database instance.Follows the
                        syntax: driver://user:password@host:port/database
  --verbose, -V         Activate extended output.
  --quiet, -q           Suppress any kind of output.
  --dev                 Development mode. Unexpected errors will not be
                        handled and the full traceback is printed to the
                        screen.
  --logfile LOGFILE     If a file is given, output will be written to that
                        file instead of printed to StdOut.
  --csv CSV             Data Origin Flag. Pass a CSV filename or content
                        containing the data. Column header have to match the
                        ADD API keywords.
  --txt TXT             Data Origin Flag. Pass a text filename or content
                        containing whitespace separated key=value pairs where
                        key has to match the ADD API keywords. If used
                        directly remember to quote accordingly.
  --json JSON           Data Origin Flag. Pass a JSON filename or content
                        containing the data. Must contain a list of objects
                        matchin the ADD API keywords.

Prerequisites#

The add command assumes that either `create <cli_create.ipynb>`__ and `populate <cli_populate.ipynb>`__ or `init <cli_init.ipynb>`__ were executed successfully.

Usage#

entity#

The add command has one required positional argument, entity. This is the name of the record entity that should be added. A dictionary in metacatalog maps entity names to database models:

[2]:
from metacatalog.api._mapping import ENTITY_MAPPING
from pprint import pprint
pprint(ENTITY_MAPPING)
{'author': <class 'metacatalog.models.person.Person'>,
 'contributor': <class 'metacatalog.models.person.Person'>,
 'datasource': <class 'metacatalog.models.datasource.DataSource'>,
 'datasource_type': <class 'metacatalog.models.datasource.DataSourceType'>,
 'datasourcetype': <class 'metacatalog.models.datasource.DataSourceType'>,
 'entry': <class 'metacatalog.models.entry.Entry'>,
 'keyword': <class 'metacatalog.models.keyword.Keyword'>,
 'license': <class 'metacatalog.models.license.License'>,
 'person': <class 'metacatalog.models.person.Person'>,
 'person_role': <class 'metacatalog.models.person.PersonRole'>,
 'personrole': <class 'metacatalog.models.person.PersonRole'>,
 'unit': <class 'metacatalog.models.variable.Unit'>,
 'variable': <class 'metacatalog.models.variable.Variable'>}

Several entity names map to the same model, either because of alternative spellings or because the API creates database records in different contexts. For example, the API requires at least one person as the first author when an Entry is created, while contributors are optional and can be added if applicable. All persons are, however, saved in the same table.
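To see which entity names are aliases of one another, the mapping can be inverted. The following is a minimal sketch: it uses a plain dictionary of model names, copied from the output above, as a stand-in for ENTITY_MAPPING so that it runs without a metacatalog installation. The helper aliases_by_model is hypothetical and not part of metacatalog.

```python
from collections import defaultdict

# Stand-in for ENTITY_MAPPING: entity name -> model class name,
# copied by hand from the pprint output above (assumption: names only).
ENTITY_TO_MODEL = {
    'author': 'Person',
    'contributor': 'Person',
    'datasource': 'DataSource',
    'datasource_type': 'DataSourceType',
    'datasourcetype': 'DataSourceType',
    'entry': 'Entry',
    'keyword': 'Keyword',
    'license': 'License',
    'person': 'Person',
    'person_role': 'PersonRole',
    'personrole': 'PersonRole',
    'unit': 'Unit',
    'variable': 'Variable',
}


def aliases_by_model(mapping):
    """Group entity names by the model they map to (hypothetical helper)."""
    groups = defaultdict(list)
    for entity, model in mapping.items():
        groups[model].append(entity)
    return {model: sorted(names) for model, names in groups.items()}


groups = aliases_by_model(ENTITY_TO_MODEL)
for model, names in sorted(groups.items()):
    print(f"{model}: {', '.join(names)}")
```

With the real ENTITY_MAPPING, the same grouping should work if the model class itself is used as the key instead of its name.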

Warning

The CLI operates at a much lower level than the Python API. Many workflows that add data to the database take considerably more individual steps with the CLI than with the API.

connection#

If no default connection was created and saved, you have to supply a connection string to the database using the --connection flag. See the `connection <cli_connection.ipynb>`__ command.
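The help text gives the connection string syntax as driver://user:password@host:port/database. The sketch below assembles such a string in Python. The helper build_connection and the credentials are made up for illustration. Percent-encoding the user and password is an assumption, based on how URL-style connection strings usually handle special characters such as @ or :.

```python
from urllib.parse import quote_plus


def build_connection(driver, user, password, host, port, database):
    """Assemble driver://user:password@host:port/database (hypothetical helper)."""
    # Assumption: special characters in user/password must be percent-encoded,
    # otherwise '@' or ':' would be confused with the URL separators.
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder credentials, not taken from a real setup.
conn = build_connection(
    "postgresql", "postgres", "p@ss:word", "localhost", 5432, "metacatalog"
)
print(conn)
```

The resulting string would then be passed to the --connection flag.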

passing arguments#

You also need to pass the actual metadata that should be stored in metacatalog. There are three data origin flags available:

  • --csv - comma separated

  • --txt - key=value pairs

  • --json - JSON

All three flags accept either a filename (including path) to a file in the specified format, or the content itself. Instead of creating a file and passing the filename:

name,symbol
foo,F
Bar,B

you can also pass the content directly: --csv 'name,symbol\nfoo,F\nBar,B'. This might be the easier approach if only one or two records are added.
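When the records already exist as Python objects, the CSV content can be generated instead of typed by hand. The following is a minimal sketch using only the standard library. The helper to_csv and the file name records.csv are hypothetical, and the column names are the placeholder ones from the example file above.

```python
import csv
import io
import tempfile
from pathlib import Path

# Placeholder records matching the example file above.
records = [
    {"name": "foo", "symbol": "F"},
    {"name": "Bar", "symbol": "B"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


content = to_csv(records)
print(content)

# Write the content to a file whose name can be passed to --csv.
path = Path(tempfile.mkdtemp()) / "records.csv"
path.write_text(content)
print(path)
```

The header row is taken from the dictionary keys, so these keys have to match the ADD API keywords of the entity being added.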

Note

You can inspect entities using the show command.

Operations#

Note

This section assumes that you are familiar with the metacatalog data model. As the CLI just uses the Python API under the hood, you will have to refer to the API documentation for a full reference.

Most metadata creation tasks cannot be done with a single call to add. Furthermore, some entities relate to other records, which have to be added first in order not to violate relation constraints. A prime example is that a person has to exist in the database before it can be set as an author.

A typical workflow is to first add missing lookup data, which includes variables, units, licenses, keywords and details. Then, you create all persons involved. Finally, the metadata Entry can be added. For most lookup data, a 1:n relation is modelled and you can pass the ID or anything accepted by the find API. Keywords and persons are, however, modelled in an m:n relation, which has to be specified in a second step.
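The ordering described above can be sketched as a topological sort over the entities. The dependency map below is an assumption read off this section (a variable needs its unit; an entry needs its variable, license and author) and is not taken from the metacatalog data model itself. It requires Python 3.9 or newer for graphlib.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: entity -> entities that have to exist first.
# This is an illustration derived from the text, not the actual schema.
DEPENDS_ON = {
    "unit": set(),
    "license": set(),
    "keyword": set(),
    "person": set(),
    "variable": {"unit"},
    "entry": {"variable", "license", "person"},
}

# static_order() yields every entity after all of its prerequisites.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any order produced this way places unit before variable, and variable, license and person before entry. The relative order of independent entities is not fixed.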

Warning

The details entity is not reflected in the CLI or API yet.

Passing identifiers other than the ID to find is experimental and not fully functional. It might also change in a future release.

Example#

The following example illustrates a workflow for adding new metadata. First, we add a unit of awesomeness and a variable of awesome - because most of our data is awesome.

[8]:
%%bash
metacatalog add unit --csv 'name,symbol,si\nawesomeness,a,m'
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
Added 1 unit records.
Done.
[13]:
%%bash
#metacatalog show attributes variables
metacatalog add variable --csv 'name,symbol,unit\nawesome,A,awesomeness'
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
Added 1 variable records.
Done.

Here, we passed the newly created unit name to the add variable endpoint. Next, we add the person who will be the author:

[19]:
%%bash
metacatalog add person --json '[{"first_name": "Alfred, E.", "last_name": "Neumann", "affiliation": "Institute of Awesomeness"}]'
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
Added 1 person records.
Done.

Now we can create the Entry for Alfred E.’s data from a JSON file:

[14]:
import json

meta = dict(
    title="Alfred data",
    abstract="A dummy test entry, which Alfred created",
    location=(37.422051, -122.084615),
    license=2,
    embargo=True,
    variable="awesome",
    author="Neumann"
)
with open('alfred.json', 'w') as js:
    json.dump([meta], js)
[15]:
%%bash
metacatalog add entry --json alfred.json
rm alfred.json
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
Added 1 entry records.
Done.

Finally, using the `find <cli_find.ipynb>`__ and `show <cli_show.ipynb>`__ commands, we can inspect the newly created entry:

[21]:
%%bash
metacatalog find entry --by title "Alfred data"
metacatalog show records entries --where "id=20" -T
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
<ID=20 Alfred data [awesome] >
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
  id  title           abstract         external_id    location         geom    creation    end      version  latest_version_id    comment      license_id    variable_id  datasource_id    embargo    embargo_end                 publication                 lastUpdate
----  --------------  ---------------  -------------  ---------------  ------  ----------  -----  ---------  -------------------  ---------  ------------  -------------  ---------------  ---------  --------------------------  --------------------------  --------------------------
  20  Alfred data...  A dummy test...                 01010000003F...                                     1                                             2             15                   True       2022-05-22 05:45:24.827462  2020-05-22 05:45:24.827531  2020-05-22 05:45:24.827539

Note

In a future release, a set of flags will be added to the find command. These will make it possible to export found records to a file or print them to StdOut. The show command is intended for raw table inspection only and is just a workaround here.