Add command
Help
The help text for the add subcommand can be shown by passing the -h flag.
[1]:
%%bash
metacatalog add -h
usage: metacatalog add [-h] [--version] [--connection CONNECTION] [--verbose]
[--quiet] [--dev] [--logfile LOGFILE] [--csv CSV]
[--txt TXT] [--json JSON]
entity
positional arguments:
entity Name of the record entity to be added.
optional arguments:
-h, --help show this help message and exit
--version, -v Returns the module version
--connection CONNECTION, -C CONNECTION
Connection string to the database instance.Follows the
syntax: driver://user:password@host:port/database
--verbose, -V Activate extended output.
--quiet, -q Suppress any kind of output.
--dev Development mode. Unexpected errors will not be
handled and the full traceback is printed to the
screen.
--logfile LOGFILE If a file is given, output will be written to that
file instead of printed to StdOut.
--csv CSV Data Origin Flag. Pass a CSV filename or content
containing the data. Column header have to match the
ADD API keywords.
--txt TXT Data Origin Flag. Pass a text filename or content
containing whitespace separated key=value pairs where
key has to match the ADD API keywords. If used
directly remember to quote accordingly.
--json JSON Data Origin Flag. Pass a JSON filename or content
containing the data. Must contain a list of objects
matchin the ADD API keywords.
Prerequisites
The add command assumes that either the create (cli_create.ipynb) and populate (cli_populate.ipynb) commands, or the init (cli_init.ipynb) command, were executed successfully.
Usage
entity
The add command has one positional argument, entity, that has to be provided. This is the name of the record entity that should be added. metacatalog contains a dictionary that maps entity names to database models:
[2]:
from metacatalog.api._mapping import ENTITY_MAPPING
from pprint import pprint
pprint(ENTITY_MAPPING)
{'author': <class 'metacatalog.models.person.Person'>,
'contributor': <class 'metacatalog.models.person.Person'>,
'datasource': <class 'metacatalog.models.datasource.DataSource'>,
'datasource_type': <class 'metacatalog.models.datasource.DataSourceType'>,
'datasourcetype': <class 'metacatalog.models.datasource.DataSourceType'>,
'entry': <class 'metacatalog.models.entry.Entry'>,
'keyword': <class 'metacatalog.models.keyword.Keyword'>,
'license': <class 'metacatalog.models.license.License'>,
'person': <class 'metacatalog.models.person.Person'>,
'person_role': <class 'metacatalog.models.person.PersonRole'>,
'personrole': <class 'metacatalog.models.person.PersonRole'>,
'unit': <class 'metacatalog.models.variable.Unit'>,
'variable': <class 'metacatalog.models.variable.Variable'>}
Several entity names map to the same model. This is either due to alternative spellings, or because the API creates the same kind of record in different contexts. For example, the API requires at least one person to be passed as the first author when an Entry is created, while contributors are optional and can be added if applicable. All persons, however, are saved into the same table.
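To see which entity names are aliases of one another, the mapping can be inverted. The sketch below uses a plain dict in which the model classes are replaced by their names as strings (copied from the output above), so it runs without metacatalog or a database; with metacatalog installed, ENTITY_MAPPING itself could be passed, since classes can be used as dict keys.

```python
from collections import defaultdict

# Stand-in for ENTITY_MAPPING: model classes replaced by their names
mapping = {
    'author': 'Person',
    'contributor': 'Person',
    'datasource': 'DataSource',
    'datasource_type': 'DataSourceType',
    'datasourcetype': 'DataSourceType',
    'entry': 'Entry',
    'keyword': 'Keyword',
    'license': 'License',
    'person': 'Person',
    'person_role': 'PersonRole',
    'personrole': 'PersonRole',
    'unit': 'Unit',
    'variable': 'Variable',
}


def aliases_by_model(mapping):
    """Group entity names by the model they map to."""
    groups = defaultdict(list)
    for name, model in mapping.items():
        groups[model].append(name)
    return dict(groups)


# Print only the models that are reachable under more than one entity name
for model, names in aliases_by_model(mapping).items():
    if len(names) > 1:
        print(model, names)
```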
Warning
The CLI operates at a much lower level than the Python API. Many semantic workflows that add data to the database require considerably more individual steps when done through the CLI.
connection
If no default connection has been created and saved, you have to supply a connection string to the database using the --connection flag. See the connection command (cli_connection.ipynb).
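The connection string follows the driver://user:password@host:port/database syntax from the help text. A password containing characters such as @ or : breaks that syntax unless it is percent-encoded. A minimal sketch, assuming a hypothetical helper make_connection_string and placeholder credentials; it uses only urllib.parse.quote_plus from the standard library, which is the encoding SQLAlchemy (the engine shown in the session output) accepts in URLs.

```python
from urllib.parse import quote_plus


def make_connection_string(driver, user, password, host, port, database):
    """Hypothetical helper: assemble driver://user:password@host:port/database.

    User name and password are percent-encoded so that characters like
    '@' or ':' do not break the URL syntax.
    """
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder credentials, for illustration only
conn = make_connection_string(
    "postgresql", "postgres", "p@ss:word", "localhost", 5432, "metacatalog"
)
print(conn)
```

The result can then be passed on the command line, e.g. metacatalog add unit --connection "$CONN" --csv ..., with the value quoted so the shell does not interpret it.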
passing arguments
You also need to pass the actual metadata that should be stored in metacatalog. There are three data origin flags available:
--csv: comma separated values
--txt: key=value pairs
--json: JSON
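The three flags carry the same information in different shapes. The sketch below serialises the unit record used in the example further down (name awesomeness, symbol a, si m) into each format using only the standard library. How several records are laid out for --txt is not documented here, so the txt part covers a single record and assumes that no value contains whitespace.

```python
import csv
import io
import json

record = {"name": "awesomeness", "symbol": "a", "si": "m"}

# --csv: a header row with the ADD API keywords, then one row per record
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record), lineterminator="\n")
writer.writeheader()
writer.writerow(record)
csv_content = buf.getvalue()

# --txt: whitespace separated key=value pairs (assumes values without whitespace)
txt_content = " ".join(f"{key}={value}" for key, value in record.items())

# --json: a list of objects, even for a single record
json_content = json.dumps([record])

print(csv_content, txt_content, json_content, sep="\n")
```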
All three flags accept either a filename (including the path) of a file in the respective format, or the content itself. Instead of creating a file with the content
name,symbol
foo,F
Bar,B
and passing its filename, you can also pass the content directly: --csv 'name,symbol\nfoo,F\nBar,B'. This might be the easier approach if only one or two records are added.
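When content is passed directly, the examples in this section write line breaks as a literal \n inside single quotes. A sketch that converts multi-line CSV content into that single-line form and shell-quotes it with shlex.quote; the entity name unit is only a placeholder here.

```python
import shlex

content = """name,symbol
foo,F
Bar,B"""

# Join the lines with a literal backslash-n, as in the inline examples above
inline = "\\n".join(content.splitlines())

# Single-quote the value so the shell passes the backslashes through unchanged
command = f"metacatalog add unit --csv {shlex.quote(inline)}"
print(command)
```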
Note
You can inspect entities using the show command.
Operations
Note
This section assumes that you are familiar with the metacatalog data model. As the CLI just uses the Python API under the hood, you will have to refer to the API documentation for a full reference.
Most metadata creation tasks cannot be done with a single call to add. Furthermore, some entities reference other records, which have to be added first so that relation constraints are not violated. A prime example is that a person has to exist in the database before they can be set as an author.
A typical workflow is to first add missing lookup data, which includes variables, units, licenses, keywords and details. Then, you create all persons involved. Finally, the metadata Entry can be added. Most lookup data is modelled as a 1:n relation, and you can pass either the ID or anything accepted by the find API. Keywords and persons, however, are modelled as m:n relations, which have to be specified in a second step.
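The order of this workflow can be derived by sorting a dependency graph topologically. A minimal sketch using graphlib from the Python standard library (3.9+); the edges are taken from the example below (a variable references a unit; an entry references a license, a variable and an author) and are an assumption, not a complete description of the metacatalog data model. Keywords and the m:n links are left out.

```python
from graphlib import TopologicalSorter

# entity -> entities whose records must already exist (assumed from the example)
dependencies = {
    "unit": set(),
    "license": set(),
    "person": set(),
    "variable": {"unit"},
    "entry": {"license", "variable", "person"},
}

# static_order() yields every entity after all of its dependencies
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orders exist; any of them adds units before variables and all referenced records before the entry.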
Warning
The details entity is not reflected in the CLI or API yet.
Passing identifiers other than the ID to find is experimental and not fully functional. It might also change in a future release.
Example
The following example illustrates a workflow for adding new metadata. First, we add a unit called awesomeness and a variable called awesome, because most of our data is awesome.
[8]:
%%bash
metacatalog add unit --csv 'name,symbol,si\nawesomeness,a,m'
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
Added 1 unit records.
Done.
[13]:
%%bash
#metacatalog show attributes variables
metacatalog add variable --csv 'name,symbol,unit\nawesome,A,awesomeness'
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
Added 1 variable records.
Done.
Here, we passed the name of the newly created unit to the add variable endpoint.
[19]:
%%bash
metacatalog add person --json '[{"first_name": "Alfred, E.", "last_name": "Neumann", "affiliation": "Institute of Awesomeness"}]'
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
Added 1 person records.
Done.
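Writing JSON by hand inside shell quotes is error prone. A sketch that builds the argument of the command above with json.dumps and shlex.quote; shlex.quote would also escape a single quote in a value (e.g. a name like O'Brien), which would otherwise end the quoted string.

```python
import json
import shlex

person = {
    "first_name": "Alfred, E.",
    "last_name": "Neumann",
    "affiliation": "Institute of Awesomeness",
}

# --json expects a list of objects, even for a single record
payload = json.dumps([person])
command = f"metacatalog add person --json {shlex.quote(payload)}"
print(command)
```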
Next, we can create an Entry for Alfred's data from a JSON file:
[14]:
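import json

# Metadata for one Entry; the keys have to match the ADD API keywords
meta = dict(
    title="Alfred data",
    abstract="A dummy test entry, which Alfred created",
    location=(37.422051, -122.084615),
    license=2,            # ID of an existing license
    embargo=True,
    variable="awesome",   # variable added above, looked up by name
    author="Neumann"      # person added above, looked up by last name
)

# --json expects a list of objects, even for a single record
with open('alfred.json', 'w') as js:
    json.dump([meta], js)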
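Before calling the CLI it can help to check the file. The sketch below uses a hypothetical helper, load_add_json, that re-reads the file and checks that it holds a list of objects, as the --json help text requires. It uses a shortened version of meta and writes to a temporary directory instead of alfred.json. Note that the location tuple is stored as a JSON array and therefore comes back as a list.

```python
import json
import tempfile
from pathlib import Path


def load_add_json(path):
    """Hypothetical helper: load a --json file and check that it is a list of objects."""
    with open(path) as f:
        data = json.load(f)
    if not isinstance(data, list) or not all(isinstance(obj, dict) for obj in data):
        raise ValueError(f"{path}: expected a JSON list of objects")
    return data


# Shortened version of the metadata above
meta = dict(
    title="Alfred data",
    location=(37.422051, -122.084615),
    license=2,
    variable="awesome",
    author="Neumann",
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "alfred.json"
    path.write_text(json.dumps([meta]))
    records = load_add_json(path)

# The tuple was written as a JSON array, so it is read back as a list
print(records[0]["location"])
```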
[15]:
%%bash
metacatalog add entry --json alfred.json
rm alfred.json
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
Added 1 entry records.
Done.
Finally, using the find (cli_find.ipynb) and show (cli_show.ipynb) commands, we can inspect the newly created entry:
[21]:
%%bash
metacatalog find entry --by title "Alfred data"
metacatalog show records entries --where "id=20" -T
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
<ID=20 Alfred data [awesome] >
Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)
id title abstract external_id location geom creation end version latest_version_id comment license_id variable_id datasource_id embargo embargo_end publication lastUpdate
---- -------------- --------------- ------------- --------------- ------ ---------- ----- --------- ------------------- --------- ------------ ------------- --------------- --------- -------------------------- -------------------------- --------------------------
20 Alfred data... A dummy test... 01010000003F... 1 2 15 True 2022-05-22 05:45:24.827462 2020-05-22 05:45:24.827531 2020-05-22 05:45:24.827539
Note
In a future release, a set of flags will be added to the find command. These will make it possible to export found records to a file or to StdOut. The show command is intended for raw table inspection only and is just a workaround here.