{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Add command" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Help\n", "The help text for the `add` subcommand can be shown by passing the `-h` flag." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: metacatalog add [-h] [--version] [--connection CONNECTION] [--verbose]\n", " [--quiet] [--dev] [--logfile LOGFILE] [--csv CSV]\n", " [--txt TXT] [--json JSON]\n", " entity\n", "\n", "positional arguments:\n", " entity Name of the record entity to be added.\n", "\n", "optional arguments:\n", " -h, --help show this help message and exit\n", " --version, -v Returns the module version\n", " --connection CONNECTION, -C CONNECTION\n", " Connection string to the database instance.Follows the\n", " syntax: driver://user:password@host:port/database\n", " --verbose, -V Activate extended output.\n", " --quiet, -q Suppress any kind of output.\n", " --dev Development mode. Unexpected errors will not be\n", " handled and the full traceback is printed to the\n", " screen.\n", " --logfile LOGFILE If a file is given, output will be written to that\n", " file instead of printed to StdOut.\n", " --csv CSV Data Origin Flag. Pass a CSV filename or content\n", " containing the data. Column header have to match the\n", " ADD API keywords.\n", " --txt TXT Data Origin Flag. Pass a text filename or content\n", " containing whitespace separated key=value pairs where\n", " key has to match the ADD API keywords. If used\n", " directly remember to quote accordingly.\n", " --json JSON Data Origin Flag. Pass a JSON filename or content\n", " containing the data. Must contain a list of objects\n", " matchin the ADD API keywords.\n" ] } ], "source": [ "%%bash\n", "metacatalog add -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "The `add` command assumes that either [`create`](cli_create.ipynb) and [`populate`](cli_populate.ipynb) or [`init`](cli_init.ipynb) were executed successfully." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Usage\n", "\n", "### entity\n", "\n", "The `add` command has one positional argument `entity` that has to be provided. This is the name of the record entitiy that should be added. There is a dictionary in `metacatalog` that maps enitity names to database models:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'author': ,\n", " 'contributor': ,\n", " 'datasource': ,\n", " 'datasource_type': ,\n", " 'datasourcetype': ,\n", " 'entry': ,\n", " 'keyword': ,\n", " 'license': ,\n", " 'person': ,\n", " 'person_role': ,\n", " 'personrole': ,\n", " 'unit': ,\n", " 'variable': }\n" ] } ], "source": [ "from metacatalog.api._mapping import ENTITY_MAPPING\n", "from pprint import pprint\n", "pprint(ENTITY_MAPPING)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many entities map to the same model. This is either due to different spelling, or because the API creates database records in different contexts. E.g. the API forces the user to pass at least one *person* as the first author of an *Entry* on creation. The *contributors* are optional and can be added if applicable. All *person*s will, however, be saved into the same table." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. warning::\n", "\n", " The CLI is operating at a much lower level than the Python API. Many semantical workflows which add data to the database include way more individual steps using the CLI." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### connection\n", "\n", "In case no default connection was created and saved, you have to supply a connection string to the database using the `--connection` flag. See [`connection`](cli_connection.ipynb) command." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### passing arguments\n", "\n", "Obviously, you need to pass the actual metadata, that should be stored in metacatalog. There are three data origin flags available: \n", "\n", "* `--csv` - comma separated\n", "* `--txt` - key=value pairs\n", "* --`json` - JSON\n", "\n", "All three flags accept either a filename (including path) to a file in the specified format, or the content itself.\n", "Instead of creating a file and passing the filename:\n", "\n", "```csv\n", "name,symbol\n", "foo,F\n", "Bar,B\n", "```\n", "\n", "you can can also use the flag like: `--csv 'name,symbol\\nnfoo,F\\nbar,B'`. This might be the easier approach if only one or two records are added." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. note::\n", " \n", " You can inspect entities using the `show` command." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. note::\n", "\n", " This section assumes that you are familiar with the metacatalog data model. As the CLI just uses the Python API under the hood, you will have to refer to the API documentation for a full reference." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most metadata creation task cannot be done with one call to add. Furthermore, some entities relate to records, that have to be added in the first place, to not violate relation constraints. A prior example is that a person has to exist in the database, before it can be placed as an author.\n", "\n", "A typical workflow is to add missing lookup data, which includes `variables,units,licenses,keywords` and `details`. Then, you create all `person`s involved. Finally, the metadata `Entry` can be added. For most lookup data, a `1:n` relation is modelled and you can pass anything accepted by the `find` api or the ID. \n", "`keyword`s and `person`s are, however, modelled in a `m:n` relation, which has to be specified in a second step." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. warning:: \n", "\n", " The ``details`` entity is not reflected in the CLI or API yet. \n", " \n", " The usage of passing other identifiers to `find` than the ID, is experimental and not fully functions. It might also change in a future release\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example\n", "\n", "The following example should illustrate a workflow for adding new meta-data.\n", "At first we add a unit of `awesomeness` and a variable of `awesome` - because most of our data is awesome." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)\n", "Added 1 unit records.\n", "Done.\n" ] } ], "source": [ "%%bash\n", "metacatalog add unit --csv 'name,symbol,si\\nawesomeness,a,m'" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)\n", "Added 1 variable records.\n", "Done.\n" ] } ], "source": [ "%%bash\n", "#metacatalog show attributes variables\n", "metacatalog add variable --csv 'name,symbol,unit\\nawesome,A,awesomeness'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we passed the new newly created unit name to the `add variable` endpoint." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)\n", "Added 1 person records.\n", "Done.\n" ] } ], "source": [ "%%bash\n", "metacatalog add person --json '[{\"first_name\": \"Alfred, E.\", \"last_name\": \"Neumann\", \"affiliation\": \"Institute of Awesomeness\"}]'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can create the Entries of Alfred, E.'s data from a json file:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "meta = dict(\n", " title=\"Alfred data\", \n", " abstract=\"A dummy test entry, which Alfred created\",\n", " location=(37.422051, -122.084615),\n", " license=2,\n", " embargo=True,\n", " variable=\"awesome\",\n", " author=\"Neumann\"\n", ")\n", "with open('alfred.json', 'w') as js:\n", " json.dump([meta], js)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)\n", "Added 1 entry records.\n", "Done.\n" ] } ], "source": [ "%%bash\n", "metacatalog add entry --json alfred.json\n", "rm alfred.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally, using the [`find`](cli_find.ipynb) and [`show`](cli_show.ipynb) command we can inspect the newly created entry:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)\n", "\n", "Using session: Engine(postgresql://postgres:***@localhost:5432/metacatalog)\n", " id title abstract external_id location geom creation end version latest_version_id comment license_id variable_id datasource_id embargo embargo_end publication lastUpdate\n", "---- -------------- --------------- ------------- --------------- ------ ---------- ----- --------- ------------------- --------- ------------ ------------- --------------- --------- -------------------------- -------------------------- --------------------------\n", " 20 Alfred data... A dummy test... 01010000003F... 1 2 15 True 2022-05-22 05:45:24.827462 2020-05-22 05:45:24.827531 2020-05-22 05:45:24.827539\n" ] } ], "source": [ "%%bash\n", "metacatalog find entry --by title \"Alfred data\"\n", "metacatalog show records entries --where \"id=20\" -T" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ".. note::\n", " \n", " In a future release, a set of flags will be added to the ``find`` command. These will make the export of found records into a file or as output to StdOut possbile. The ``show`` command is intended for raw table inspections only and just a workaround here. " ] } ], "metadata": { "celltoolbar": "Raw Cell Format", "finalized": { "timestamp": 1590129848476, "trusted": false }, "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13 (main, Aug 25 2022, 23:26:10) \n[GCC 11.2.0]" }, "vscode": { "interpreter": { "hash": "f54d8176e82297fa872ac8c77277e50c0e193f921954c1c4a0b1ae2e8be99b71" } } }, "nbformat": 4, "nbformat_minor": 4 }