Find Command#

Help#

The help text for the find subcommand can be shown by passing the -h flag.

[1]:
%%bash
metacatalog find -h
usage: metacatalog find [-h] [--version] [--connection CONNECTION] [--verbose]
                        [--quiet] [--dev] [--logfile LOGFILE] [--by BY BY]
                        [--json] [--stdout] [--csv]
                        entity

positional arguments:
  entity                Name of the requested database entity.

optional arguments:
  -h, --help            show this help message and exit
  --version, -v         Returns the module version
  --connection CONNECTION, -C CONNECTION
                        Connection string to the database instance.Follows the
                        syntax: driver://user:password@host:port/database
  --verbose, -V         Activate extended output.
  --quiet, -q           Suppress any kind of output.
  --dev                 Development mode. Unexpected errors will not be
                        handled and the full traceback is printed to the
                        screen.
  --logfile LOGFILE     If a file is given, output will be written to that
                        file instead of printed to StdOut.
  --by BY BY            key value pair to be used for finding record(s) in the
                        database. Flag can be used multiple times.
  --json                Output the found entities as JSON objects
  --stdout              Default option. Print the string representation of
                        found entities to StdOut.
  --csv                 Output the found entities as CSV.

Prerequisites#

The find command assumes that either `create <cli_create.ipynb>`__ and `populate <cli_populate.ipynb>`__, or `init <cli_init.ipynb>`__, have been executed successfully.

Usage#

entity#

Note

The CLI endpoint of find only wraps the Python API endpoint. The API is designed for building model instances, which is often not very helpful on the command line. In future releases, more database model classes will represent themselves correctly when printed to StdOut. Furthermore, a set of export flags is planned to export models into CSV or JSON files. Until then, the output for some entities may not be very helpful.

The find command has one required positional argument, entity. This is the name of the record entity that should be found. There is a dictionary in metacatalog that maps entity names to database models:

[2]:
from metacatalog.api._mapping import TABLE_MAPPING
from pprint import pprint
pprint(TABLE_MAPPING)
{'datasource_types': <class 'metacatalog.models.datasource.DataSourceType'>,
 'datasources': <class 'metacatalog.models.datasource.DataSource'>,
 'entries': <class 'metacatalog.models.entry.Entry'>,
 'entry_groups': <class 'metacatalog.models.entrygroup.EntryGroup'>,
 'keywords': <class 'metacatalog.models.keyword.Keyword'>,
 'licenses': <class 'metacatalog.models.license.License'>,
 'person_roles': <class 'metacatalog.models.person.PersonRole'>,
 'persons': <class 'metacatalog.models.person.Person'>,
 'thesaurus': <class 'metacatalog.models.keyword.Thesaurus'>,
 'units': <class 'metacatalog.models.variable.Unit'>,
 'variables': <class 'metacatalog.models.variable.Variable'>}

Many entities map to the same model. This is either due to different spellings, or because the API creates database records in different contexts. For example, the API forces the user to pass at least one person as the first author of an Entry on creation. Contributors are optional and can be added if applicable. All persons will, however, be saved into the same table.
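The lookup described above can be sketched as a plain dictionary access. This is only an illustration and not metacatalog's implementation: the stand-in classes, the shortened `EXAMPLE_MAPPING`, the alias `authors`, and the helper `resolve_entity` are all hypothetical.

```python
# Stand-in model classes; the real ones live in metacatalog.models.
class Person:
    pass


class License:
    pass


# Hypothetical, shortened mapping in the style of TABLE_MAPPING.
# The alias 'authors' is an assumption made for illustration only.
EXAMPLE_MAPPING = {
    'persons': Person,
    'authors': Person,
    'licenses': License,
}


def resolve_entity(name, mapping=EXAMPLE_MAPPING):
    """Return the model class registered for an entity name."""
    try:
        return mapping[name.lower()]
    except KeyError:
        known = ', '.join(sorted(mapping))
        raise ValueError(
            f"Unknown entity '{name}'. Known entities: {known}"
        ) from None


# Two different entity names resolve to the same model class.
print(resolve_entity('persons') is resolve_entity('authors'))
```

Raising an error that lists the known names is one possible design choice; it makes a misspelled entity easy to spot on the command line.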

connection#

If no default connection was created and saved, you have to supply a connection string to the database using the --connection flag. See the `connection <cli_connection.ipynb>`__ command.
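The help text gives the connection string syntax as driver://user:password@host:port/database. As a minimal sketch, the standard library can split such a string into its parts. The helper `describe_connection` and the example credentials are hypothetical, and metacatalog may process the string differently.

```python
from urllib.parse import urlsplit


def describe_connection(connection):
    """Split a driver://user:password@host:port/database string into parts."""
    parts = urlsplit(connection)
    return {
        'driver': parts.scheme,
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,  # converted to int by urlsplit, or None if absent
        'database': parts.path.lstrip('/'),
    }


# Made-up example values, not a real database instance.
info = describe_connection('postgresql://alice:secret@localhost:5432/metacatalog')
print(info['driver'], info['host'], info['port'], info['database'])
```

Checking the parts this way before passing the string to --connection can help catch a missing port or database name early.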

passing arguments#

Arguments to filter for the correct records can be specified with the --by flag. Its use is optional. If no filter is set, all records will be returned, which might be a lot. You can pass --by multiple times to create multiple filters.

Note

The find endpoint is not made for open searches and does not offer fine-grained filtering. All filters passed are stacked on top of each other, effectively combining them with a logical AND.

The --by flag requires exactly two arguments. The first is the column to filter on and the second is the value that has to be matched. It cannot negate a filter (no NOT conditions) and does not accept None or null as a value.
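The behaviour of --by can be imitated with argparse, where `nargs=2` enforces the two arguments and `action='append'` allows the flag to be repeated. This is a sketch under stated assumptions and not metacatalog's source code: the records are invented, the helper `matches` is hypothetical, and comparing values as strings is an assumption about how command line values could be matched.

```python
import argparse

# A parser in the style of the help text: one positional entity and a
# repeatable --by flag that takes exactly two values per use.
parser = argparse.ArgumentParser(prog='find-sketch')
parser.add_argument('entity')
parser.add_argument('--by', nargs=2, action='append', default=[], metavar='BY')

# Invented in-memory records standing in for database rows.
RECORDS = [
    {'id': 1, 'name': 'alpha', 'open': True},
    {'id': 2, 'name': 'beta', 'open': True},
    {'id': 3, 'name': 'alpha', 'open': False},
]


def matches(record, filters):
    """True only if the record satisfies every filter (logical AND)."""
    # Command line values arrive as strings, so compare string forms.
    return all(str(record.get(column)) == value for column, value in filters)


args = parser.parse_args(
    ['licenses', '--by', 'open', 'True', '--by', 'name', 'alpha']
)
found = [record for record in RECORDS if matches(record, args.by)]
print(args.by)
print([record['id'] for record in found])
```

With no --by flag the filter list is empty and every record matches, which mirrors the behaviour described above. Passing only one value to --by makes argparse exit with a usage error.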

Example#

[3]:
%%bash
metacatalog find licenses --by short_title ODbL
Open Data Commons Open Database License <ID=4>
[4]:
%%bash
metacatalog find licenses --by id 4
Open Data Commons Open Database License <ID=4>
[5]:
%%bash
metacatalog find licenses --by by_attribution True
Open Data Commons Open Database License <ID=4>
Open Data Commons Attribution License v1.0 <ID=5>
Creative Commons Attribution 4.0 International <ID=6>
Creative Commons Attribution-ShareAlike 4.0 International <ID=7>
Creative Commons Attribution-NonCommerical 4.0 International <ID=8>
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International <ID=9>
[6]:
%%bash
metacatalog find entry
<ID=3 Sap Flow - Hohes Hol [sap flow] >
<ID=4 Sap Flow - Hohes Hol [sap flow] >
<ID=5 Sap Flow - Hohes Hol [sap flow] >
<ID=6 Sap Flow - Hohes Hol [sap flow] >
<ID=7 Sap Flow - Hohes Hol [sap flow] >
<ID=8 Sap Flow - Hohes Hol [sap flow] >
<ID=9 Sap Flow - Hohes Hol [sap flow] >
<ID=16 Sap Flow - Hohes Hol [sap flow] >
<ID=17 Sap Flow - Hohes Hol [sap flow] >
<ID=18 Alfred's data [awesome] >
<ID=19 Alfred's data [awesome] >
<ID=1 Sap Flow - Hohes Hol [sap flow] >
<ID=11 Sap Flow - Hohes Hol [sap flow] >
<ID=10 Sap Flow - Hohes Hol [sap flow] >
<ID=12 Sap Flow - Hohes Hol [sap flow] >
<ID=13 Sap Flow - Hohes Hol [sap flow] >
<ID=14 Sap Flow - Hohes Hol [sap flow] >
<ID=15 Sap Flow - Hohes Hol [sap flow] >
<ID=2 Sap Flow - Hohes Hol [sap flow] >
<ID=20 Alfred data [awesome] >