Configuring osh

If osh is used to access databases or clusters, connection information is required. This information is kept in a configuration file, which can also define variables and functions for use in osh commands.

Location of the osh configuration file

The osh interpreter searches for an osh configuration file, checking these locations, in this order:

./.oshrc
~/.oshrc
/etc/oshrc

If none of these files exist, then osh will not be able to access any database or cluster.
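The search order can be pictured with a short Python sketch. It is an illustration only: the function name find_oshrc is hypothetical, and the assumption that the first existing file wins is inferred from the phrase "in this order", not taken from the osh source.

```python
import os


def find_oshrc(candidates=("./.oshrc", "~/.oshrc", "/etc/oshrc")):
    """Return the first candidate that exists as a file, or None.

    Hypothetical helper; assumes the first match wins.
    """
    for candidate in candidates:
        path = os.path.expanduser(candidate)
        if os.path.isfile(path):
            return path
    return None
```

A result of None corresponds to the case in which osh cannot access any database or cluster.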

If the osh configuration file is going to be used for database access, then it may contain a plaintext database password. It is therefore good practice to restrict access to this file, e.g. by setting its permissions to 600.
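On a POSIX system the same effect as chmod 600 can be had from Python. The sketch below is not part of osh, and the helper names restrict_to_owner and is_owner_only are hypothetical.

```python
import os
import stat


def restrict_to_owner(path):
    """Set the file's mode to 600 (owner read/write only)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(path):
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For example, restrict_to_owner(os.path.expanduser('~/.oshrc')) protects the per-user configuration file.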

Structure of the configuration file

The .oshrc file contains Python code. The API for specifying a configuration is contained in the module osh.config, so .oshrc always begins with this line:

    from osh.config import *

Database and cluster configuration is done using field and slice notation. For example, to specify that the sql command connects to the database familydb as user jao, under the logical name family:

    osh.sql.family.database = 'familydb'
    osh.sql.family.user = 'jao'

osh is the base configuration object.
osh.sql indicates that a database connection for the sql command is being configured.
osh.sql.family means that a database with the logical name family is being configured.
osh.sql.family.database = 'familydb' means that the actual name of the database (as understood by the database system) is familydb.
osh.sql.family.user = 'jao' means that the connection to familydb is made as user jao.

Slice notation is also supported, e.g.

    osh.sql['family'].user = 'jao'

This is useful when a database or cluster name is known only at run time, or is not a legal Python identifier.
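To show why the two notations are interchangeable, here is a minimal stand-in written in plain Python. It is not osh's implementation: the class ConfigNode, the variable config, and the profile name 'family-archive' are all hypothetical and exist only for this illustration.

```python
class ConfigNode:
    """Hypothetical stand-in for an osh configuration object.

    Unknown names create child nodes on first use, so field notation
    (node.family) and slice notation (node['family']) reach the same child.
    """

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        child = ConfigNode()
        object.__setattr__(self, name, child)
        return child

    def __getitem__(self, name):
        return getattr(self, name)


config = ConfigNode()

# Field notation works when the name is a legal Python identifier.
config.sql.family.user = "jao"

# Slice notation also handles names that are computed at run time
# or that are not legal identifiers, such as 'family-archive'.
for profile in ["family", "family-archive"]:
    config.sql[profile].host = "localhost"
```

After this runs, config.sql['family'].user and config.sql.family.user refer to the same value.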

Configuring database access

Database access is done using the sql command, e.g.

    osh sql family 'select count(*) from request' $

family is the name of a database profile, specified in .oshrc. The profile names a DBAPI 2.0 driver module that provides a connect method, along with the arguments to pass to that method, e.g.

    osh.sql.family.driver = 'pg8000.dbapi'
    osh.sql.family.host = 'localhost'
    osh.sql.family.database = 'familydb'
    osh.sql.family.user = 'jao'
    osh.sql.family.password = 'l3tme1n'

This specification results in a connection being created as pg8000.dbapi.connect(host = 'localhost', database = 'familydb', user = 'jao', password = 'l3tme1n').
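The mapping from profile to connect call can be sketched in a few lines of Python. This is an assumption about the mechanism, not osh's actual code: the helper open_connection is hypothetical, and the standard-library sqlite3 module stands in for pg8000.dbapi only so that the sketch runs without a database server. Nothing here implies that osh itself supports sqlite3.

```python
import importlib


def open_connection(profile):
    """Import the profile's driver module and call its connect function.

    Hypothetical helper: every entry other than 'driver' is passed to
    connect() as a keyword argument.
    """
    connect_args = {key: value for key, value in profile.items() if key != "driver"}
    driver = importlib.import_module(profile["driver"])
    return driver.connect(**connect_args)


# sqlite3 is a stand-in DBAPI 2.0 module used only for this demonstration.
connection = open_connection({"driver": "sqlite3", "database": ":memory:"})
cursor = connection.cursor()
cursor.execute("select 1 + 1")
row = cursor.fetchone()
connection.close()
```

With the family profile above, the same helper would import pg8000.dbapi and pass host, database, user and password as keyword arguments.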

The DBAPI driver is not included with osh. It needs to be available in your Python environment: located on PYTHONPATH, in site-packages, or in a location specified by sitecustomize.py.

osh also provides access to PostgreSQL databases using the PyGreSQL driver, configured as follows:

    osh.sql.family.dbtype = 'postgres'
    osh.sql.family.host = 'localhost'
    osh.sql.family.db = 'familydb'
    osh.sql.family.user = 'jao'
    osh.sql.family.password = 'l3tme1n'

A default database profile can be specified by assigning a value to osh.sql, e.g.

    osh.sql = 'family'

If this is done, then the sql command can omit the database profile name, e.g.

    osh sql 'select count(*) from request' $

Cluster configuration

This command executes a query on each node of cluster fred:

    osh @fred [ sql family 'select name, age from person' ] $

Each node of fred's cluster must configure a database profile named family in its .oshrc file.

On the local node (the one on which the above command is entered), the cluster fred must be configured, e.g.

    osh.remote.fred.hosts = ['192.168.100.101',
                             '192.168.100.102',
                             '192.168.100.103']

Access to the cluster is done as root by default. To connect as some other user, specify the user as follows:

    osh.remote.fred.user = 'zack'

Remote access is done using ssh, and osh cannot handle any prompts, e.g. for a password or a passphrase. If you can connect to a host silently using ssh, e.g. by setting up an ssh agent, then that will work for osh. If you use the ssh -i option, then the identity can be specified in .oshrc, e.g.

    osh.remote.fred.identity = '~/.ssh/fred.pem'

(You can check your ssh connection using the testssh command, e.g. osh @fred [ testssh ] $.)
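Independently of osh, silent login can be checked with ssh's BatchMode option, which makes ssh fail rather than prompt for a password or passphrase. The Python sketch below is not how osh invokes ssh; the function names are hypothetical, and the default user of root simply mirrors the cluster configuration described above.

```python
import os
import subprocess


def ssh_check_command(host, user="root", identity=None):
    """Build an ssh command line that fails instead of prompting."""
    command = ["ssh", "-o", "BatchMode=yes"]
    if identity is not None:
        command += ["-i", os.path.expanduser(identity)]
    command += [f"{user}@{host}", "true"]
    return command


def can_connect_silently(host, user="root", identity=None):
    """Return True if ssh logs in and runs 'true' without any prompt."""
    result = subprocess.run(
        ssh_check_command(host, user, identity),
        stdin=subprocess.DEVNULL,
        capture_output=True,
    )
    return result.returncode == 0
```

For example, can_connect_silently('192.168.100.101', user='zack') reports whether the first node of fred accepts a prompt-free login.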

When a remote command is run on a cluster, each row of output identifies the node that generated the output, e.g.

    zack$ osh @fred [ sql family 'select name, age from person' ] $
    ('192.168.100.101', 'hannah', 15)
    ('192.168.100.101', 'julia', 10)
    ('192.168.100.102', 'alexander', 16)
    ('192.168.100.102', 'nathan', 15)
    ('192.168.100.102', 'zoe', 11)
    ('192.168.100.103', 'danica', 1)

The hosts part of a cluster specification can also be given as a dict, in which each key is a name for a node and each value is that node's IP address, e.g.

    osh.remote.fred.hosts = {'101': '192.168.100.101',
                             '102': '192.168.100.102',
                             '103': '192.168.100.103'}

The node names are used in command output in place of IP addresses, e.g.

    zack$ osh @fred [ sql family 'select name, age from person' ] $
    ('101', 'hannah', 15)
    ('101', 'julia', 10)
    ('102', 'alexander', 16)
    ('102', 'nathan', 15)
    ('102', 'zoe', 11)
    ('103', 'danica', 1)

These examples assume that each node of cluster fred has a .oshrc file that configures a database profile named family. It is also possible to configure database access even when each node specifies a different database profile, e.g.

    osh.remote.fred.hosts = {'101': {'host': '192.168.100.101', 'db_profile': 'db1'},
                             '102': {'host': '192.168.100.102', 'db_profile': 'db2'},
                             '103': {'host': '192.168.100.103', 'db_profile': 'db3'}}

With this configuration, osh @fred [ sql 'select ...' ] ... accesses db1 on node 101, db2 on node 102, and db3 on node 103.
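The three forms of the hosts setting (a list of addresses, a dict of names to addresses, and a dict of names to per-node settings) carry the same kind of information. The sketch below reduces each form to a common shape. It only illustrates the data layout: normalize_hosts is a hypothetical name, not an osh function.

```python
def normalize_hosts(hosts):
    """Return a list of (label, address, db_profile) tuples.

    Accepts the three forms shown above: a list of addresses, a dict of
    name -> address, or a dict of name -> {'host': ..., 'db_profile': ...}.
    """
    if isinstance(hosts, dict):
        items = hosts.items()
    else:
        # A plain list: each address doubles as its own label.
        items = ((address, address) for address in hosts)
    nodes = []
    for label, spec in items:
        if isinstance(spec, dict):
            nodes.append((label, spec["host"], spec.get("db_profile")))
        else:
            nodes.append((label, spec, None))
    return nodes
```

The label is what appears in the first column of each output row; db_profile is None when the node has no profile of its own.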

The complete rules for selecting a database profile are as follows:

Local execution (e.g. osh sql ...):
- If a profile is specified with the sql command, use it.
- If no profile is specified, use the default profile in .oshrc.
Remote execution (e.g. osh @cluster [ sql ... ]):
- If a profile is specified with the sql command, use it, looking up the profile name in .oshrc on each node of the cluster.
- If no profile is specified with the sql command, and the remote host has a db_profile specified (in the cluster's configuration in the local .oshrc), use that profile as configured in the .oshrc file on the remote host.
- If no profile is specified with the sql command, and the remote host has no db_profile specified, use the default database profile from .oshrc on the remote host.
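The rules above amount to a simple precedence order, sketched here in Python. The function select_profile and its parameter names are hypothetical and are not part of the osh API.

```python
def select_profile(explicit=None, host_db_profile=None, default=None):
    """Pick a database profile name following the rules listed above.

    explicit: profile named on the sql command, if any.
    host_db_profile: db_profile set for the remote host in the local
        .oshrc; always None for local execution.
    default: default profile (osh.sql = '...') in the .oshrc of the
        node that runs the query.
    """
    if explicit is not None:
        return explicit
    if host_db_profile is not None:
        return host_db_profile
    return default
```

In every case the chosen name is then looked up in the .oshrc of the node on which the query runs.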

Arbitrary Python code in `.oshrc`

.oshrc contains Python code, and any symbols defined in it are available to osh commands. For example, this command prints the squares and cubes of the first ten integers:

    zack$ osh gen 10 ^ f 'x: (x**2, x**3)' $

The same computation can be done by writing a function to compute squares and cubes:

    def square_and_cube(x):
        return x**2, x**3

and putting this function definition in .oshrc. Then, the osh command above can be rewritten as:

    zack$ osh gen 10 ^ f 'x: square_and_cube(x)' $

Locating user-defined osh commands

It is easy to write your own osh commands (although the procedure is currently not documented). In order for such commands to be located by the osh interpreter, a search path needs to be specified in .oshrc. For example, suppose you have written a command xyz and placed its implementation, xyz.py, in /usr/local/foobar. Then you must add this to .oshrc:

    osh.path = ['/usr/local/foobar']

or the osh interpreter will not find xyz when used in an osh command sequence.

The value of osh.path is a list so that custom commands can be placed in any number of directories.
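A directory search of this kind can be sketched as follows. This is not osh's lookup code: find_command_module is a hypothetical helper, and the assumption that earlier directories take precedence over later ones is not stated in this document.

```python
import os


def find_command_module(name, search_path):
    """Return the path of name.py in the first directory that has it."""
    for directory in search_path:
        candidate = os.path.join(directory, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None
```

With the configuration above, find_command_module('xyz', ['/usr/local/foobar']) would return /usr/local/foobar/xyz.py if that file exists.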