Introduction

Osh (Object SHell) is a tool that integrates the processing of operating system objects, Python objects, database access, and remote access to a cluster of nodes. These capabilities are made available through a command-line interface (CLI) and a Python application programming interface (API).

Execution Model

Osh processes streams of Python objects using simple commands. Complex data processing is achieved by command sequences in which the output from one command is passed to the input of the next. This is similar to composing Unix commands using pipes. However, Unix commands pass strings from one command to the next, and the commands (grep, awk, sed, etc.) are heavily string-oriented. Osh commands process Python objects, and it is objects that are sent from one command to the next. Objects may be primitive types such as strings and numbers; composite types such as tuples, lists and maps; objects representing files, processes, dates and times; or user-defined objects.

The first command in an osh command sequence generates a stream of objects. Each subsequent command reads a stream of objects and writes a stream of objects. For a given command, the relationship between inputs and outputs is not necessarily one-to-one. For example, the f command reads one object from the stream, applies a function to it, and writes one object to its output stream. The select command, by contrast, copies an object from the input stream to the output stream if and only if its predicate evaluates to true for that object. The expand command may generate any number of output objects for a single input object.
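The three input/output relationships described above can be modelled with plain Python generators. This is only a sketch of the idea, not osh's implementation: the helper names mirror the osh commands but are defined here for illustration, and this expand is assumed to do nothing more than emit each element of an input sequence.

```python
# Illustrative model of stream commands using generators.
# These helpers are hypothetical stand-ins, not the osh implementation.

def f(function, stream):
    """One output per input: apply function to each object."""
    for obj in stream:
        yield function(obj)

def select(predicate, stream):
    """Zero or one output per input: keep objects satisfying predicate."""
    for obj in stream:
        if predicate(obj):
            yield obj

def expand(stream):
    """Any number of outputs per input: emit each element of a sequence."""
    for obj in stream:
        for element in obj:
            yield element

source = [(1, 2), (), (3, 4, 5)]
flattened = list(expand(source))                        # 3 inputs, 5 outputs
evens = list(select(lambda x: x % 2 == 0, flattened))   # 5 inputs, 2 outputs
squares = list(f(lambda x: x * x, evens))               # 2 inputs, 2 outputs
print(flattened, evens, squares)
```

Because each helper is a generator, objects flow through the chain one at a time, which is the same shape as a pipeline of commands.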

Commands, in addition to operating on input and output streams, may have side-effects. For example, out writes to stdout or a file; sql may update a database; and commands with function arguments (e.g. f, select) can operate on variables in the osh command sequence's namespace.

Examples

List the files in the current directory:
    jao@zack$ osh ls ^ out
    ('./history.html',)
    ('./index.html',)
    ('./license.txt',)
    ('./ref',)
    ('./similar.html',)
    ('./userguide',)
The output contains tuples such as ('./index.html',) instead of strings because osh always pipes tuples between commands. The input to out is a stream of tuples, which are rendered using str. Standard Python formatting can be used by supplying a formatting argument to out. So to render each file name as an unquoted string, the command line would be
    jao@zack$ osh ls ^ out %s
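The difference between the two renderings can be reproduced in plain Python. The sketch below assumes that out applies its formatting argument to each tuple with the % operator; the text above only says that standard Python formatting is used, so treat that detail as an assumption.

```python
# A one-element tuple, like those piped from ls to out.
row = ('./index.html',)

# Rendering with str, as out does when no format is given.
default_rendering = str(row)

# Rendering with a %s format (assumed to be applied via the % operator).
formatted = '%s' % row

print(default_rendering)   # ('./index.html',)
print(formatted)           # ./index.html
```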
osh.File objects have a number of attributes, which can be examined and printed.

List files and selected attributes:

    jao@zack$ osh ls ^ f 'lambda file: (oct(file.mode), file.size, file)' $
    ('0100644', 8137, './history.html')
    ('0100644', 5508, './index.html')
    ('0100644', 733, './license.txt')
    ('040755', 10234, './ref')
    ('0100644', 2108, './similar.html')
    ('040755', 408, './userguide')
In this command sequence, the output from the ls command, a stream of osh.File objects, is piped to the f command. f applies a function to each input tuple, and sends the resulting value to the output stream. In this example, the function has one argument named file. Output from this command contains the file's mode (in octal), size, and the file itself.

This example also shows that ^ out may be replaced by the symbol $. (If you need to specify a formatting argument or other argument to out, then this shorthand cannot be used.)
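The octal mode strings in the listing above encode both the file type and the permission bits. As a side illustration, the standard stat module can decode them; the helper name describe_mode is made up for this sketch. The leading-0 form shown in the listing is what oct returns in Python 2, whereas Python 3 returns a 0o prefix, such as '0o100644'.

```python
import stat

def describe_mode(octal_text):
    """Decode an octal mode string into (kind, permission string)."""
    mode = int(octal_text, 8)          # an explicit base allows the leading 0
    if stat.S_ISDIR(mode):
        kind = 'directory'
    elif stat.S_ISREG(mode):
        kind = 'file'
    else:
        kind = 'other'
    return kind, stat.filemode(mode)

print(describe_mode('0100644'))   # ('file', '-rw-r--r--')
print(describe_mode('040755'))    # ('directory', 'drwxr-xr-x')
```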

Filter out directories:

To limit the information printed to just files, omitting directories, the output from ls can be filtered:

    jao@zack$ osh ls ^ select 'file: file.isfile' ^ f 'file: (oct(file.mode), file.size, file)' $
    ('0100644', 8137, './history.html')
    ('0100644', 5508, './index.html')
    ('0100644', 733, './license.txt')
    ('0100644', 2108, './similar.html')
Now, the output from ls is piped to select. select has a function argument which selects osh.Files for which the isfile attribute is true. The selected osh.Files are then passed on and processed as before.

In this command sequence, the commands f and select both have function arguments, but the Python keyword lambda has been omitted, a shorthand permitted by osh.
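One way such a shorthand could work is to prepend the missing keyword before evaluating the text. The sketch below is a guess at the mechanism, not osh's actual code: as_function is a hypothetical helper, and eval is used only for illustration (it should not be applied to untrusted input).

```python
from types import SimpleNamespace

def as_function(source):
    """Turn a function argument into a callable, adding 'lambda' if omitted."""
    text = source.strip()
    if not text.startswith(('lambda ', 'lambda:')):
        text = 'lambda ' + text
    return eval(text)   # illustration only; unsafe for untrusted input

# SimpleNamespace stands in for an osh.File with two attributes.
entry = SimpleNamespace(isfile=True, size=733)

shorthand = as_function('file: file.isfile')      # keyword omitted
explicit = as_function('lambda file: file.size')  # keyword present
print(shorthand(entry), explicit(entry))
```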

Recursive listing:

A recursive listing can be obtained by using the -r flag to the ls command:

    jao@zack$ osh ls -r ^ f 'file: (oct(file.mode), file.size, file)' $
    ('040755', 10234, './ref')
    ('040755', 408, './userguide')
    ('0100644', 8137, './history.html')
    ('0100644', 5508, './index.html')
    ('0100644', 733, './license.txt')
    ('0100644', 2108, './similar.html')
    ('0100644', 50938, './ref/api-objects.txt')
    ('0100644', 15841, './ref/class-tree.html')
    ...
    ('0100644', 8302, './userguide/config.html')
    ('0100644', 503, './userguide/index.html')
    ('0100644', 4201, './userguide/installation.html')
    ('0100644', 8888, './userguide/intro.html')

Listing from all nodes of a cluster:

Osh commands can be run remotely, on multiple hosts, in parallel. For example, suppose you have a cluster named fred, with hosts 101, 102, and 103 (the cluster name and node names would be specified in the osh configuration file, .oshrc). You could list the /var/log/messages* files from all these nodes as follows:

    jao@zack$ osh @fred [ ls '/var/log/messages*' ^ f 'file: (oct(file.mode), file.size, file.abspath)' ] $
    ('101', '0100600', 153311, '/var/log/messages.3')
    ('101', '0100600', 245349, '/var/log/messages.4')
    ('101', '0100600', 238494, '/var/log/messages')
    ('101', '0100600', 75552, '/var/log/messages.1')
    ('101', '0100600', 99174, '/var/log/messages.2')
    ('102', '0100600', 153311, '/var/log/messages.3')
    ('102', '0100600', 245349, '/var/log/messages.4')
    ('102', '0100600', 238494, '/var/log/messages')
    ('102', '0100600', 75552, '/var/log/messages.1')
    ('102', '0100600', 160875, '/var/log/messages.2')
    ('103', '0100600', 419873, '/var/log/messages.3')
    ('103', '0100600', 66614, '/var/log/messages.4')
    ('103', '0100600', 57772, '/var/log/messages')
    ('103', '0100600', 87651, '/var/log/messages.1')
    ('103', '0100600', 99989, '/var/log/messages.2')
@fred [ ... ] specifies that the bracketed command will be executed remotely, on each node of cluster fred, in parallel. Each output tuple begins with the name of the node that produced it. Note that the last returned value is specified as file.abspath, the file's absolute path (a string), and not file, the osh.File object itself. The reason is that osh.File objects cannot be transmitted from a host in the fred cluster to the local host. Attributes of an osh.File object are computed on demand, and so cannot be obtained outside of the host containing the file. Process objects have similar restrictions.
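The shape of this computation (run the same command on every node in parallel, then label each result with its node) can be simulated locally. The sketch below does not contact any hosts and is not how osh is implemented: FAKE_NODE_OUTPUT, run_on_node and run_on_cluster are hypothetical names, and the per-node data is copied from three rows of the listing above.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for what the bracketed command would return on each node.
# Only plain values (strings, ints) appear here, mirroring the rule that
# file objects themselves are not sent back to the local host.
FAKE_NODE_OUTPUT = {
    '101': [('0100600', 238494, '/var/log/messages')],
    '102': [('0100600', 238494, '/var/log/messages')],
    '103': [('0100600', 57772, '/var/log/messages')],
}

def run_on_node(node):
    """Simulated remote execution: return the node's tuples, node name first."""
    return [(node,) + row for row in FAKE_NODE_OUTPUT[node]]

def run_on_cluster(nodes):
    """Run on every node in parallel and merge the labelled tuples."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        per_node = pool.map(run_on_node, nodes)   # results keep node order
        return [row for rows in per_node for row in rows]

merged = run_on_cluster(['101', '102', '103'])
for row in merged:
    print(row)
```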

Sum of file sizes across the cluster:

Suppose that we want to compute the sum of the sizes of all the /var/log/messages* files on all the nodes of the fred cluster. The aggregation command, agg, can be used to do this:

    jao@zack$ osh @fred [ ls '/var/log/messages*' ^ f 'file: file.size' ] ^ agg 0 'sum, node, size: sum + size' $
    (2417360,)
The remote ls has been simplified since all we want to do is sum the sizes; we don't need the other attributes returned in the previous examples.

The first argument to agg is the initial value of the accumulator, 0. The second argument is an aggregation function. The first argument of this function is the current value of the accumulator, sum, and the remaining arguments, node and size, are the fields of each input tuple: the node name and the file size returned from that node. The function returns sum + size as its value, which is then bound to sum for the next tuple of input from the cluster.
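The aggregation described above is a left fold, which can be sketched with functools.reduce. The agg function below is a local stand-in written for this illustration, not the osh command, and the three rows are a small sample of (node, size) tuples taken from the earlier listing rather than the full cluster output.

```python
from functools import reduce

# Sample (node, size) tuples, as they would arrive from the remote command.
rows = [('101', 153311), ('102', 153311), ('103', 419873)]

def agg(initial, function, stream):
    """Fold the stream: call function(accumulator, *tuple_fields) per tuple."""
    return reduce(lambda acc, row: function(acc, *row), stream, initial)

# The parameter is named sum to mirror the command line above; it shadows
# the built-in only inside this lambda.
total = agg(0, lambda sum, node, size: sum + size, rows)
print((total,))   # (726495,)
```

The node field is accepted by the function but ignored, which is why the result is a single total rather than one total per node.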

Osh Interfaces

Osh has two interfaces:

Command-line interface (CLI): The osh executable interprets command-line arguments as osh syntax. Any shell should be usable; however, some osh CLI syntax may require escapes in some shells. (The osh CLI has been tested most extensively using the bash shell.)

Python application programming interface (API): The osh CLI invokes the osh runtime, which invokes Python modules corresponding to each command. The runtime and command modules can also be invoked from a Python API.