Osh's Command-Line Interface

The osh executable interprets its command-line arguments as osh syntax. Osh should be usable from any shell, although some osh CLI syntax may require escapes in some shells. (The osh CLI has been tested most extensively using the bash shell on Linux and OS X.)

CLI Usage

This osh command generates the integers 0 through 9 and prints the square of each:

    zack$ osh gen 10 ^ f 'x: x**2' ^ out

Comments:

The first token on the command-line, osh, invokes the osh executable. There are seven arguments to the osh command (gen, 10, ^, f, 'x: x**2', ^, out). These arguments are tokens in a language parsed by osh, and so the spaces present are significant. If any of the spaces between any adjacent tokens were eliminated, the result would be a syntactically incorrect osh command.
The space within the fifth token, 'x: x**2', is not significant to osh. The token is Python code (a lambda expression, although the lambda keyword is optional), so Python rules apply.
gen 10 is an osh command generating a stream of Python objects: 0, 1, ..., 9.
^ is the osh symbol that connects output from one command to input of the next, similar to a Unix pipe. The first occurrence of ^ passes the integers generated by gen 10 to the next command, which squares them.
f 'x: x**2' takes a stream of integers as input, squares each, and writes the result to the output stream. I.e., f is an osh command that applies a function to its input, similar to the Python function map.
^ out pipes the output from the f command (the squared integers) to the out command, which prints its input to stdout.
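
For readers who know Python better than osh, the pipeline described above can be approximated with plain Python generators. This is only an analogy: the functions gen, f, and out below are hypothetical stand-ins written for this illustration, not osh's implementation.

```python
def gen(n):
    """Yield the integers 0, 1, ..., n-1, in the spirit of `gen 10`."""
    for i in range(n):
        yield i


def f(function, stream):
    """Apply function to each object in stream, in the spirit of `f 'x: x**2'`."""
    for x in stream:
        yield function(x)


def out(stream):
    """Print each object in stream; also return them as a list for inspection."""
    results = []
    for x in stream:
        print(x)
        results.append(x)
    return results


# Roughly: osh gen 10 ^ f 'x: x**2' ^ out
squares = out(f(lambda x: x**2, gen(10)))
```

Each stage consumes the previous stage's output one object at a time, which is the role ^ plays on the osh command line.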

As a syntactic convenience, the token $ can be used instead of ^ out at the end of an osh command sequence. $ provides only the default behavior of out; to use out's options for formatting or for writing to files, write ^ out explicitly.

Input can be passed to an osh command sequence using a Unix pipe as follows:

    zack$ cat /usr/share/dict/words | osh ^ select 'w: len(w) >= 20' $

This command begins by writing the contents of /usr/share/dict/words to stdout. A Unix pipe is used to send this stream of data to osh. "osh ^" is a special syntactic form which converts each line of input from stdin into a Python string (omitting the line terminator \n). These strings are passed to the select command, which keeps only those strings whose length is at least 20. The trailing $ prints the surviving strings to stdout.
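
The same flow can be sketched in plain Python. The helper names lines and select are assumptions made for this illustration (select mirrors the osh command of the same name but is not its implementation), and an in-memory sample stands in for stdin so the sketch runs on its own.

```python
import io


def lines(stream):
    """Yield each input line as a string, without the trailing newline."""
    for line in stream:
        yield line.rstrip("\n")


def select(predicate, stream):
    """Pass along only the objects for which predicate is true."""
    for x in stream:
        if predicate(x):
            yield x


# Stand-in for stdin; replace with sys.stdin to read piped input.
sample = io.StringIO("cat\nelectroencephalograph\ncounterrevolutionaries\n")

long_words = list(select(lambda w: len(w) >= 20, lines(sample)))
for word in long_words:
    print(word)
```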

Error and Exception Handling

The default exception and error handlers can be overridden by invoking set_exception_handler and set_error_handler from .oshrc. (See the documentation on the module osh.error for details.)

Running Arbitrary Python Code

There are two mechanisms for running Python code from the osh command-line interface. One technique is to define variables, functions, etc. in .oshrc and then refer to those symbols in osh commands (see Configuring osh for details).

The other approach is to use the osh commands imp and py. imp imports modules for use by commands later in the command sequence. py runs arbitrary Python code; the intent is to define symbols that can be used by commands later in the command sequence. Both commands pass objects received on the input stream to the output stream. In both cases, the import or Python code is executed once, before objects start flowing on the streams connecting commands in the command sequence. For example, the following command line prints the areas of circles with radii 0, 1, 2, ..., 9:

    zack$ osh py 'pi = 3.14159265358979' ^ gen 10 ^ f 'r: (r, pi * r**2)' $
    (0, 0.0)
    (1, 3.14159265359)
    (2, 12.5663706144)
    (3, 28.2743338823)
    (4, 50.2654824574)
    (5, 78.5398163397)
    (6, 113.097335529)
    (7, 153.938040026)
    (8, 201.06192983)
    (9, 254.469004941)
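
One way to picture how a symbol defined by py becomes visible to a later command is a shared namespace that is populated once, before any objects flow. The sketch below is plain Python written under that assumption; it is not a description of how osh is actually implemented, and the name namespace is hypothetical.

```python
# Shared namespace, populated once before the stream starts.
namespace = {}

# Roughly: py 'pi = 3.14159265358979'
exec("pi = 3.14159265358979", namespace)

# Roughly: f 'r: (r, pi * r**2)'. The function looks up pi in the
# shared namespace each time it is called.
area = eval("lambda r: (r, pi * r**2)", namespace)

# Roughly: gen 10, with each radius fed through the function.
areas = [area(r) for r in range(10)]
for item in areas:
    print(item)
```

The number of digits printed for each area depends on the Python version, so the output may not match the osh transcript above digit for digit.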

Parallel and Remote Execution

A special syntactic form is used for creating threads of execution. Example:

    zack$ osh @3 [ sh 'sleep 5; date' ] $
    (0, 'Mon Aug  6 23:03:26 EDT 2007')
    (1, 'Mon Aug  6 23:03:26 EDT 2007')
    (2, 'Mon Aug  6 23:03:26 EDT 2007')

@3 introduces three threads of execution. Each thread has state identifying the thread, in this case the integers 0, 1, and 2. The bracketed command sequence is executed on each thread. sh is an escape to a native shell, so each thread runs sleep 5; date in that shell. Each line of output contains the thread state, identifying the thread, followed by the output from the executed command.

Thread state can also be generated by evaluating a function returning a sequence, e.g.

    zack$ osh @'range(3)' [ sh 'sleep 5; date' ] $
    (0, 'Mon Aug  6 23:03:26 EDT 2007')
    (1, 'Mon Aug  6 23:03:26 EDT 2007')
    (2, 'Mon Aug  6 23:03:26 EDT 2007')

The function is specified by 'range(3)'. This is a function with no arguments generating the integers 0, 1, 2, which become the thread states. The fact that all printed dates are the same shows that sleep 5 executed simultaneously on all threads.
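
The behavior of @3 and @'range(3)' can be imitated in plain Python with a thread pool: one thread per state, each running the same work and tagging its output with its state. The names fork and run_on_thread are hypothetical, a short sleep stands in for sh 'sleep 5; date', and nothing here reflects osh internals.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_on_thread(state, delay):
    """Stand-in for the bracketed command: wait, then report with the thread state."""
    time.sleep(delay)
    return (state, "done")


def fork(states, delay=0.2):
    """Run run_on_thread once per state, all at the same time."""
    states = list(states)
    with ThreadPoolExecutor(max_workers=len(states)) as pool:
        # pool.map returns results in the order of the input states.
        return list(pool.map(lambda s: run_on_thread(s, delay), states))


start = time.time()
results = fork(range(3))
elapsed = time.time() - start

for item in results:
    print(item)
print("elapsed: %.2f seconds" % elapsed)
```

Because the three sleeps overlap, the elapsed time should be close to one delay rather than three, which is the same effect the identical timestamps show in the osh transcripts.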

Finally, parallel execution can also be initiated by naming a cluster, e.g.

    zack$ osh @fred [ sh 'sleep 5; date' ] $
    ('101', 'Mon Aug  6 23:03:26 EDT 2007')
    ('102', 'Mon Aug  6 23:03:26 EDT 2007')
    ('103', 'Mon Aug  6 23:03:26 EDT 2007')

In this case, thread state contains an object describing a node in the named cluster (configured in .oshrc), and each thread runs the bracketed command on the indicated node.

A subset of the nodes in a cluster can be specified as follows:

    zack$ osh @fred:102 [ sh 'sleep 5; date' ] $
    ('102', 'Mon Aug  6 23:13:19 EDT 2007')

@fred:102 specifies that the command should be run on nodes of fred whose name contains 102 as a substring. Since the names of the nodes in cluster fred are 101, 102, 103, only node 102 is selected.

@fred:10 would select all nodes of the cluster since all node names contain 10.
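
The substring rule is easy to check in plain Python. The function select_nodes and the list fred below are hypothetical names used only to illustrate the matching rule described above.

```python
def select_nodes(node_names, pattern):
    """Return the node names that contain pattern as a substring."""
    return [name for name in node_names if pattern in name]


# Node names of the example cluster fred.
fred = ["101", "102", "103"]

print(select_nodes(fred, "102"))  # corresponds to @fred:102
print(select_nodes(fred, "10"))   # corresponds to @fred:10
```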

Merging Partial Results

If multiple threads of execution produce ordered results, then the results can be merged. For example, this command generates the numbers 0, 1 and 2 on each of three threads:

    zack$ osh @"('a', 'b', 'c')" [ gen 3 ] $
    ('a', 0)
    ('a', 1)
    ('a', 2)
    ('b', 0)
    ('b', 1)
    ('b', 2)
    ('c', 0)
    ('c', 1)
    ('c', 2)

The sequence produced by each thread is ordered. To produce a merged sequence, we provide a merge function:

    zack$ osh @"('a', 'b', 'c')" [ gen 3 // 'x: x' ] $
    ('a', 0)
    ('b', 0)
    ('c', 0)
    ('a', 1)
    ('b', 1)
    ('c', 1)
    ('a', 2)
    ('b', 2)
    ('c', 2)

// is the merge operator, indicating that the sequences produced by the threads are to be merged. Each thread's input to the merge operator is a sequence of integers (the output from gen). The merge, using the function following the merge operator, x: x, assumes that the inputs are ordered (raising an exception if this assumption does not hold) and interleaves tuples from the threads so that the output sequence is ordered by the values from gen.

If the merge function is x: x, then the function can be omitted, e.g.

    zack$ osh @"('a', 'b', 'c')" [ gen 3 // ] $
    ('a', 0)
    ('b', 0)
    ('c', 0)
    ('a', 1)
    ('b', 1)
    ('c', 1)
    ('a', 2)
    ('b', 2)
    ('c', 2)
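
The interleaving performed by the merge operator resembles a standard k-way merge of sorted sequences. The sketch below uses Python's heapq.merge as an analogy; the names thread_output and merge are hypothetical, and it rests on the assumption that the merge function is applied to the value produced by gen rather than to the thread state. Unlike osh as described above, heapq.merge does not raise an exception when an input is not ordered.

```python
import heapq


def thread_output(state, n):
    """Stand-in for one thread's output of `gen n`, tagged with the thread state."""
    return [(state, i) for i in range(n)]


def merge(sequences, key=lambda x: x):
    """Interleave already-ordered sequences, ordering by key of the gen value."""
    # t[1] is the value from gen; t[0] is the thread state.
    return list(heapq.merge(*sequences, key=lambda t: key(t[1])))


merged = merge([thread_output(s, 3) for s in ("a", "b", "c")])
for item in merged:
    print(item)
```

When several threads offer the same value, heapq.merge takes them in the order the sequences were supplied, which here gives the a, b, c ordering within each value.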