Osh's Python Application Programming Interface

All osh capabilities are available through both the command-line interface and the Python application programming interface. The CLI has some special syntactic forms; with the API, everything is specified using functions.

API Usage

In order to use osh from within python code, simply import the osh API, and invoke the osh interpreter using the function osh, e.g.
    #!/usr/bin/python

    from osh.api import *

    osh(gen(10), f(lambda x: x**2), out())
The last line of this script works as follows: This style of osh usage does not provide any convenient way of grabbing the objects generated by the last command, (or any other command for that matter), and making those objects available to code outside the dynamic scope of the osh invocation.

This could be addressed by defining a function, invoked from f, which, for each invocation, places its input in a list. The osh API provides a simpler alternative: If the last command in the command sequence is return_list, then the output from the previous command is accumulated and returned as the value of the osh invocation, e.g.

    #!/usr/bin/python
    
    from osh.api import *

    print osh(gen(10), f(lambda x: x**2), return_list())

To pass lines of text from stdin to osh, use the stdin command. For example, the following script can be used to print sorted input:

    #!/usr/bin/python

    from osh.api import *

    osh(stdin(), sort(), out())

Error and Exception Handling

The default exception and error handlers can be by invoking set_exception_handler and set_error_handler in your script or from .oshrc. (See the documentation on the module osh.error for details.)

Parallel and Remote Execution

Threads of execution are created by the fork command. Example:
    #!/usr/bin/python

    from osh.api import *

    osh(fork(3, sh('sleep 5; date')), out())
Output from this command looks like this:
    (0, 'Mon Jul 28 08:48:34 EDT 2008')
    (1, 'Mon Jul 28 08:48:34 EDT 2008')
    (2, 'Mon Jul 28 08:48:34 EDT 2008')
Each output tuple contains a label identifying the thread and output from the command run in the thread. The fact that all printed dates are the same shows that sleep 5 executed simultaneously on all threads.

The thread specification can also be a function returning a sequence, e.g.

    #!/usr/bin/python

    from osh.api import *

    osh(fork(['a', 'b', 'c'], sh('sleep 5; date')), out())
Output:
    ('a', 'Mon Jul 28 08:58:13 EDT 2008')
    ('b', 'Mon Jul 28 08:58:13 EDT 2008')
    ('c', 'Mon Jul 28 08:58:13 EDT 2008')
In this example, the first argument to fork is a sequence. Each item in this sequence gives rise to a thread; the threads execute and generate output, labelled with the thread's id. (fork also supports functions that evaluate to a sequence.)

fork can also be used for remote execution, by providing a cluster name (configured in .oshrc) as the first argument, e.g.

    #!/usr/bin/python

    from osh.api import *

    osh(fork('fred', sh('sleep 5; date')), out())
Output:
    ('101', 'Mon Jul 28 08:58:51 EDT 2008')
    ('102', 'Mon Jul 28 08:58:51 EDT 2008')
    ('103', 'Mon Jul 28 08:58:51 EDT 2008')

A subset of the nodes in a cluster can be specified as follows:

    #!/usr/bin/python

    from osh.api import *

    osh(fork('fred:102', sh('sleep 5; date')), out())
fred:102 specifies that the command should be run on nodes of fred whose name contains 102 as a substring. Since the names of the nodes in cluster fred are 101, 102, 103, only node 102 is selected.

fred:10 would select all nodes of the cluster since all node names contain 10. Remote execution can also be done using remote instead of fork. These commands are the same, except that remote accepts only a cluster name as its first argument.

If the command sequence to be executed remotely has more than one command, then the commands need to be organized into a list. For example, this script finds all java processes running on each node of cluster fred:

    #!/usr/bin/python

    from osh.api import *

    osh(remote('fred',
               [ps(),
                select('p: "java" in p.command_line'),
                f('p: (p.pid, p.command_line)')]),
        out())
The list of remotely executed commands works as follows:

Using Python Functions in the Osh API

In the osh CLI, functions are written as strings, e.g. 'x: x + 1'. This syntax is accepted by the osh API, but functions can also be written as ordinary lambda expressions. So f(lambda x: x + 1) and f('x: x + 1') work interchangeably.

However, there is one situation in which the string form of function is required. If a function is to be executed remotely, then the function must be written using a string. (The reason for this is that command sequences to be executed remotely are pickled and sent to each node over ssh. Strings can be pickled but functions cannot always be pickled.)

Merging partial results

If multiple threads of execution produce ordered results, then the results can be merged. For example, this script generates the numbers 0, 1 and 2 on each of three threads:
    #!/usr/bin/python
    
    from osh.api import *
    
    osh(fork(['a', 'b', 'c'], gen(3)), out())
Output:
    ('a', 0)
    ('a', 1)
    ('a', 2)
    ('b', 0)
    ('b', 1)
    ('b', 2)
    ('c', 0)
    ('c', 1)
    ('c', 2)
The sequence produced by each thread is ordered. To produce a merged sequence, we provide a merge function:
    #!/usr/bin/python
    
    from osh.api import *
    
    osh(fork(['a', 'b', 'c'], gen(3), lambda x: x), out())
Output:
    ('a', 0)
    ('b', 0)
    ('c', 0)
    ('a', 1)
    ('b', 1)
    ('c', 1)
    ('a', 2)
    ('b', 2)
    ('c', 2)
The last argument to fork is a merge function, indicating that the sequences produced by each thread are to be merged. Each node's input to the merge is a sequence of integers, (output from gen). The merge function, x: x assumes that the inputs are ordered, (raising an exception if this assumption does not hold); and interleaves tuples from the threads so that the output sequence is ordered by the values from gen.