osh: Object-Oriented Shell

Osh

Osh (Object SHell) is a tool that integrates the processing of structured data, database access, and remote access to a cluster of nodes. These capabilities are made available through a command-line interface (CLI) and a Python application programming interface (API).

Osh processes streams of Python objects using simple commands. Complex data processing is achieved by command sequences in which the output from one command is passed to the input of the next. This is similar to composing Unix commands using pipes. However, Unix commands pass strings from one command to the next, and the commands (grep, awk, sed, etc.) are heavily string-oriented. Osh commands send primitive Python types such as strings and numbers; composite types such as tuples, lists and maps; objects representing files, dates and times; or even user-defined objects.
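The difference from string pipes can be sketched in plain Python. This is only an illustration of the idea, not how osh is implemented. Each hypothetical "command" (gen, select_even, square) is a generator that consumes a stream of Python objects and yields new ones. A hypothetical pipeline helper connects them in order, much as ^ connects osh commands.

```python
from typing import Iterable, Iterator, Tuple


def gen(n: int) -> Iterator[int]:
    """Source command: emit the integers 0 .. n-1 as Python ints, not strings."""
    yield from range(n)


def select_even(stream: Iterable[int]) -> Iterator[int]:
    """Filter command: pass through only the even numbers."""
    for x in stream:
        if x % 2 == 0:
            yield x


def square(stream: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Transform command: emit a tuple (x, x*x) for each input object."""
    for x in stream:
        yield (x, x * x)


def pipeline(source, *commands):
    """Connect commands so that each one's output stream is the next one's input."""
    stream = source
    for command in commands:
        stream = command(stream)
    return stream


if __name__ == "__main__":
    for obj in pipeline(gen(6), select_even, square):
        print(obj)  # tuples arrive intact; no text parsing is needed
```

Because the tuples travel as tuples, the last stage never has to split or parse text, as a stage in a Unix string pipeline typically would.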

Example (CLI)

Suppose you have a cluster named fred, consisting of nodes 101, 102, and 103. Each node has a database that tracks work requests in a table named request. You can find the number of open requests in each database as follows (using the CLI):

    jao@zack$ osh @fred [ sql "select count(*) from request where state = 'open'" ] ^ out
    ('101', 1)
    ('102', 0)
    ('103', 5)

osh: Invokes the osh interpreter.
@fred [ ... ]: fred is the name of a cluster configured in the osh configuration file, .oshrc. A thread is created for each node of the cluster, and the bracketed command is executed on each thread, in parallel.
sql "select count(*) from request where state = 'open'": sql is an osh command that submits a query to a relational database. The query output is returned as a stream of tuples.
^ out: ^ is the osh operator for piping objects from one command to the next. In this case, the input objects are tuples resulting from execution of a SQL query on each node of the cluster. The out command renders each object as a string and prints it to stdout.
Each output tuple starts with the name of the node it came from (e.g. '101', '102'), followed by the fields of the row returned by that node's database. So ('103', 5) means that the database on node 103 has 5 open requests.
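The fan-out performed by @fred [ ... ] can be approximated in plain Python. This is a sketch, not osh's implementation. The cluster is simulated by three in-memory SQLite databases, populated so that their open-request counts match the output above. The names OPEN_COUNTS, make_node_db, run_on_node and fan_out are hypothetical. One thread runs the query per node, and each result row is prefixed with the name of the node that produced it.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for cluster "fred": node name -> number of open
# requests in that node's database (the counts shown in the example output).
OPEN_COUNTS = {"101": 1, "102": 0, "103": 5}

QUERY = "select count(*) from request where state = 'open'"


def make_node_db(open_count):
    """Build an in-memory SQLite database with a request table for one node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table request (id integer primary key, state text)")
    rows = [("open",)] * open_count + [("closed",)] * 2
    conn.executemany("insert into request (state) values (?)", rows)
    return conn


def run_on_node(node):
    """Run QUERY against one node's database and tag each row with the node name."""
    # The connection is created inside the worker thread, because sqlite3
    # connections may only be used by the thread that created them by default.
    conn = make_node_db(OPEN_COUNTS[node])
    try:
        return [(node,) + row for row in conn.execute(QUERY)]
    finally:
        conn.close()


def fan_out(nodes, fn):
    """Run fn on every node in parallel, one thread per node, and merge the rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        per_node = pool.map(fn, nodes)  # map yields results in input order
        return [row for rows in per_node for row in rows]


if __name__ == "__main__":
    for row in fan_out(sorted(OPEN_COUNTS), run_on_node):
        print(row)
```

Truly parallel work can finish in any order; pool.map is used here because it returns results in input order, which keeps the sketch deterministic.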

Example continued

Now suppose you want to find the total number of open requests across the cluster. You can pipe the (node, request count) tuples into an aggregation command:

    jao@zack$ osh @fred [ sql "select count(*) from request where state = 'open'" ] ^ agg 0 'total, node, count: total + count' $
    6

agg: agg is the aggregation command. Tuples from all nodes of the cluster are piped into agg, which accumulates a single result from all of its inputs.
0: agg will maintain a total, which is initialized to 0.
'total, node, count: total + count': This specifies the aggregation function. total is the running total, initialized to 0. node and count are the fields of each input tuple: the name of the node and the count returned by the sql command on that node. total + count adds each node's count to the running total.
$: Shorthand for ^ out; it can only be used at the end of a command.
6: The total of the counts from across the cluster.
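The aggregation step behaves like a left fold. The sketch below reproduces it with functools.reduce. It assumes, as the example suggests, that agg passes the running value as the first argument of the aggregation function and the fields of each input tuple as the remaining arguments. The agg function here is a hypothetical stand-in, not the osh command, and the input rows are the per-node counts from the example.

```python
from functools import reduce


def agg(initial, fn, stream):
    """Fold a stream of tuples (hypothetical stand-in for osh's agg).

    The running value is passed to fn as its first argument, followed by
    the unpacked fields of each input tuple.
    """
    return reduce(lambda acc, row: fn(acc, *row), stream, initial)


# The (node, count) tuples produced by the query in the example above.
rows = [("101", 1), ("102", 0), ("103", 5)]

total = agg(0, lambda total, node, count: total + count, rows)
print(total)  # 6
```

With an empty input stream, the fold simply returns the initial value, 0.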

Note that this example combines remote execution on cluster nodes, database access (on each cluster node), and data processing (the aggregation step) in a single framework.

Example (API)

The same computation can be done using the API as follows:

    #!/usr/bin/python
    
    from osh.api import *
    
    osh(fork("fred",
             remote(sql("select count(*) from request where state = 'open'"))),
        agg(0, lambda total, node, count: total + count))

from osh.api import *: Imports the osh API.
osh(...): Invokes the osh interpreter.
fork("fred", remote(sql(...))): Runs the sql command on each node of cluster fred, in parallel.
agg(...): Aggregates query results from across the cluster.

More information:

License (GPL)
Release history
User's Guide
Command Reference Guide
Download
Software with similar goals
PyCon 2006 paper on osh
PyCon 2006 talk on osh

jao@geophile.com