#!/usr/bin/python
from osh.api import *
osh(gen(10), f(lambda x: x**2), out())
The last line of this script works as follows:
This could be addressed by defining a function, invoked from f, which, for each invocation, places its input in a list. The osh API provides a simpler alternative: If the last command in the command sequence is return_list, then the output from the previous command is accumulated and returned as the value of the osh invocation, e.g.
#!/usr/bin/python
from osh.api import *
print osh(gen(10), f(lambda x: x**2), return_list())
To pass lines of text from stdin to osh, use the stdin command. For example, the following script can be used to print sorted input:
#!/usr/bin/python
from osh.api import *
osh(stdin(), sort(), out())
The default exception and error handlers can be by invoking set_exception_handler and set_error_handler in your script or from .oshrc. (See the documentation on the module osh.error for details.)
#!/usr/bin/python
from osh.api import *
osh(fork(3, sh('sleep 5; date')), out())
Output from this command looks like this:
(0, 'Mon Jul 28 08:48:34 EDT 2008')
(1, 'Mon Jul 28 08:48:34 EDT 2008')
(2, 'Mon Jul 28 08:48:34 EDT 2008')
Each output tuple contains a label identifying the thread and output
from the command run in the thread. The fact that all printed dates
are the same shows that sleep 5 executed simultaneously on
all threads.
The thread specification can also be a function returning a sequence, e.g.
#!/usr/bin/python
from osh.api import *
osh(fork(['a', 'b', 'c'], sh('sleep 5; date')), out())
Output:
('a', 'Mon Jul 28 08:58:13 EDT 2008')
('b', 'Mon Jul 28 08:58:13 EDT 2008')
('c', 'Mon Jul 28 08:58:13 EDT 2008')
In this example, the first argument to fork is a sequence. Each
item in this sequence gives rise to a thread; the threads execute and
generate output, labelled with the thread's id. (fork also
supports functions that evaluate to a sequence.)
fork can also be used for remote execution, by providing a cluster name (configured in .oshrc) as the first argument, e.g.
#!/usr/bin/python
from osh.api import *
osh(fork('fred', sh('sleep 5; date')), out())
Output:
('101', 'Mon Jul 28 08:58:51 EDT 2008')
('102', 'Mon Jul 28 08:58:51 EDT 2008')
('103', 'Mon Jul 28 08:58:51 EDT 2008')
A subset of the nodes in a cluster can be specified as follows:
#!/usr/bin/python
from osh.api import *
osh(fork('fred:102', sh('sleep 5; date')), out())
fred:102 specifies that the command should be run on nodes of fred
whose name contains 102 as a substring. Since the names of the nodes in cluster
fred are 101, 102, 103, only node 102 is selected.
fred:10 would select all nodes of the cluster since all node names contain 10. Remote execution can also be done using remote instead of fork. These commands are the same, except that remote accepts only a cluster name as its first argument.
If the command sequence to be executed remotely has more than one command, then the commands need to be organized into a list. For example, this script finds all java processes running on each node of cluster fred:
#!/usr/bin/python
from osh.api import *
osh(remote('fred',
[ps(),
select('p: "java" in p.command_line'),
f('p: (p.pid, p.command_line)')]),
out())
The list of remotely executed commands works as follows:
However, there is one situation in which the string form of function is required. If a function is to be executed remotely, then the function must be written using a string. (The reason for this is that command sequences to be executed remotely are pickled and sent to each node over ssh. Strings can be pickled but functions cannot always be pickled.)
#!/usr/bin/python
from osh.api import *
osh(fork(['a', 'b', 'c'], gen(3)), out())
Output:
('a', 0)
('a', 1)
('a', 2)
('b', 0)
('b', 1)
('b', 2)
('c', 0)
('c', 1)
('c', 2)
The sequence produced by each thread is ordered. To produce a merged sequence, we provide a
merge function:
#!/usr/bin/python
from osh.api import *
osh(fork(['a', 'b', 'c'], gen(3), lambda x: x), out())
Output:
('a', 0)
('b', 0)
('c', 0)
('a', 1)
('b', 1)
('c', 1)
('a', 2)
('b', 2)
('c', 2)
The last argument to fork is a merge function, indicating
that the sequences produced by each thread are to be merged. Each
node's input to the merge is a sequence of integers, (output
from gen). The merge function, x:
x assumes that the inputs are ordered, (raising an exception if
this assumption does not hold); and interleaves tuples from the
threads so that the output sequence is ordered by the values
from gen.