Package osh :: Package command :: Module agg

Module agg


agg [-r] [[-g|-c] GROUPING_FUNCTION] INITIAL_VALUE AGGREGATION_FUNCTION

Aggregates objects from the input stream. If GROUPING_FUNCTION is omitted, then one output object is generated by initializing an accumulator to INITIAL_VALUE and then combining the accumulator with input objects using AGGREGATION_FUNCTION. AGGREGATION_FUNCTION takes two inputs, the current value of the accumulator and an object from the input stream.

Example: If the input objects are integers 1, 2, 3, then the sum of the integers is computed as follows:

   ... ^ agg 0 'sum, x: sum + x'

which yields (6,).
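
The ungrouped case behaves like a fold over the input. The following is a minimal Python sketch of that behaviour, not osh's own implementation; the helper name `aggregate` is hypothetical and exists only for illustration.

```python
from functools import reduce

# Hypothetical helper (not part of osh): models agg without a grouping function.
def aggregate(inputs, initial_value, aggregator):
    """Fold all inputs into one accumulator and return it as a 1-tuple."""
    return (reduce(aggregator, inputs, initial_value),)

# Mirrors: ... ^ agg 0 'sum, x: sum + x'
result = aggregate([1, 2, 3], 0, lambda total, x: total + x)
print(result)  # (6,)
```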

If GROUPING_FUNCTION is specified, then a set of accumulators is maintained, one for each value of GROUPING_FUNCTION. Each output object is a tuple with two parts, the group value and the accumulated value for the group.

Example: If the input objects are ('a', 1), ('a', 2), ('b', 3), ('b', 4), then the sum of ints for each string is computed as follows:

   ... ^ agg -g 'x, y: x' 0 'sum, x, y: sum + y'

which yields ('a', 3), ('b', 7).
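
The per-group bookkeeping can be sketched in plain Python as a dictionary of accumulators keyed by group value. This models the description above and is not osh's implementation; `aggregate_by_group` is a hypothetical name, and the output order shown is simply the order in which groups first appear in this sketch.

```python
# Hypothetical helper (not part of osh): models agg -g.
def aggregate_by_group(inputs, group, initial_value, aggregator):
    """Keep one accumulator per group value; report all groups at the end."""
    accumulators = {}
    for row in inputs:
        key = group(*row)
        # Start a new group from initial_value, otherwise continue its total.
        current = accumulators.get(key, initial_value)
        accumulators[key] = aggregator(current, *row)
    return list(accumulators.items())

# Mirrors: ... ^ agg -g 'x, y: x' 0 'sum, x, y: sum + y'
rows = [('a', 1), ('a', 2), ('b', 3), ('b', 4)]
print(aggregate_by_group(rows, lambda x, y: x, 0, lambda total, x, y: total + y))
# [('a', 3), ('b', 7)]
```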

If the grouping function is specified with the -g flag, then agg generates its output only when the input stream has ended. (It has to, because group members may appear in any order.) In some situations, however, group members appear consecutively, and it is useful to get output earlier. If group members are known to be consecutive, then the grouping function can be specified using the -c flag instead, so that output for a group need not wait for the end of the input stream.
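
To illustrate why consecutive group members permit earlier output, here is a sketch in plain Python that emits a group's result as soon as the group value changes. It is an assumption-based model rather than osh's implementation, and `aggregate_consecutive` is a hypothetical name.

```python
# Hypothetical helper (not part of osh): models agg -c.
def aggregate_consecutive(inputs, group, initial_value, aggregator):
    """Yield (group value, aggregate) each time a run of equal group values ends."""
    no_key = object()  # sentinel meaning "no group seen yet"
    current_key = no_key
    accumulator = initial_value
    for row in inputs:
        key = group(*row)
        if current_key is not no_key and key != current_key:
            # The previous group is complete, so it can be emitted now.
            yield (current_key, accumulator)
            accumulator = initial_value
        current_key = key
        accumulator = aggregator(accumulator, *row)
    if current_key is not no_key:
        yield (current_key, accumulator)  # final group

rows = iter([('a', 1), ('a', 2), ('b', 3), ('b', 4)])
stream = aggregate_consecutive(rows, lambda x, y: x, 0, lambda total, x, y: total + y)
print(next(stream))  # ('a', 3), available before the input is exhausted
print(list(stream))  # [('b', 7)]
```

Only one accumulator is held at a time in this sketch, whereas the -g style of grouping has to keep one per group until the input ends.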

If the -r flag is specified, then one output object is generated for each input object; the output object contains the value of the accumulator so far. The accumulator appears in the output row before the inputs. For example, if the input stream contains 1, 2, 3, then the running total can be computed as follows:

   ... ^ agg -r 0 'sum, x: sum + x' ^ ...

The output stream would be (1, 1), (3, 2), (6, 3). In the last output object, 6 is the sum of the current input (3) and all preceding inputs (1, 2).
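
The running behaviour can be modelled in Python as a generator that emits the accumulator together with each input. This is an illustrative sketch only; `running_aggregate` is a hypothetical name, not an osh function.

```python
# Hypothetical helper (not part of osh): models agg -r without grouping.
def running_aggregate(inputs, initial_value, aggregator):
    """Yield (accumulator so far, input) for every input object."""
    accumulator = initial_value
    for x in inputs:
        accumulator = aggregator(accumulator, x)
        yield (accumulator, x)  # accumulator comes before the input

# Mirrors: ... ^ agg -r 0 'sum, x: sum + x' ^ ...
print(list(running_aggregate([1, 2, 3], 0, lambda total, x: total + x)))
# [(1, 1), (3, 2), (6, 3)]
```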

The -r flag can also be used with grouping. For example, if the input objects are ('a', 1), ('a', 2), ('b', 3), ('b', 4), then the running totals for the strings would be computed as follows:

   ... ^ agg -r -g 'x, y: x' 0 'sum, x, y: sum + y' ^ ...

The output stream would be (1, 'a', 1), (3, 'a', 2), (3, 'b', 3), (7, 'b', 4). I.e., the running total is reinitialized to 0 for each group.
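
Combining the two ideas, a sketch of running totals per group keeps one accumulator per group value and emits it in front of each input. As before, this is a plain-Python model under stated assumptions, and `running_aggregate_by_group` is a hypothetical name.

```python
# Hypothetical helper (not part of osh): models agg -r -g.
def running_aggregate_by_group(inputs, group, initial_value, aggregator):
    """Yield (group's accumulator so far, *input) for every input object."""
    accumulators = {}
    for row in inputs:
        key = group(*row)
        # Each group starts from initial_value the first time it is seen.
        current = accumulators.get(key, initial_value)
        accumulators[key] = aggregator(current, *row)
        yield (accumulators[key],) + tuple(row)

# Mirrors: ... ^ agg -r -g 'x, y: x' 0 'sum, x, y: sum + y' ^ ...
rows = [('a', 1), ('a', 2), ('b', 3), ('b', 4)]
print(list(running_aggregate_by_group(
    rows, lambda x, y: x, 0, lambda total, x, y: total + y)))
# [(1, 'a', 1), (3, 'a', 2), (3, 'b', 3), (7, 'b', 4)]
```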

Functions
 
agg(initial_value, aggregator, group=None, consecutive=None, running=False)
Combine inputs into a smaller number of outputs.
Function Details

agg(initial_value, aggregator, group=None, consecutive=None, running=False)


Combine inputs into a smaller number of outputs. If neither group nor consecutive is specified, then there is one accumulator, initialized to initial_value. The aggregator function combines the current value of the accumulator with an input to yield the next value of the accumulator. The arguments to aggregator are the elements of the accumulator followed by the elements of one input object.

If group is specified, then there is one accumulator for each group value, where the group value is obtained by applying the function group to each input. consecutive is just like group, except that inputs with the same group value are assumed to be adjacent in the input sequence. At most one of group and consecutive may be specified.

If running is false, then the output contains one object per group, containing the aggregate value. (If neither group nor consecutive is provided, then there is just one group, representing the aggregate for the entire input stream.) If running is true, then the aggregate value for the group is written out with each input object, i.e., the output contains "running totals". In this case, the aggregate value appears before the input values in the output object.