Package datafu.pig.bags

A collection of general purpose UDFs for operating on bags.


Class Summary
AppendToBag Appends a tuple to a bag.
BagConcat Unions all input bags to produce a single bag containing all tuples.
BagGroup Performs an in-memory group operation on a bag.
BagLeftOuterJoin Performs an in-memory left outer join across multiple bags.
BagSplit Splits a bag of tuples into a bag of bags, where the inner bags collectively contain the tuples from the original bag.
CountEach Generates a count of the number of times each distinct tuple appears in a bag.
DistinctBy Gets the distinct elements in a bag, as determined by a given set of field positions.
EmptyBagToNull Returns null if the input is an empty bag; otherwise, returns the input bag unchanged.
EmptyBagToNullFields For an empty bag, inserts a tuple having null values for all fields; otherwise, the input bag is returned unchanged.
Enumerate Enumerates a bag, appending to each tuple its index within the bag.
FirstTupleFromBag Returns the first tuple from a bag.
NullToEmptyBag Returns an empty bag if the input is null; otherwise, returns the input bag unchanged.
PrependToBag Prepends a tuple to a bag.
ReverseEnumerate Enumerates a bag, appending to each tuple its index within the bag, with indices produced in descending order.
UnorderedPairs Generates pairs of all items in a bag.
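
The one-line summaries above can be made concrete with a small sketch. The Java below does not call DataFu or Pig; it is a hypothetical stand-in (the class `BagSketch` and its method names are invented for illustration) that models a tuple as a `List<Object>` and a bag as a list of tuples, then mimics what Enumerate, CountEach, and UnorderedPairs are described as doing. The starting index passed to `enumerate` and the choice to emit each pair exactly once are assumptions of this sketch, not documented behavior of the UDFs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: a tuple is a List<Object>, a bag is a List of tuples. */
final class BagSketch {

    /** Models Enumerate: copies each tuple and appends its position in the bag. */
    static List<List<Object>> enumerate(List<List<Object>> bag, int start) {
        List<List<Object>> out = new ArrayList<>();
        int index = start; // assumption: caller chooses the first index
        for (List<Object> tuple : bag) {
            List<Object> copy = new ArrayList<>(tuple);
            copy.add(index++);
            out.add(copy);
        }
        return out;
    }

    /** Models CountEach: counts how often each distinct tuple appears. */
    static Map<List<Object>, Integer> countEach(List<List<Object>> bag) {
        // LinkedHashMap keeps first-seen order so the result is easy to read.
        Map<List<Object>, Integer> counts = new LinkedHashMap<>();
        for (List<Object> tuple : bag) {
            counts.merge(tuple, 1, Integer::sum);
        }
        return counts;
    }

    /** Models UnorderedPairs: emits each pair of positions (i, j) with i < j once. */
    static List<List<List<Object>>> unorderedPairs(List<List<Object>> bag) {
        List<List<List<Object>>> pairs = new ArrayList<>();
        for (int i = 0; i < bag.size(); i++) {
            for (int j = i + 1; j < bag.size(); j++) {
                pairs.add(List.of(bag.get(i), bag.get(j)));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = new ArrayList<>();
        bag.add(List.<Object>of("a"));
        bag.add(List.<Object>of("b"));
        bag.add(List.<Object>of("a"));

        System.out.println(enumerate(bag, 0));   // [[a, 0], [b, 1], [a, 2]]
        System.out.println(countEach(bag));      // {[a]=2, [b]=1}
        System.out.println(unorderedPairs(bag)); // [[[a], [b]], [[a], [a]], [[b], [a]]]
    }
}
```

In a real Pig script these operations would be applied by the UDFs in this package to Pig bags; the sketch only illustrates the input-to-output shape each summary describes.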
 



Authors: Matthew Hayes, Sam Shah