public class BagGroup extends AliasableEvalFunc<org.apache.pig.data.DataBag>
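BagGroup groups the tuples of a single bag by key. The first argument is the bag to group, and the second is a projection of that bag onto the field(s) to group by. The following example groups input_bag by k. The output is a bag of tuples, each consisting of a group key k and a bag holding the corresponding (k,v) tuples from input_bag.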
define BagGroup datafu.pig.bags.BagGroup();
data = LOAD 'input' AS (input_bag: bag {T: tuple(k: int, v: chararray)});
-- ({(1,A),(1,B),(2,A),(2,B),(2,C),(3,A)})
-- Group input_bag by k
data2 = FOREACH data GENERATE BagGroup(input_bag, input_bag.(k)) as grouped;
-- data2: {grouped: {(group: int,input_bag: {T: (k: int,v: chararray)})}}
-- ({(1,{(1,A),(1,B)}),(2,{(2,A),(2,B),(2,C)}),(3,{(3,A)})})
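To make the grouping semantics concrete outside Pig, here is a minimal plain-Java sketch that reproduces the data2 result for the sample input. It is a model of the result only, not DataFu's implementation, and it does not use the Pig API: the records `KV` and `Group` and the class `BagGroupSketch` are hypothetical stand-ins for Pig tuples and the UDF. A `LinkedHashMap` keeps groups in first-seen key order so the sketch is deterministic; the UDF itself may emit groups in a different order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Pig tuples: an input (k, v) row and an output (group, input_bag) row.
record KV(int k, String v) {}
record Group(int group, List<KV> inputBag) {}

class BagGroupSketch {
    // Groups the rows of one bag by k, keeping the full (k, v) rows in each group,
    // mirroring BagGroup(input_bag, input_bag.(k)) in the example above.
    static List<Group> groupByKey(List<KV> inputBag) {
        Map<Integer, List<KV>> groups = new LinkedHashMap<>(); // insertion-ordered for determinism
        for (KV row : inputBag) {
            groups.computeIfAbsent(row.k(), key -> new ArrayList<>()).add(row);
        }
        List<Group> result = new ArrayList<>();
        for (Map.Entry<Integer, List<KV>> e : groups.entrySet()) {
            result.add(new Group(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same sample data as the input_bag comment above.
        List<KV> inputBag = List.of(
            new KV(1, "A"), new KV(1, "B"),
            new KV(2, "A"), new KV(2, "B"), new KV(2, "C"),
            new KV(3, "A"));
        for (Group g : groupByKey(inputBag)) {
            System.out.println(g);
        }
    }
}
```

Each printed `Group` corresponds to one (group, input_bag) tuple in data2; records require Java 16 or later.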
If the key k is not needed inside the grouped bags of the output, it can be projected out like so:
data3 = FOREACH data2 {
  -- project only the value
  projected = FOREACH grouped GENERATE group, input_bag.(v);
  GENERATE projected as grouped;
}
-- data3: {grouped: {(group: int,input_bag: {(v: chararray)})}}
-- ({(1,{(A),(B)}),(2,{(A),(B),(C)}),(3,{(A)})})
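The effect of the nested projection can be modeled the same way: after grouping, each group keeps only its v values. The sketch below is a hypothetical plain-Java model of data3 for the sample input; the records `Row` and `ValueGroup` and the class `ProjectSketch` are illustrative names, not Pig or DataFu classes. In Pig the projection is performed by the nested FOREACH, not by BagGroup itself; the sketch folds both steps into one pass for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins: an input (k, v) row and an output (group, values) row.
record Row(int k, String v) {}
record ValueGroup(int group, List<String> values) {}

class ProjectSketch {
    // Groups rows by k, keeping only v in each group, which matches the result of
    // BagGroup followed by GENERATE group, input_bag.(v) in the nested FOREACH above.
    static List<ValueGroup> groupAndProjectValues(List<Row> inputBag) {
        Map<Integer, List<String>> groups = new LinkedHashMap<>(); // first-seen key order
        for (Row row : inputBag) {
            groups.computeIfAbsent(row.k(), key -> new ArrayList<>()).add(row.v());
        }
        List<ValueGroup> result = new ArrayList<>();
        groups.forEach((k, values) -> result.add(new ValueGroup(k, values)));
        return result;
    }

    public static void main(String[] args) {
        List<Row> inputBag = List.of(
            new Row(1, "A"), new Row(1, "B"),
            new Row(2, "A"), new Row(2, "B"), new Row(2, "C"),
            new Row(3, "A"));
        groupAndProjectValues(inputBag).forEach(System.out::println);
    }
}
```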
Constructor and Description |
---|
BagGroup() |
Modifier and Type | Method and Description |
---|---|
org.apache.pig.data.DataBag | exec(org.apache.pig.data.Tuple input) |
org.apache.pig.impl.logicalLayer.schema.Schema | getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) Specify the output schema as in EvalFunc#outputSchema(Schema). |
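The shape reported by getOutputSchema can be read off the data2 schema comment: a bag of tuples whose first field is named group and carries the key's type, and whose second field is named after the input bag and carries that bag's schema. The toy sketch below builds that shape as a string for the single-key case only. `OutputSchemaSketch.describeOutput` and its parameters are hypothetical; the sketch does not use Pig's Schema class and makes no claim about how BagGroup names or types the group field when grouping by several keys.

```java
class OutputSchemaSketch {
    // Builds the textual output schema for grouping a bag by a single key field,
    // following the layout shown in the data2 comment:
    // {(group: <keyType>,<bagAlias>: <bagSchema>)}
    static String describeOutput(String keyType, String bagAlias, String bagSchema) {
        return "{(group: " + keyType + "," + bagAlias + ": " + bagSchema + ")}";
    }

    public static void main(String[] args) {
        // prints {(group: int,input_bag: {T: (k: int,v: chararray)})}
        System.out.println(describeOutput("int", "input_bag", "{T: (k: int,v: chararray)}"));
    }
}
```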
Methods inherited from class AliasableEvalFunc
getBag, getBoolean, getDouble, getDouble, getFieldAliases, getFloat, getFloat, getInteger, getInteger, getLong, getLong, getObject, getPosition, getPosition, getPrefixedAliasName, getString, getString, outputSchema
Methods inherited from class ContextualEvalFunc
getContextProperties, getInstanceName, getInstanceProperties, onReady, setUDFContextSignature
Methods inherited from class org.apache.pig.EvalFunc
allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, warn
public org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Description copied from class: AliasableEvalFunc
Specify the output schema as in EvalFunc#outputSchema(Schema).
Specified by:
getOutputSchema in class AliasableEvalFunc<org.apache.pig.data.DataBag>
Parameters:
input - input schema

public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input) throws java.io.IOException
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Throws:
java.io.IOException