public class BagJoin extends AliasableEvalFunc<org.apache.pig.data.DataBag>
Invoke the UDF as BagJoin(bag1, 'key1', bag2, 'key2', ...). Each bag argument is followed by the alias of the join key inside that bag. All bags must be non-null, and every bag must have a corresponding key. An inner join is performed by default. To perform a left or full outer join, pass 'left' or 'full' to the constructor in the define statement, for example datafu.pig.bags.BagJoin('left').
Example:
define BagJoin datafu.pig.bags.BagJoin(); -- inner join
-- describe data:
-- data: {bag1: {(key1: chararray,value1: chararray)},bag2: {(key2: chararray,value2: int)}}
bag_joined = FOREACH data GENERATE BagJoin(bag1, 'key1', bag2, 'key2') as joined;
-- describe bag_joined:
-- bag_joined: {joined: {(bag1::key1: chararray, bag1::value1: chararray, bag2::key2: chararray, bag2::value2: int)}}
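The three join types can be illustrated with a minimal plain-Java sketch that joins two in-memory lists of rows. It does not use the Pig or DataFu APIs and is not the BagJoin implementation. The names JoinSketch, join, leftWidth, and rightWidth are hypothetical, the key is assumed to sit in column 0 of each row, and the null padding of unmatched rows is assumed from standard outer-join semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not part of DataFu or Pig.
class JoinSketch {
    enum JoinType { INNER, LEFT, FULL }

    // Joins two lists of rows on the value in column 0 of each row.
    // Unmatched rows are padded with nulls (assumed outer-join behavior).
    // Null keys are not treated specially in this sketch.
    static List<List<Object>> join(List<List<Object>> left, List<List<Object>> right,
                                   int leftWidth, int rightWidth, JoinType type) {
        // Index the right side by key, preserving insertion order.
        Map<Object, List<List<Object>>> rightByKey = new LinkedHashMap<>();
        for (List<Object> r : right) {
            rightByKey.computeIfAbsent(r.get(0), k -> new ArrayList<>()).add(r);
        }
        Set<Object> matchedKeys = new HashSet<>();
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> l : left) {
            List<List<Object>> matches = rightByKey.get(l.get(0));
            if (matches != null) {
                matchedKeys.add(l.get(0));
                for (List<Object> r : matches) {
                    out.add(concat(l, r));
                }
            } else if (type != JoinType.INNER) {
                // LEFT and FULL keep unmatched left rows.
                out.add(concat(l, nulls(rightWidth)));
            }
        }
        if (type == JoinType.FULL) {
            // FULL also keeps right rows whose key never matched.
            for (Map.Entry<Object, List<List<Object>>> e : rightByKey.entrySet()) {
                if (!matchedKeys.contains(e.getKey())) {
                    for (List<Object> r : e.getValue()) {
                        out.add(concat(nulls(leftWidth), r));
                    }
                }
            }
        }
        return out;
    }

    private static List<Object> concat(List<Object> a, List<Object> b) {
        List<Object> row = new ArrayList<>(a);
        row.addAll(b);
        return row;
    }

    private static List<Object> nulls(int n) {
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            row.add(null);
        }
        return row;
    }

    public static void main(String[] args) {
        List<List<Object>> bag1 = Arrays.asList(
            Arrays.<Object>asList("a", "x"),
            Arrays.<Object>asList("b", "y"));
        List<List<Object>> bag2 = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("c", 3));
        for (JoinType t : JoinType.values()) {
            System.out.println(t + ": " + join(bag1, bag2, 2, 2, t));
        }
    }
}
```

With the sample rows in main, the inner join yields only the row for key a, the left join additionally keeps the b row padded with nulls on the right, and the full join also keeps the c row padded with nulls on the left.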
Modifier and Type | Class and Description |
---|---|
static class | BagJoin.JoinType |
Constructor and Description |
---|
BagJoin() |
BagJoin(java.lang.String joinType) |
Modifier and Type | Method and Description |
---|---|
org.apache.pig.data.DataBag | exec(org.apache.pig.data.Tuple input) |
org.apache.pig.impl.logicalLayer.schema.Schema | getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) Specify the output schema as in EvalFunc.outputSchema(Schema). |
Methods inherited from class AliasableEvalFunc:
getBag, getBoolean, getDouble, getDouble, getFieldAliases, getFloat, getFloat, getInteger, getInteger, getLong, getLong, getObject, getPosition, getPosition, getPrefixedAliasName, getString, getString, outputSchema
Methods inherited from class ContextualEvalFunc:
getContextProperties, getInstanceName, getInstanceProperties, onReady, setUDFContextSignature
Methods inherited from class org.apache.pig.EvalFunc:
allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, warn
public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input) throws java.io.IOException
Specified by: exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Throws: java.io.IOException
public org.apache.pig.impl.logicalLayer.schema.Schema getOutputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Description copied from class: AliasableEvalFunc
Specified by: getOutputSchema in class AliasableEvalFunc<org.apache.pig.data.DataBag>
Parameters: input - input schema