public class SetDifference extends SetOperationsBase
If bags A and B are provided, then this computes A-B, i.e. all elements in A that are not in B. If bags A, B and C are provided, then this computes A-B-C, i.e. all elements in A that are not in B or C.
Example:
define SetDifference datafu.pig.sets.SetDifference();
-- input:
-- ({(1),(2),(3),(4),(5),(6)},{(3),(4)})
input = LOAD 'input' AS (B1:bag{T:tuple(val:int)},B2:bag{T:tuple(val:int)});
input = FOREACH input {
  B1 = ORDER B1 BY val ASC;
  B2 = ORDER B2 BY val ASC;

  -- output:
  -- ({(1),(2),(5),(6)})
  GENERATE SetDifference(B1,B2);
}
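For readers who want to check the A-B and A-B-C semantics outside Pig, the following is a minimal Java sketch of the same computation on plain integer lists. It is not the DataFu implementation: the class `SetDifferenceSketch` and the helper `difference` are hypothetical names, and the sketch assumes that the result should contain distinct values in ascending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SetDifferenceSketch {
    // Hypothetical helper, not part of DataFu: returns the distinct elements
    // of `first` that appear in none of the `others`, in ascending order.
    @SafeVarargs
    public static List<Integer> difference(List<Integer> first, List<Integer>... others) {
        TreeSet<Integer> result = new TreeSet<>(first);
        for (List<Integer> other : others) {
            result.removeAll(other); // drop everything found in this bag
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> b = Arrays.asList(3, 4);
        List<Integer> c = Arrays.asList(6);
        System.out.println(difference(a, b));    // A-B:   [1, 2, 5, 6]
        System.out.println(difference(a, b, c)); // A-B-C: [1, 2, 5]
    }
}
```

The first call mirrors the Pig example above, with `a` and `b` standing in for bags B1 and B2.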
Constructor and Description |
---|
SetDifference() |
Modifier and Type | Method and Description |
---|---|
int | countMatches(java.util.PriorityQueue<datafu.pig.sets.SetDifference.Pair> pq) Counts how many elements in the priority queue match the element at the front of the queue, which should be from the first bag. |
org.apache.pig.data.DataBag | exec(org.apache.pig.data.Tuple input) |
Methods inherited from class datafu.pig.sets.SetOperationsBase: outputSchema
Methods inherited from class org.apache.pig.EvalFunc: allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
public int countMatches(java.util.PriorityQueue<datafu.pig.sets.SetDifference.Pair> pq)
Parameters:
pq - priority queue

public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input) throws java.io.IOException
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Throws:
java.io.IOException
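
The summary of countMatches (counting the queue elements that match the element at the front, which should come from the first bag) can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: the real `SetDifference.Pair` is not documented on this page, so the sketch uses a hypothetical `Pair` that holds an integer value and the index of the bag it came from, orders the queue by value and then by bag index, and counts only elements from the other bags as matches.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class CountMatchesSketch {
    // Hypothetical stand-in for SetDifference.Pair: a value plus the index
    // of the bag it came from (0 = first bag).
    public static final class Pair {
        final int value;
        final int bagIndex;

        public Pair(int value, int bagIndex) {
            this.value = value;
            this.bagIndex = bagIndex;
        }
    }

    // Assumed ordering: by value, then by bag index, so an element from the
    // first bag sits ahead of equal elements from the other bags.
    public static PriorityQueue<Pair> newQueue() {
        return new PriorityQueue<>(
            Comparator.comparingInt((Pair p) -> p.value)
                      .thenComparingInt(p -> p.bagIndex));
    }

    // Counts entries from the other bags whose value equals the head's value.
    public static int countMatches(PriorityQueue<Pair> pq) {
        Pair head = pq.peek();
        if (head == null || head.bagIndex != 0) {
            throw new IllegalStateException("head of queue must come from the first bag");
        }
        int matches = 0;
        for (Pair p : pq) { // iteration order is unspecified, which is fine for counting
            if (p.bagIndex > 0 && p.value == head.value) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        PriorityQueue<Pair> pq = newQueue();
        pq.add(new Pair(3, 0)); // current element of the first bag
        pq.add(new Pair(3, 1)); // current element of the second bag
        pq.add(new Pair(4, 2)); // current element of a third bag
        System.out.println(countMatches(pq)); // prints 1
    }
}
```

Under these assumptions, a count of zero means the front element appears in no other bag and belongs in the difference, while any positive count means it should be dropped.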