public class CountDistinctUpTo
extends org.apache.pig.AccumulatorEvalFunc<java.lang.Integer>
implements org.apache.pig.Algebraic
DEFINE CountDistinctUpTo10 datafu.pig.bags.CountDistinctUpTo('10');
DEFINE CountDistinctUpTo3 datafu.pig.bags.CountDistinctUpTo('3');
-- input:
-- {(A),(B),(D),(A),(C),(E),(A),(B),(A),(B)}
input = LOAD 'input' AS (B: bag {T: tuple(alpha:CHARARRAY)});
-- output:
-- (5)
output = FOREACH input GENERATE CountDistinctUpTo10(B);
-- output2:
-- (3)
output2 = FOREACH input GENERATE CountDistinctUpTo3(B);
| Modifier and Type | Class and Description |
|---|---|
static class |
CountDistinctUpTo.Final
Receives output either from initial results or intermediate
Outputs an integer with the number of distinct tuples, up to the maximum desired.
|
static class |
CountDistinctUpTo.Initial
Outputs a tuple containing a DataBag containing a single tuple T (the original schema) or an empty bag
T -> ({T})
|
static class |
CountDistinctUpTo.Intermediate
Receives a bag of bags, each containing a single tuple with the original input schema T
Outputs a bag of distinct tuples each with the original schema T: {({T}),({T}),({T})} -> ({T, T, T})
or if the maximum is reached, null: {({T}),({T}),({T}) ..} -> (null)
|
| Constructor and Description |
|---|
CountDistinctUpTo(java.lang.String maxAmount) |
| Modifier and Type | Method and Description |
|---|---|
void |
accumulate(org.apache.pig.data.Tuple tuple) |
void |
cleanup() |
java.lang.String |
getFinal() |
java.lang.String |
getInitial() |
java.lang.String |
getIntermed() |
java.lang.Integer |
getValue() |
org.apache.pig.impl.logicalLayer.schema.Schema |
outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) |
allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warnpublic void accumulate(org.apache.pig.data.Tuple tuple)
throws java.io.IOException
accumulate in interface org.apache.pig.Accumulator<java.lang.Integer>accumulate in class org.apache.pig.AccumulatorEvalFunc<java.lang.Integer>java.io.IOExceptionpublic void cleanup()
cleanup in interface org.apache.pig.Accumulator<java.lang.Integer>cleanup in class org.apache.pig.AccumulatorEvalFunc<java.lang.Integer>public java.lang.Integer getValue()
getValue in interface org.apache.pig.Accumulator<java.lang.Integer>getValue in class org.apache.pig.AccumulatorEvalFunc<java.lang.Integer>public java.lang.String getInitial()
getInitial in interface org.apache.pig.Algebraicpublic java.lang.String getIntermed()
getIntermed in interface org.apache.pig.Algebraicpublic java.lang.String getFinal()
getFinal in interface org.apache.pig.Algebraicpublic org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
outputSchema in class org.apache.pig.EvalFunc<java.lang.Integer>