public class Entropy
extends org.apache.pig.AccumulatorEvalFunc<java.lang.Double>
This UDF's constructor takes 2 arguments.
The 1st argument is the type of entropy estimator algorithm, chosen from the currently supported types.
The default estimation algorithm is empirical.
The 2nd argument is the logarithm base, chosen from the currently supported bases.
The default logarithm base is log (Euler's number as the base).
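As an illustration of what the logarithm base argument changes, the sketch below computes an empirical entropy with the natural logarithm and then rescales it to another base. It is a minimal sketch, not DataFu code: the class `LogBaseSketch` and its methods `entropyNats` and `toBase` are hypothetical names introduced here, and the input is assumed to be pre-computed occurrence counts purely to keep the example short.

```java
// Hypothetical helper, not part of DataFu: shows that the logarithm base
// only rescales the entropy value by a constant factor.
final class LogBaseSketch {

    /** Empirical entropy of the given occurrence counts, using the natural logarithm. */
    static double entropyNats(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        double h = 0.0;
        for (long c : counts) {
            if (c > 0) {
                double p = (double) c / total; // relative frequency of this value
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** Converts an entropy measured with the natural logarithm to the given base. */
    static double toBase(double nats, double base) {
        return nats / Math.log(base);
    }

    public static void main(String[] args) {
        long[] counts = {2, 2}; // two equally frequent values
        double nats = entropyNats(counts);
        System.out.println(nats);              // approximately ln 2
        System.out.println(toBase(nats, 2.0)); // approximately 1 (one bit)
    }
}
```

Because the conversion is a division by a constant, choosing a different base changes the unit of the result but not the ranking of groups by entropy.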
How to use:
This UDF calculates entropy from raw data tuples, with no need to pre-compute the occurrence frequency of each tuple.
It can be used in a nested FOREACH after a GROUP BY: sort the inner bag and pass the sorted bag as this UDF's input.
Example:
-- calculate empirical entropy with Euler's number as the logarithm base
define Entropy datafu.pig.stats.entropy.Entropy();
input = LOAD 'input' AS (grp: chararray, val: double);
-- calculate the input's entropy in each group
input_group_g = GROUP input BY grp;
entropy_group = FOREACH input_group_g {
  input_val = input.val;
  input_ordered = ORDER input_val BY $0;
  GENERATE FLATTEN(group) AS group, Entropy(input_ordered) AS entropy;
}
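To show why a sorted bag lets entropy be computed without pre-computed frequencies, here is a minimal sketch of a run-length accumulator. It is not DataFu's implementation: `SortedEntropySketch` is a hypothetical class that only borrows the method names `accumulate`, `getValue` and `cleanup`, takes plain `double` values instead of Pig tuples, and assumes that an empty input yields 0.0. It uses the identity H = ln(N) - (1/N) * sum(c * ln(c)), where c is the length of each run of equal values and N is the total number of values.

```java
// Hypothetical sketch, not DataFu's implementation: empirical entropy
// (natural logarithm) over a stream of values that arrives already sorted.
final class SortedEntropySketch {
    private boolean hasCurrent = false; // whether a run is in progress
    private double current = 0.0;       // value of the run being counted
    private long runLength = 0;         // occurrences of `current` seen so far
    private long total = 0;             // values seen so far
    private double sumCLogC = 0.0;      // sum of c * ln(c) over completed runs

    /** Feeds one value; equal values must arrive consecutively (sorted input). */
    void accumulate(double value) {
        if (hasCurrent && value == current) {
            runLength++;
        } else {
            // A new value starts: close the previous run and open a new one.
            if (runLength > 0) {
                sumCLogC += runLength * Math.log(runLength);
            }
            current = value;
            hasCurrent = true;
            runLength = 1;
        }
        total++;
    }

    /** Returns the entropy of everything accumulated so far, without changing state. */
    double getValue() {
        if (total == 0) {
            return 0.0; // assumption of this sketch for empty input
        }
        double pending = runLength > 0 ? runLength * Math.log(runLength) : 0.0;
        return Math.log(total) - (sumCLogC + pending) / total;
    }

    /** Resets all state so the instance can be reused for the next group. */
    void cleanup() {
        hasCurrent = false;
        current = 0.0;
        runLength = 0;
        total = 0;
        sumCLogC = 0.0;
    }

    public static void main(String[] args) {
        SortedEntropySketch acc = new SortedEntropySketch();
        for (double v : new double[]{1, 1, 2, 2}) {
            acc.accumulate(v);
        }
        System.out.println(acc.getValue()); // approximately ln 2
    }
}
```

If the same values arrive unsorted (for example 1, 2, 1, 2), every run has length 1 and the sketch reports ln 4 instead of ln 2, which is why the inner bag is ordered before it is passed in.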
See Also:
CondEntropy, EmpiricalCountEntropy

| Constructor and Description |
|---|
| Entropy() |
| Entropy(java.lang.String type) |
| Entropy(java.lang.String type, java.lang.String base) |
| Modifier and Type | Method and Description |
|---|---|
| void | accumulate(org.apache.pig.data.Tuple input) |
| void | cleanup() |
| java.lang.Double | getValue() |
| org.apache.pig.impl.logicalLayer.schema.Schema | outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) |
Methods inherited from class org.apache.pig.EvalFunc:
allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn

public Entropy()
throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

public Entropy(java.lang.String type)
throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

public Entropy(java.lang.String type,
java.lang.String base)
throws org.apache.pig.backend.executionengine.ExecException
Throws:
org.apache.pig.backend.executionengine.ExecException

public void accumulate(org.apache.pig.data.Tuple input)
throws java.io.IOException
accumulate in interface org.apache.pig.Accumulator<java.lang.Double>
accumulate in class org.apache.pig.AccumulatorEvalFunc<java.lang.Double>
Throws:
java.io.IOException

public java.lang.Double getValue()
getValue in interface org.apache.pig.Accumulator<java.lang.Double>
getValue in class org.apache.pig.AccumulatorEvalFunc<java.lang.Double>

public void cleanup()
cleanup in interface org.apache.pig.Accumulator<java.lang.Double>
cleanup in class org.apache.pig.AccumulatorEvalFunc<java.lang.Double>

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
outputSchema in class org.apache.pig.EvalFunc<java.lang.Double>