public class Entropy
extends org.apache.pig.AccumulatorEvalFunc<java.lang.Double>
This UDF's constructor takes up to 2 optional arguments.
The 1st argument selects the entropy estimator algorithm. The default estimation algorithm is empirical.
The 2nd argument selects the logarithm base. The default logarithm base is log, which uses Euler's number as the base.
How to use:
This UDF calculates entropy from raw data tuples, without the need to pre-compute the occurrence frequency of each tuple.
It can be used in a nested FOREACH after a GROUP BY: sort the inner bag, then pass the sorted bag to this UDF as its input.
Example:
--calculate empirical entropy with Euler's number as the logarithm base
define Entropy datafu.pig.stats.entropy.Entropy();
input = LOAD 'input' AS (grp: chararray, val: double);
-- calculate the input's entropy in each group
input_group_g = GROUP input BY grp;
entropy_group = FOREACH input_group_g {
  input_val = input.val;
  input_ordered = ORDER input_val BY $0;
  GENERATE FLATTEN(group) AS group, Entropy(input_ordered) AS entropy;
}
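To illustrate why sorted input removes the need for a pre-computed frequency table, here is a minimal Java sketch. It is not DataFu's implementation; the class `SortedEntropySketch` and its `entropy` method are hypothetical names, and the sketch assumes the empirical estimator is the plain plug-in formula H = ln(N) - (1/N) * sum(c_i * ln(c_i)), where c_i is the length of each run of equal values in the sorted input and N is the total number of values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only; not part of DataFu or Pig.
public class SortedEntropySketch {

    /** Empirical entropy (natural log) of an already sorted sequence, in one pass. */
    public static <T> double entropy(List<T> sorted) {
        long total = 0;        // N: number of values seen so far
        long run = 0;          // length of the current run of equal values
        double sumCLogC = 0.0; // sum of c * ln(c) over completed runs
        T prev = null;

        for (T value : sorted) {
            // A change of value closes the current run, because equal
            // values are adjacent in sorted input.
            if (run > 0 && !Objects.equals(value, prev)) {
                sumCLogC += run * Math.log(run);
                run = 0;
            }
            prev = value;
            run++;
            total++;
        }
        if (run > 0) {
            sumCLogC += run * Math.log(run); // close the final run
        }
        if (total == 0) {
            return 0.0; // assumption: an empty input has zero entropy
        }
        return Math.log(total) - sumCLogC / total;
    }

    public static void main(String[] args) {
        // Two equally frequent values: the result is close to ln(2).
        List<Double> values = Arrays.asList(1.0, 1.0, 2.0, 2.0);
        System.out.println(entropy(values));
    }
}
```

Only a few counters are kept in memory, whatever the number of distinct values, which is the practical benefit of requiring the sorted bag.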
See also: CondEntropy, EmpiricalCountEntropy
Constructor and Description |
---|
Entropy() |
Entropy(java.lang.String type) |
Entropy(java.lang.String type, java.lang.String base) |
Modifier and Type | Method and Description |
---|---|
void | accumulate(org.apache.pig.data.Tuple input) |
void | cleanup() |
java.lang.Double | getValue() |
org.apache.pig.impl.logicalLayer.schema.Schema | outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) |
allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
public Entropy() throws org.apache.pig.backend.executionengine.ExecException
Throws: org.apache.pig.backend.executionengine.ExecException
public Entropy(java.lang.String type) throws org.apache.pig.backend.executionengine.ExecException
Throws: org.apache.pig.backend.executionengine.ExecException
public Entropy(java.lang.String type, java.lang.String base) throws org.apache.pig.backend.executionengine.ExecException
Throws: org.apache.pig.backend.executionengine.ExecException
public void accumulate(org.apache.pig.data.Tuple input) throws java.io.IOException
Specified by: accumulate in interface org.apache.pig.Accumulator<java.lang.Double>
Specified by: accumulate in class org.apache.pig.AccumulatorEvalFunc<java.lang.Double>
Throws: java.io.IOException
public java.lang.Double getValue()
Specified by: getValue in interface org.apache.pig.Accumulator<java.lang.Double>
Specified by: getValue in class org.apache.pig.AccumulatorEvalFunc<java.lang.Double>
public void cleanup()
Specified by: cleanup in interface org.apache.pig.Accumulator<java.lang.Double>
Specified by: cleanup in class org.apache.pig.AccumulatorEvalFunc<java.lang.Double>
public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Overrides: outputSchema in class org.apache.pig.EvalFunc<java.lang.Double>
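The accumulate, getValue, and cleanup methods follow Pig's Accumulator contract: Pig may deliver one group's bag in several batches through accumulate, reads the result with getValue once the group is exhausted, and then calls cleanup before the next group. The sketch below mimics that lifecycle in plain Java without depending on Pig. The class `RunningEntropySketch` is hypothetical, takes a `List` per batch instead of a Pig `Tuple`, and assumes the same plug-in empirical formula with the natural logarithm; it is not the library's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for an accumulator-style UDF; not part of DataFu or Pig.
public class RunningEntropySketch {
    private long total;       // values seen so far in the current group
    private long run;         // length of the current run of equal values
    private double sumCLogC;  // sum of c * ln(c) over completed runs
    private Object prev;      // last value seen, carried across batches

    /** Consumes one batch; batches must arrive in overall sorted order. */
    public void accumulate(List<?> batch) {
        for (Object value : batch) {
            if (run > 0 && !Objects.equals(value, prev)) {
                sumCLogC += run * Math.log(run);
                run = 0;
            }
            prev = value;
            run++;
            total++;
        }
    }

    /** Returns the entropy of everything accumulated since the last cleanup. */
    public Double getValue() {
        if (total == 0) {
            return 0.0; // assumption: no input means zero entropy
        }
        // Include the run that is still open without mutating the state.
        double closed = sumCLogC + (run > 0 ? run * Math.log(run) : 0.0);
        return Math.log(total) - closed / total;
    }

    /** Resets all state so the instance can be reused for the next group. */
    public void cleanup() {
        total = 0;
        run = 0;
        sumCLogC = 0.0;
        prev = null;
    }

    public static void main(String[] args) {
        RunningEntropySketch acc = new RunningEntropySketch();
        acc.accumulate(Arrays.asList(1.0, 1.0)); // first batch
        acc.accumulate(Arrays.asList(1.0, 2.0)); // a run may span two batches
        System.out.println(acc.getValue());
        acc.cleanup();
    }
}
```

Keeping `prev` and `run` as fields is what lets a run of equal values continue across a batch boundary, and resetting them in cleanup keeps one group's counts from leaking into the next.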