@Nondeterministic public class WeightedReservoirSample extends ReservoirSample
Tuples with larger weights are more likely to be selected in the final sample set.
This UDF inherits from ReservoirSample and is guaranteed to produce a sample of the given size.
As with ReservoirSample, this guarantee comes at the cost of scalability, since the UDF uses internal storage equal in size to the desired sample to guarantee the exact sample size.
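The trade-off described above can be illustrated with a minimal Java sketch of score-based weighted reservoir sampling. It assumes the widely used A-Res scheme (each item gets the score u^(1/w) for a uniform random u, and the k highest scores are kept); whether this class computes its scores in exactly this way is an assumption, and `WeightedReservoirSketch`, `Scored`, and `sample` are hypothetical names, not part of DataFu.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Illustrative weighted reservoir sampler; not the DataFu implementation. */
final class WeightedReservoirSketch {

    /** An item paired with the random score that decides whether it stays. */
    static final class Scored<T> {
        final T item;
        final double score;

        Scored(T item, double score) {
            this.item = item;
            this.score = score;
        }
    }

    /** Returns up to k items; items with larger weights are more likely to be kept. */
    static <T> List<T> sample(List<T> items, List<Double> weights, int k, Random rng) {
        if (k <= 0 || items.size() != weights.size()) {
            throw new IllegalArgumentException("need k > 0 and one weight per item");
        }
        // Min-heap on score: the head is always the weakest member of the reservoir.
        // The heap never holds more than k entries, which is the memory cost
        // of guaranteeing an exact sample size.
        PriorityQueue<Scored<T>> reservoir =
                new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));
        for (int i = 0; i < items.size(); i++) {
            double w = weights.get(i);
            if (w <= 0) {
                continue; // assumption of this sketch: non-positive weights are skipped
            }
            // score = u^(1/w) with u uniform in [0, 1): larger weights push scores toward 1.
            double score = Math.pow(rng.nextDouble(), 1.0 / w);
            if (reservoir.size() < k) {
                reservoir.add(new Scored<>(items.get(i), score));
            } else if (score > reservoir.peek().score) {
                reservoir.poll(); // evict the current lowest score
                reservoir.add(new Scored<>(items.get(i), score));
            }
        }
        List<T> out = new ArrayList<>();
        for (Scored<T> s : reservoir) {
            out.add(s.item);
        }
        return out;
    }
}
```

Because the reservoir never grows beyond k entries, memory use is bounded by the sample size rather than the input size, but a very large requested sample still has to fit in memory.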
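-- Sample 1 tuple; the second argument is the index of the weight field (here '1', assumed to refer to v2).
DEFINE WeightedSample datafu.pig.sampling.WeightedReservoirSample('1','1');
data = LOAD 'input' AS (v1:chararray, v2:int);
-- Group everything into a single bag so the UDF sees the whole relation.
data_g = GROUP data ALL;
sampled = FOREACH data_g GENERATE WeightedSample(data);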
Modifier and Type | Class and Description |
---|---|
static class | WeightedReservoirSample.Final |
static class | WeightedReservoirSample.Initial |
static class | WeightedReservoirSample.Intermediate |
Fields inherited from class ReservoirSample: numSamples, scoreGen
Constructor and Description |
---|
WeightedReservoirSample(java.lang.String strNumSamples, java.lang.String strWeightIdx) |
Modifier and Type | Method and Description |
---|---|
java.lang.String | getFinal() |
java.lang.String | getInitial() |
java.lang.String | getIntermed() |
protected datafu.pig.sampling.ScoredTuple.ScoreGenerator | getScoreGenerator() |
org.apache.pig.impl.logicalLayer.schema.Schema | outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) |
Methods inherited from class ReservoirSample: accumulate, cleanup, exec, getValue
Methods inherited from class org.apache.pig.EvalFunc: allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
public WeightedReservoirSample(java.lang.String strNumSamples, java.lang.String strWeightIdx)
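Both constructor arguments are strings because Pig passes the arguments of a DEFINE statement to the UDF constructor as string literals. The sketch below shows one way such arguments could be parsed and validated. The specific rules (a positive sample size, a non-negative weight index) are assumptions suggested by the parameter names, and `SampleArgsSketch` is a hypothetical class, not part of DataFu.

```java
/** Illustrative parsing of string constructor arguments; not the DataFu implementation. */
final class SampleArgsSketch {
    final int numSamples; // desired sample size
    final int weightIdx;  // position of the weight field in each input tuple

    SampleArgsSketch(String strNumSamples, String strWeightIdx) {
        // Integer.parseInt throws NumberFormatException for non-numeric input.
        this.numSamples = Integer.parseInt(strNumSamples.trim());
        this.weightIdx = Integer.parseInt(strWeightIdx.trim());
        // Assumed validation rules for this sketch.
        if (numSamples <= 0) {
            throw new IllegalArgumentException("sample size must be positive: " + numSamples);
        }
        if (weightIdx < 0) {
            throw new IllegalArgumentException("weight index must be non-negative: " + weightIdx);
        }
    }
}
```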
protected datafu.pig.sampling.ScoredTuple.ScoreGenerator getScoreGenerator()
Overrides: getScoreGenerator in class ReservoirSample
public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Overrides: outputSchema in class ReservoirSample
public java.lang.String getInitial()
Specified by: getInitial in interface org.apache.pig.Algebraic
Overrides: getInitial in class ReservoirSample
public java.lang.String getIntermed()
Specified by: getIntermed in interface org.apache.pig.Algebraic
Overrides: getIntermed in class ReservoirSample
public java.lang.String getFinal()
Specified by: getFinal in interface org.apache.pig.Algebraic
Overrides: getFinal in class ReservoirSample
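The getInitial, getIntermed, and getFinal methods return the names of the classes that Pig runs for each stage of an Algebraic UDF, which lets partial results be combined before the final stage. A score-based reservoir fits this model because merging partial reservoirs only requires keeping the k highest scores of their union. The sketch below illustrates that merge step under the assumption that every partial result carries its items together with their scores; `ReservoirMergeSketch`, `Scored`, and `merge` are hypothetical names, and this is not the code of the nested Initial, Intermediate, or Final classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative merge of partial reservoirs; not the DataFu implementation. */
final class ReservoirMergeSketch {

    /** An item together with the score it was assigned when first seen. */
    static final class Scored {
        final String item;
        final double score;

        Scored(String item, double score) {
            this.item = item;
            this.score = score;
        }
    }

    /** Combines partial reservoirs and keeps the k entries with the highest scores. */
    static List<Scored> merge(List<List<Scored>> partials, int k) {
        List<Scored> all = new ArrayList<>();
        for (List<Scored> partial : partials) {
            all.addAll(partial);
        }
        // Sort by descending score so the survivors are at the front.
        all.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Since scores are assigned once per item and never recomputed, the merged result is the same as if a single reservoir had seen all of the items, regardless of how the input was split.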