@Nondeterministic public class WeightedReservoirSample extends ReservoirSample
Tuples with larger weights have a higher probability of being selected in the final sample set.
This UDF inherits from ReservoirSample and is guaranteed to produce
a sample of the given size. As with ReservoirSample, this comes at the cost of scalability,
since it uses internal storage equal in size to the desired sample to guarantee the exact sample size.
-- Constructor arguments: sample size '1', weight read from field index '1' (v2 below)
DEFINE WeightedSample datafu.pig.sampling.WeightedReservoirSample('1','1');
input = LOAD 'input' AS (v1:chararray, v2:INT);
input_g = GROUP input ALL;
sampled = FOREACH input_g GENERATE WeightedSample(input);
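
The weighting behaviour described above can be illustrated with a small standalone sketch. It assumes one common scheme for weighted reservoir sampling: each item with weight w receives a random score u^(1/w), with u uniform in (0, 1], and the k highest-scoring items are kept. Whether this UDF computes its scores exactly this way is an assumption; the class `WeightedReservoirSketch` and its members are hypothetical names for illustration and are not part of DataFu.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical illustration class; not DataFu code.
final class WeightedReservoirSketch {

    // An item paired with the random score that decides whether it is kept.
    static final class Scored<T> {
        final T item;
        final double score;

        Scored(T item, double score) {
            this.item = item;
            this.score = score;
        }
    }

    // Returns up to k items; larger weights make a high score more likely.
    static <T> List<T> sample(List<T> items, List<Double> weights, int k, Random rng) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // Min-heap on score: the head is the weakest item currently kept,
        // so memory use is bounded by the sample size k.
        PriorityQueue<Scored<T>> reservoir =
                new PriorityQueue<>(k, Comparator.comparingDouble((Scored<T> s) -> s.score));
        for (int i = 0; i < items.size(); i++) {
            double w = weights.get(i);
            if (w <= 0.0) {
                continue; // assumption of this sketch: non-positive weights are skipped
            }
            double u = 1.0 - rng.nextDouble();   // uniform in (0, 1]
            double score = Math.pow(u, 1.0 / w); // assumed scoring rule
            if (reservoir.size() < k) {
                reservoir.add(new Scored<>(items.get(i), score));
            } else if (score > reservoir.peek().score) {
                reservoir.poll();                // evict the current weakest item
                reservoir.add(new Scored<>(items.get(i), score));
            }
        }
        List<T> out = new ArrayList<>(reservoir.size());
        for (Scored<T> s : reservoir) {
            out.add(s.item);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        List<Double> weights = Arrays.asList(1.0, 1.0, 1.0, 97.0);
        // "d" carries most of the weight, so it appears in most samples.
        System.out.println(sample(items, weights, 2, new Random(42)));
    }
}
```

The heap never holds more than k entries, which matches the trade-off noted above: the sample size is exact, but the whole sample must fit in memory.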
| Modifier and Type | Class and Description |
|---|---|
| static class | WeightedReservoirSample.Final |
| static class | WeightedReservoirSample.Initial |
| static class | WeightedReservoirSample.Intermediate |

Fields inherited from class ReservoirSample: numSamples, scoreGen

| Constructor and Description |
|---|
| WeightedReservoirSample(java.lang.String strNumSamples, java.lang.String strWeightIdx) |

| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | getFinal() |
| java.lang.String | getInitial() |
| java.lang.String | getIntermed() |
| protected datafu.pig.sampling.ScoredTuple.ScoreGenerator | getScoreGenerator() |
| org.apache.pig.impl.logicalLayer.schema.Schema | outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) |

Methods inherited from class ReservoirSample: accumulate, cleanup, exec, getValue

Methods inherited from class org.apache.pig.EvalFunc: allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn

public WeightedReservoirSample(java.lang.String strNumSamples, java.lang.String strWeightIdx)

protected datafu.pig.sampling.ScoredTuple.ScoreGenerator getScoreGenerator()
Overrides: getScoreGenerator in class ReservoirSample

public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Overrides: outputSchema in class ReservoirSample

public java.lang.String getInitial()
Specified by: getInitial in interface org.apache.pig.Algebraic
Overrides: getInitial in class ReservoirSample

public java.lang.String getIntermed()
Specified by: getIntermed in interface org.apache.pig.Algebraic
Overrides: getIntermed in class ReservoirSample

public java.lang.String getFinal()
Specified by: getFinal in interface org.apache.pig.Algebraic
Overrides: getFinal in class ReservoirSample
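
The getInitial, getIntermed, and getFinal methods belong to Pig's Algebraic interface, under which partial results are computed on subsets of the input and then combined. The sketch below shows one way such a combination step could work for a score-based reservoir, assuming each partial result keeps its items together with their scores: the merged reservoir is the k highest-scoring entries across all partial results. The class `ReservoirMergeSketch` and its members are hypothetical and do not reproduce the nested Initial, Intermediate, or Final classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not DataFu code.
final class ReservoirMergeSketch {

    // An item with the score it was assigned when first seen.
    static final class Scored {
        final String item;
        final double score;

        Scored(String item, double score) {
            this.item = item;
            this.score = score;
        }
    }

    // Combines partial reservoirs by keeping the k highest-scoring entries overall.
    static List<Scored> merge(List<List<Scored>> partials, int k) {
        List<Scored> all = new ArrayList<>();
        for (List<Scored> partial : partials) {
            all.addAll(partial);
        }
        // Sort by score, highest first.
        all.sort((x, y) -> Double.compare(y.score, x.score));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<Scored> first = Arrays.asList(new Scored("a", 0.9), new Scored("b", 0.2));
        List<Scored> second = Arrays.asList(new Scored("c", 0.7), new Scored("d", 0.4));
        // With k = 2, the merged reservoir keeps the two highest scores overall.
        for (Scored s : merge(Arrays.asList(first, second), 2)) {
            System.out.println(s.item + " " + s.score);
        }
    }
}
```

Because each partial reservoir already holds at most k entries, the merge step handles at most k entries per partial result regardless of how large the original input was.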