datafu.pig.bags
Class ReverseEnumerate
java.lang.Object
org.apache.pig.EvalFunc<T>
datafu.pig.util.SimpleEvalFunc<org.apache.pig.data.DataBag>
datafu.pig.bags.ReverseEnumerate
public class ReverseEnumerate
extends SimpleEvalFunc<org.apache.pig.data.DataBag>
Enumerates a bag, appending to each tuple its index within the bag, with indices assigned in
descending order.
For example:
{(A),(B),(C),(D)} => {(A,3),(B,2),(C,1),(D,0)}
The optional constructor parameter sets the starting index of the count, that is, the index
assigned to the last tuple in the bag. When it is omitted, counting starts at 0, as in the
example above. Because reverse counting requires the size of the bag up front, this UDF does not
implement the accumulator interface and pays the slight performance penalty of materializing the
DataBag.
Example:
define ReverseEnumerate datafu.pig.bags.ReverseEnumerate('1');
-- input:
-- ({(100),(200),(300),(400)})
-- (aliases are named data/result because input and output are reserved words in Pig Latin)
data = LOAD 'input' as (B: bag{T: tuple(v2:INT)});
-- output:
-- ({(100,4),(200,3),(300,2),(400,1)})
result = FOREACH data GENERATE ReverseEnumerate(B);
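The counting rule described above can be sketched in plain Java, without any Pig dependency. This is only an illustration under stated assumptions, not the DataFu implementation: the class name ReverseEnumerateSketch and the helper reverseEnumerate are hypothetical, and tuples are modeled as simple strings. It shows why the size of the whole collection must be known before the first index can be assigned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the UDF logic; not part of DataFu or Pig.
final class ReverseEnumerateSketch {

    // Pairs each item with a descending index; the last item receives `start`.
    static List<String> reverseEnumerate(List<String> items, int start) {
        // The first index depends on the total size, so the whole
        // collection has to be available before any output is produced.
        int index = items.size() - 1 + start;
        List<String> out = new ArrayList<>(items.size());
        for (String item : items) {
            out.add("(" + item + "," + index + ")");
            index--;
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the two examples in the class description.
        System.out.println(reverseEnumerate(List.of("A", "B", "C", "D"), 0));
        System.out.println(reverseEnumerate(List.of("100", "200", "300", "400"), 1));
    }
}
```

With a start of 0 the last item receives index 0, and with a start of 1 it receives index 1, matching the two examples in the description.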
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Method Summary
org.apache.pig.data.DataBag call(org.apache.pig.data.DataBag inputBag)
org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Overrides outputSchema so that the input schema can be verified at Pig compile time rather than at runtime.
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ReverseEnumerate
public ReverseEnumerate()
ReverseEnumerate
public ReverseEnumerate(java.lang.String start)
call
public org.apache.pig.data.DataBag call(org.apache.pig.data.DataBag inputBag)
throws java.io.IOException
- Throws:
java.io.IOException
outputSchema
public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
- Description copied from class:
SimpleEvalFunc
- Overrides outputSchema so that the input schema can be verified at Pig compile time rather than at runtime.
- Overrides:
outputSchema
in class SimpleEvalFunc<org.apache.pig.data.DataBag>
- Parameters:
input - input schema
- Returns:
- call to super.outputSchema in case schema was defined elsewhere
Matthew Hayes, Sam Shah