public class SessionCount
extends org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
Performs a count of events, ignoring events that occur within the same time window. This is useful for tasks such as counting the number of page views per user, since it (a) prevents reloads and back-button navigations from overcounting actual views and (b) captures the notion that views across multiple sessions are more meaningful.
Input must be sorted in ascending order by time for this UDF to work.
Example:
%declare TIME_WINDOW 10m
define SessionCount datafu.pig.sessions.SessionCount('$TIME_WINDOW');
views = LOAD 'views' as (user_id:int, page_id:int, time:chararray);
views_grouped = GROUP views by (user_id, page_id);
view_counts = FOREACH views_grouped {
  views = order views by time;
  generate group.user_id as user_id,
           group.page_id as page_id,
           SessionCount(views.(time)) as count;
}
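The counting idea behind the example can be sketched in plain Java, without any Pig dependency. This is only an illustration under stated assumptions, not the DataFu implementation: it assumes timestamps are epoch milliseconds, and that an event starts a new session when it arrives more than one time window after the previous event. The class name `SessionCountSketch` and its fields are hypothetical; the three methods mirror the `accumulate` / `getValue` / `cleanup` names listed in the method summary.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone sketch of windowed event counting; not the DataFu source. */
class SessionCountSketch {
    private final long windowMillis; // length of the time window (assumption: milliseconds)
    private Long lastTime = null;    // timestamp of the previous event, null before the first one
    private long count = 0;          // number of sessions seen so far

    SessionCountSketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Consumes one batch of timestamps; batches must arrive in ascending time order. */
    void accumulate(List<Long> sortedTimes) {
        for (long t : sortedTimes) {
            if (lastTime != null && t < lastTime) {
                throw new IllegalArgumentException("input is not sorted by time");
            }
            // Assumption: a gap larger than the window starts a new session.
            if (lastTime == null || t - lastTime > windowMillis) {
                count++;
            }
            lastTime = t;
        }
    }

    /** Returns the count accumulated so far. */
    long getValue() {
        return count;
    }

    /** Resets the state so the instance can be reused for the next group. */
    void cleanup() {
        lastTime = null;
        count = 0;
    }

    public static void main(String[] args) {
        SessionCountSketch counter = new SessionCountSketch(10 * 60 * 1000L); // 10-minute window
        counter.accumulate(Arrays.asList(0L, 60_000L, 120_000L)); // three views within two minutes
        counter.accumulate(Arrays.asList(3_600_000L));            // one view an hour later
        System.out.println(counter.getValue()); // prints 2
    }
}
```

Because the state carries over between `accumulate` calls, the input can be fed in batches, which is why the ascending sort order matters across the whole group and not just within one batch.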
Constructor and Description |
---|
SessionCount(java.lang.String timeSpec) |
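
The only `timeSpec` value shown on this page is `10m`. As a rough illustration of how such a string might map to a duration, the sketch below converts a number followed by a one-letter unit into milliseconds. The helper `toMillis`, the class `TimeSpecSketch`, and the set of accepted units (`s`, `m`, `h`, `d`) are assumptions made for this example; the formats actually accepted by the constructor are not documented here.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing one way to read a spec such as "10m"; not the DataFu parser. */
class TimeSpecSketch {
    static long toMillis(String timeSpec) {
        String spec = timeSpec.trim();
        if (spec.length() < 2) {
            throw new IllegalArgumentException("expected <number><unit>, got: " + timeSpec);
        }
        // Everything before the last character is the amount; the last character is the unit.
        long amount = Long.parseLong(spec.substring(0, spec.length() - 1));
        char unit = spec.charAt(spec.length() - 1);
        switch (unit) {
            case 's': return TimeUnit.SECONDS.toMillis(amount);
            case 'm': return TimeUnit.MINUTES.toMillis(amount);
            case 'h': return TimeUnit.HOURS.toMillis(amount);
            case 'd': return TimeUnit.DAYS.toMillis(amount);
            default:
                throw new IllegalArgumentException("unknown unit in: " + timeSpec);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMillis("10m")); // prints 600000
    }
}
```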
Modifier and Type | Method and Description |
---|---|
void | accumulate(org.apache.pig.data.Tuple input) |
void | cleanup() |
java.lang.Long | getValue() |
Methods inherited from class org.apache.pig.EvalFunc: allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, outputSchema, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
public void accumulate(org.apache.pig.data.Tuple input) throws java.io.IOException
Specified by: accumulate in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by: accumulate in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
Throws: java.io.IOException
public java.lang.Long getValue()
Specified by: getValue in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by: getValue in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
public void cleanup()
Specified by: cleanup in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by: cleanup in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>