datafu.pig.sessions
Class SessionCount

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
          extended by datafu.pig.sessions.SessionCount
All Implemented Interfaces:
org.apache.pig.Accumulator<java.lang.Long>

public class SessionCount
extends org.apache.pig.AccumulatorEvalFunc<java.lang.Long>

Performs a count of events, ignoring events which occur within the same time window.

This is useful for tasks such as counting the number of page views per user, since it (a) prevents reloads and go-backs from overcounting actual views, and (b) captures the notion that views spread across multiple sessions are more meaningful than repeated views within a single session.

Input must be sorted in ascending order by time for this UDF to work correctly.
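The counting rule can be sketched in plain Java. This is a minimal sketch, not DataFu's implementation. It assumes timestamps are epoch milliseconds, and it assumes an event is counted only when it falls more than the window length after the last counted event. A gap-based variant would measure from the immediately preceding event instead, and can give a different answer. The names SessionCountSketch and countSessions are hypothetical, and rejecting out-of-order input with an exception is this sketch's own choice.

```java
import java.util.List;

public class SessionCountSketch {
    /**
     * Counts events, skipping any event that falls within windowMillis of the
     * last counted event (assumed semantics). Timestamps are epoch millis and
     * must be sorted in ascending order; out-of-order input is rejected.
     */
    static long countSessions(List<Long> sortedTimes, long windowMillis) {
        long count = 0;
        Long lastCounted = null; // timestamp that opened the current window
        Long previous = null;    // immediately preceding timestamp, for the order check
        for (long t : sortedTimes) {
            if (previous != null && t < previous) {
                throw new IllegalArgumentException(
                    "input is not sorted by time: " + t + " < " + previous);
            }
            previous = t;
            if (lastCounted == null || t > lastCounted + windowMillis) {
                count++;          // outside the current window: a new view
                lastCounted = t;  // this event opens the next window
            }
            // otherwise: a reload or go-back inside the window, so it is ignored
        }
        return count;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Views at minutes 0, 2, 9, 15, 30 and 31 with a 10-minute window
        List<Long> times = List.of(0L, 2 * minute, 9 * minute, 15 * minute, 30 * minute, 31 * minute);
        System.out.println(countSessions(times, 10 * minute)); // prints 3
    }
}
```

With a 10-minute window, the views at minutes 2 and 9 fall inside the window opened at minute 0, and the view at minute 31 falls inside the window opened at minute 30, so only the views at minutes 0, 15 and 30 are counted.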

Example:

 %declare TIME_WINDOW  10m
 
 define SessionCount datafu.pig.sessions.SessionCount('$TIME_WINDOW');
 
 views = LOAD 'views' as (user_id:int, page_id:int, time:chararray);
 views_grouped = GROUP views by (user_id, page_id);
 view_counts = FOREACH views_grouped { 
   views = order views by time;
   generate group.user_id as user_id, 
            group.page_id as page_id, 
            SessionCount(views.(time)) as count; }
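The Pig script above groups views by (user_id, page_id), orders each group by time, and applies SessionCount to each group. The same pipeline can be mirrored in Java to show why the nested ORDER matters. This is an illustrative sketch only: the View and Key records are hypothetical, timestamps are epoch milliseconds (the script uses a chararray time field), and the counting rule is assumed to count an event only when it falls more than the window after the last counted event.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ViewCountsSketch {
    // Hypothetical stand-in for one row of the 'views' relation; time is epoch millis.
    record View(int userId, int pageId, long timeMillis) {}

    // Hypothetical grouping key, playing the role of group.(user_id, page_id).
    record Key(int userId, int pageId) {}

    // Assumed counting rule: count an event only if it falls more than
    // windowMillis after the last counted event. Requires sorted input.
    static long sessionCount(List<Long> sortedTimes, long windowMillis) {
        long count = 0;
        Long lastCounted = null;
        for (long t : sortedTimes) {
            if (lastCounted == null || t > lastCounted + windowMillis) {
                count++;
                lastCounted = t;
            }
        }
        return count;
    }

    static Map<Key, Long> viewCounts(List<View> views, long windowMillis) {
        // GROUP views BY (user_id, page_id)
        Map<Key, List<Long>> grouped = new HashMap<>();
        for (View v : views) {
            grouped.computeIfAbsent(new Key(v.userId(), v.pageId()), k -> new ArrayList<>())
                   .add(v.timeMillis());
        }
        Map<Key, Long> counts = new HashMap<>();
        for (Map.Entry<Key, List<Long>> e : grouped.entrySet()) {
            List<Long> times = e.getValue();
            Collections.sort(times); // nested: views = ORDER views BY time
            counts.put(e.getKey(), sessionCount(times, windowMillis));
        }
        return counts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        List<View> views = List.of(
            new View(1, 7, 20 * minute), new View(1, 7, 0), new View(1, 7, 3 * minute),
            new View(1, 8, 5 * minute),
            new View(2, 7, minute), new View(2, 7, 2 * minute));
        Map<Key, Long> counts = viewCounts(views, 10 * minute);
        System.out.println(counts.get(new Key(1, 7))); // prints 2
        System.out.println(counts.get(new Key(1, 8))); // prints 1
        System.out.println(counts.get(new Key(2, 7))); // prints 1
    }
}
```

For user 1 on page 7 the views arrive out of order (minutes 20, 0, 3). After sorting, the minute-20 view opens a new window and the count is 2; without the sort, the minute-20 view would open the first window and swallow the other two, giving 1.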
 
 


Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
SessionCount(java.lang.String timeSpec)
           
 
Method Summary
 void accumulate(org.apache.pig.data.Tuple input)
           
 void cleanup()
           
 java.lang.Long getValue()
           
 
Methods inherited from class org.apache.pig.AccumulatorEvalFunc
exec
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, outputSchema, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SessionCount

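public SessionCount(java.lang.String timeSpec)
Parameters:
timeSpec - length of the time window within which repeated events are ignored, for example 10m
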
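The constructor takes the window length as a string such as 10m. One way to turn such a spec into milliseconds is to read it as the time portion of an ISO-8601 duration, so that 10m becomes PT10M. This is an assumption about the accepted format, inferred only from the example; DataFu's own parsing may differ. TimeSpecSketch and windowMillis are hypothetical names.

```java
import java.time.Duration;
import java.util.Locale;

public class TimeSpecSketch {
    /**
     * Converts a window spec such as "10m", "30s" or "1h30m" into milliseconds
     * by treating it as the time part of an ISO-8601 duration ("PT10M").
     * The accepted format is an assumption, not DataFu's documented contract.
     */
    static long windowMillis(String timeSpec) {
        return Duration.parse("PT" + timeSpec.toUpperCase(Locale.ROOT)).toMillis();
    }

    public static void main(String[] args) {
        System.out.println(windowMillis("10m"));   // prints 600000
        System.out.println(windowMillis("1h30m")); // prints 5400000
    }
}
```

Duration.parse throws a DateTimeParseException for specs it cannot read, which surfaces a malformed window when the function is constructed rather than silently during counting.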
Method Detail

accumulate

public void accumulate(org.apache.pig.data.Tuple input)
                throws java.io.IOException
Specified by:
accumulate in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by:
accumulate in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
Throws:
java.io.IOException

getValue

public java.lang.Long getValue()
Specified by:
getValue in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by:
getValue in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>

cleanup

public void cleanup()
Specified by:
cleanup in interface org.apache.pig.Accumulator<java.lang.Long>
Specified by:
cleanup in class org.apache.pig.AccumulatorEvalFunc<java.lang.Long>
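Because SessionCount is an Accumulator, Pig may call accumulate() several times for one group, each time with a slice of the group's bag, then call getValue() once, and call cleanup() before the next group. The counting state therefore has to survive between accumulate() calls. The sketch below models that lifecycle with a hypothetical SimpleAccumulator interface that takes lists of epoch-millisecond timestamps instead of Pig Tuples; it is not the real org.apache.pig.Accumulator interface, and its counting rule (count an event only when it falls more than the window after the last counted event) is an assumption.

```java
import java.util.List;

public class AccumulatorLifecycleSketch {
    // Hypothetical stand-in for org.apache.pig.Accumulator, using plain lists
    // of epoch-millisecond timestamps instead of Pig Tuples and DataBags.
    interface SimpleAccumulator<T> {
        void accumulate(List<Long> batch);
        T getValue();
        void cleanup();
    }

    static final class SessionCounter implements SimpleAccumulator<Long> {
        private final long windowMillis;
        private Long lastCounted; // survives across accumulate() calls for one group
        private long count;

        SessionCounter(long windowMillis) {
            this.windowMillis = windowMillis;
            cleanup();
        }

        @Override
        public void accumulate(List<Long> batch) {
            for (long t : batch) {
                if (lastCounted == null || t > lastCounted + windowMillis) {
                    count++;
                    lastCounted = t;
                }
            }
        }

        @Override
        public Long getValue() {
            return count;
        }

        @Override
        public void cleanup() {
            // Reset all per-group state before the next group is processed.
            lastCounted = null;
            count = 0;
        }
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        SessionCounter counter = new SessionCounter(10 * minute);
        // One group's sorted timestamps delivered in two batches: minutes 0, 5 | 8, 25
        counter.accumulate(List.of(0L, 5 * minute));
        counter.accumulate(List.of(8 * minute, 25 * minute));
        System.out.println(counter.getValue()); // prints 2
        counter.cleanup();                      // reset before the next group
        System.out.println(counter.getValue()); // prints 0
    }
}
```

Keeping lastCounted across calls is what stops the minute-8 view from being counted after the batch boundary; forgetting it at the start of each accumulate() call would count that view as well and yield 3 instead of 2.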


Matthew Hayes, Sam Shah