datafu.hourglass.mapreduce
Class PartitioningReducer

java.lang.Object
  extended by datafu.hourglass.mapreduce.ObjectProcessor
      extended by datafu.hourglass.mapreduce.ObjectReducer
          extended by datafu.hourglass.mapreduce.PartitioningReducer
All Implemented Interfaces:
java.io.Serializable

public class PartitioningReducer
extends ObjectReducer
implements java.io.Serializable

The reducer used by AbstractPartitionPreservingIncrementalJob and its derived classes.

An implementation of Accumulator is used to perform aggregation and produce the output value.

The input key is assumed to have time and value fields. The value here is the true key, and the time represents the input partition the data was derived from. The true key is used as the key in the reducer output and the time is dropped. This reducer uses multiple outputs; the time is used to determine which output to write to, where the named outputs have the form yyyyMMdd derived from the time.

Author:
"Matthew Hayes"
See Also:
Serialized Form

Constructor Summary
PartitioningReducer()
           
 
Method Summary
 void close()
           
 Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getAccumulator()
          Gets the accumulator used to perform aggregation.
 PartitionPreservingSchemas getSchemas()
          Gets the Avro schemas
 void reduce(java.lang.Object keyObj, java.lang.Iterable<java.lang.Object> values, org.apache.hadoop.mapreduce.ReduceContext<java.lang.Object,java.lang.Object,java.lang.Object,java.lang.Object> context)
           
 void setAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> acc)
          Sets the accumulator used to perform aggregation.
 void setContext(org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,java.lang.Object,java.lang.Object,java.lang.Object> context)
           
 void setSchemas(PartitionPreservingSchemas schemas)
          Sets the Avro schemas.
 
Methods inherited from class datafu.hourglass.mapreduce.ObjectProcessor
getContext
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PartitioningReducer

public PartitioningReducer()
Method Detail

reduce

public void reduce(java.lang.Object keyObj,
                   java.lang.Iterable<java.lang.Object> values,
                   org.apache.hadoop.mapreduce.ReduceContext<java.lang.Object,java.lang.Object,java.lang.Object,java.lang.Object> context)
            throws java.io.IOException,
                   java.lang.InterruptedException
Specified by:
reduce in class ObjectReducer
Throws:
java.io.IOException
java.lang.InterruptedException

setContext

public void setContext(org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,java.lang.Object,java.lang.Object,java.lang.Object> context)
Overrides:
setContext in class ObjectProcessor

setAccumulator

public void setAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> acc)
Sets the accumulator used to perform aggregation.

Parameters:
acc - The accumulator

getAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getAccumulator()
Gets the accumulator used to perform aggregation.

Returns:
The accumulator

setSchemas

public void setSchemas(PartitionPreservingSchemas schemas)
Sets the Avro schemas.

Parameters:
schemas -

getSchemas

public PartitionPreservingSchemas getSchemas()
Gets the Avro schemas

Returns:
schemas

close

public void close()
           throws java.io.IOException,
                  java.lang.InterruptedException
Overrides:
close in class ObjectProcessor
Throws:
java.io.IOException
java.lang.InterruptedException


Matthew Hayes