datafu.hourglass.mapreduce
Class PartitioningMapper

java.lang.Object
  extended by datafu.hourglass.mapreduce.ObjectProcessor
      extended by datafu.hourglass.mapreduce.ObjectMapper
          extended by datafu.hourglass.mapreduce.PartitioningMapper
All Implemented Interfaces:
java.io.Serializable

public class PartitioningMapper
extends ObjectMapper
implements java.io.Serializable

The mapper used by AbstractPartitionPreservingIncrementalJob and its derived classes.

An implementation of Mapper is used for the map operation, which produces key and intermediate value pairs from the input. The input to the mapper is assumed to be partitioned by day. Each key produced by Mapper is tagged with the time for the partition that the input came from. This enables the combiner and reducer to preserve the partitions.

Author:
"Matthew Hayes"
See Also:
Serialized Form

Constructor Summary
PartitioningMapper()
           
 
Method Summary
 Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getMapper()
          Gets the mapper.
 PartitionPreservingSchemas getSchemas()
          Gets the Avro schemas.
 void map(java.lang.Object inputObj, org.apache.hadoop.mapreduce.MapContext<java.lang.Object,java.lang.Object,java.lang.Object,java.lang.Object> context)
           
 void setContext(org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,java.lang.Object,java.lang.Object,java.lang.Object> context)
           
 void setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)
          Sets the mapper.
 void setSchemas(PartitionPreservingSchemas schemas)
          Sets the Avro schemas.
 
Methods inherited from class datafu.hourglass.mapreduce.ObjectProcessor
close, getContext
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PartitioningMapper

public PartitioningMapper()
Method Detail

map

public void map(java.lang.Object inputObj,
                org.apache.hadoop.mapreduce.MapContext<java.lang.Object,java.lang.Object,java.lang.Object,java.lang.Object> context)
         throws java.io.IOException,
                java.lang.InterruptedException
Specified by:
map in class ObjectMapper
Throws:
java.io.IOException
java.lang.InterruptedException

getMapper

public Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getMapper()
Gets the mapper.

Returns:
mapper

setMapper

public void setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)
Sets the mapper.

Parameters:
mapper -

setSchemas

public void setSchemas(PartitionPreservingSchemas schemas)
Sets the Avro schemas.

Parameters:
schemas -

getSchemas

public PartitionPreservingSchemas getSchemas()
Gets the Avro schemas.

Returns:
schemas

setContext

public void setContext(org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,java.lang.Object,java.lang.Object,java.lang.Object> context)
Overrides:
setContext in class ObjectProcessor


Matthew Hayes