datafu.hourglass.jobs
Class PartitionPreservingIncrementalJob

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by datafu.hourglass.jobs.AbstractJob
          extended by datafu.hourglass.jobs.TimeBasedJob
              extended by datafu.hourglass.jobs.IncrementalJob
                  extended by datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
                      extended by datafu.hourglass.jobs.PartitionPreservingIncrementalJob
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable

public class PartitionPreservingIncrementalJob
extends AbstractPartitionPreservingIncrementalJob

A concrete version of AbstractPartitionPreservingIncrementalJob. It provides an alternative to subclassing: instead of extending AbstractPartitionPreservingIncrementalJob and implementing its abstract methods, create an instance of this class and supply the mapper, accumulators, and schemas through the provided setters. The corresponding getters implement or override the methods of the abstract class.
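
The difference between the two styles can be sketched with simplified stand-in types. Everything in this sketch (SketchMapper, AbstractSketchJob, ConcreteSketchJob, PatternDemo) is hypothetical and only mirrors the shape of the pattern. It is not the DataFu API: the real classes operate on Avro GenericRecord values and run as Hadoop jobs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mapper: turns one input into zero or more outputs.
interface SketchMapper {
    void map(String input, List<String> out);
}

// Stand-in for the abstract style: subclasses must implement getMapper().
abstract class AbstractSketchJob {
    protected abstract SketchMapper getMapper();

    // Runs the mapper over every input and returns the collected outputs.
    public List<String> run(List<String> inputs) {
        SketchMapper mapper = getMapper();
        if (mapper == null) {
            throw new IllegalStateException("mapper has not been set");
        }
        List<String> out = new ArrayList<>();
        for (String input : inputs) {
            mapper.map(input, out);
        }
        return out;
    }
}

// Stand-in for the concrete style: the abstract getter is backed by a setter.
class ConcreteSketchJob extends AbstractSketchJob {
    private SketchMapper mapper;

    public void setMapper(SketchMapper mapper) {
        this.mapper = mapper;
    }

    @Override
    protected SketchMapper getMapper() {
        return mapper;
    }
}

class PatternDemo {
    // Style 1: subclass and implement the abstract method.
    static List<String> runBySubclassing(List<String> inputs) {
        AbstractSketchJob job = new AbstractSketchJob() {
            @Override
            protected SketchMapper getMapper() {
                return (input, out) -> out.add(input.toUpperCase());
            }
        };
        return job.run(inputs);
    }

    // Style 2: use the concrete class and inject the pieces through setters.
    static List<String> runWithSetters(List<String> inputs) {
        ConcreteSketchJob job = new ConcreteSketchJob();
        job.setMapper((input, out) -> out.add(input.toUpperCase()));
        return job.run(inputs);
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a", "b");
        System.out.println(runBySubclassing(inputs));
        System.out.println(runWithSetters(inputs));
    }
}
```

Both styles produce the same result. The setter-based style avoids a dedicated subclass when the job logic is small enough to be passed in as objects.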

Author:
Matthew Hayes

Nested Class Summary
 
Nested classes/interfaces inherited from class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
AbstractPartitionPreservingIncrementalJob.Report
 
Constructor Summary
PartitionPreservingIncrementalJob(java.lang.Class cls)
          Initializes the job.
 
Method Summary
 void config(org.apache.hadoop.conf.Configuration conf)
          Overridden to provide custom configuration before the job starts.
 Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getCombinerAccumulator()
          Gets the accumulator used for the combiner.
protected  org.apache.avro.Schema getIntermediateValueSchema()
          Gets the Avro schema for the intermediate value.
protected  org.apache.avro.Schema getKeySchema()
          Gets the Avro schema for the key.
 Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getMapper()
          Gets the mapper.
protected  org.apache.avro.Schema getOutputValueSchema()
          Gets the Avro schema for the output data.
 Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getReducerAccumulator()
          Gets the accumulator used for the reducer.
 void setCombinerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> combiner)
          Sets the accumulator for the combiner.
 void setIntermediateValueSchema(org.apache.avro.Schema intermediateValueSchema)
          Sets the Avro schema for the intermediate value.
 void setKeySchema(org.apache.avro.Schema keySchema)
          Sets the Avro schema for the key.
 void setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)
          Sets the mapper.
 void setOnSetup(Setup setup)
          Sets a callback that provides custom configuration before the job begins execution.
 void setOutputValueSchema(org.apache.avro.Schema outputValueSchema)
          Sets the Avro schema for the output data.
 void setReducerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> reducer)
          Sets the accumulator for the reducer.
 
Methods inherited from class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
getCombineProcessor, getMapProcessor, getOutputSchemaName, getOutputSchemaNamespace, getReduceProcessor, getReports, initialize, run
 
Methods inherited from class datafu.hourglass.jobs.IncrementalJob
getMaxIterations, getMaxToProcess, getSchemas, isFailOnMissing, setFailOnMissing, setMaxIterations, setMaxToProcess, setProperties
 
Methods inherited from class datafu.hourglass.jobs.TimeBasedJob
getDaysAgo, getEndDate, getNumDays, getStartDate, setDaysAgo, setEndDate, setNumDays, setStartDate, validate
 
Methods inherited from class datafu.hourglass.jobs.AbstractJob
createRandomTempPath, ensurePath, getCountersParentPath, getFileSystem, getInputPaths, getName, getNumReducers, getOutputPath, getProperties, getRetentionCount, getTempPath, isUseCombiner, randomTempPath, setCountersParentPath, setInputPaths, setName, setNumReducers, setOutputPath, setRetentionCount, setTempPath, setUseCombiner
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PartitionPreservingIncrementalJob

public PartitionPreservingIncrementalJob(java.lang.Class cls)
                                  throws java.io.IOException
Initializes the job. The job name is derived from the name of the provided class.

Parameters:
cls - class to base job name on
Throws:
java.io.IOException

Method Detail

getMapper

public Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getMapper()
Description copied from class: AbstractPartitionPreservingIncrementalJob
Gets the mapper.

Specified by:
getMapper in class AbstractPartitionPreservingIncrementalJob
Returns:
mapper

getCombinerAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getCombinerAccumulator()
Description copied from class: AbstractPartitionPreservingIncrementalJob
Gets the accumulator used for the combiner.

Overrides:
getCombinerAccumulator in class AbstractPartitionPreservingIncrementalJob
Returns:
combiner accumulator

getReducerAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getReducerAccumulator()
Description copied from class: AbstractPartitionPreservingIncrementalJob
Gets the accumulator used for the reducer.

Specified by:
getReducerAccumulator in class AbstractPartitionPreservingIncrementalJob
Returns:
reducer accumulator

getKeySchema

protected org.apache.avro.Schema getKeySchema()
Description copied from class: IncrementalJob
Gets the Avro schema for the key.

This is also used as the key for the map output.

Specified by:
getKeySchema in class IncrementalJob
Returns:
key schema

getIntermediateValueSchema

protected org.apache.avro.Schema getIntermediateValueSchema()
Description copied from class: IncrementalJob
Gets the Avro schema for the intermediate value.

This is also used for the value for the map output.

Specified by:
getIntermediateValueSchema in class IncrementalJob
Returns:
intermediate value schema

getOutputValueSchema

protected org.apache.avro.Schema getOutputValueSchema()
Description copied from class: IncrementalJob
Gets the Avro schema for the output data.

Specified by:
getOutputValueSchema in class IncrementalJob
Returns:
output data schema

setMapper

public void setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)
Sets the mapper.

Parameters:
mapper - the mapper

setCombinerAccumulator

public void setCombinerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> combiner)
Sets the accumulator for the combiner.

Parameters:
combiner - accumulator for the combiner

setReducerAccumulator

public void setReducerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> reducer)
Sets the accumulator for the reducer.

Parameters:
reducer - accumulator for the reducer
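
How a combiner accumulator and a reducer accumulator cooperate can be sketched with stand-in types. This page does not document the Accumulator interface, so the three-method shape below (accumulate, getFinal, cleanup) is an assumption, and SketchAccumulator, SumAccumulator, and AccumulatorDemo are hypothetical names. Plain Long values stand in for the Avro GenericRecord values the real job uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical accumulator shape (an assumption, not taken from this page):
// consume values one at a time, produce a result, then reset for reuse.
interface SketchAccumulator<In, Out> {
    void accumulate(In value);
    Out getFinal();
    void cleanup();
}

// Sums Long values. Addition is associative, so the same logic can serve
// as both the combiner and the reducer.
class SumAccumulator implements SketchAccumulator<Long, Long> {
    private long sum;

    @Override
    public void accumulate(Long value) {
        sum += value;
    }

    @Override
    public Long getFinal() {
        return sum;
    }

    @Override
    public void cleanup() {
        sum = 0L;
    }
}

class AccumulatorDemo {
    // Feeds all values for one key into the accumulator, reads the result,
    // and resets the accumulator so it can be reused for the next key.
    static long fold(SketchAccumulator<Long, Long> acc, List<Long> values) {
        for (Long value : values) {
            acc.accumulate(value);
        }
        long result = acc.getFinal();
        acc.cleanup();
        return result;
    }

    // Combiner step: pre-aggregate each chunk of map output.
    // Reducer step: aggregate the partial results into the final value.
    static long combineThenReduce(List<List<Long>> mapOutputChunks) {
        SumAccumulator acc = new SumAccumulator();
        List<Long> partials = new ArrayList<>();
        for (List<Long> chunk : mapOutputChunks) {
            partials.add(fold(acc, chunk));
        }
        return fold(acc, partials);
    }

    public static void main(String[] args) {
        List<List<Long>> chunks = Arrays.asList(
                Arrays.asList(1L, 2L),
                Arrays.asList(3L));
        System.out.println(combineThenReduce(chunks));
    }
}
```

In this sketch the combiner consumes and emits the same value type, which is why one accumulator can fill both roles. An operation that is not associative would need a separate reducer accumulator.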

setKeySchema

public void setKeySchema(org.apache.avro.Schema keySchema)
Sets the Avro schema for the key.

This is also used as the key for the map output.

Parameters:
keySchema - key schema

setIntermediateValueSchema

public void setIntermediateValueSchema(org.apache.avro.Schema intermediateValueSchema)
Sets the Avro schema for the intermediate value.

This is also used for the value for the map output.

Parameters:
intermediateValueSchema - intermediate value schema

setOutputValueSchema

public void setOutputValueSchema(org.apache.avro.Schema outputValueSchema)
Sets the Avro schema for the output data.

Parameters:
outputValueSchema - output value schema

setOnSetup

public void setOnSetup(Setup setup)
Sets a callback that provides custom configuration before the job begins execution.

Parameters:
setup - object with callback method

config

public void config(org.apache.hadoop.conf.Configuration conf)
Description copied from class: AbstractJob
Overridden to provide custom configuration before the job starts.

Overrides:
config in class AbstractJob
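
One plausible relationship between setOnSetup and config is that config invokes the registered callback after applying the job's own settings. The sketch below illustrates that idea under stated assumptions: SketchSetup, SketchConfigurableJob, SetupDemo, and the property names are hypothetical, a Map stands in for the Hadoop Configuration object, and the call order is assumed rather than confirmed by this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback shape: receives the configuration before the job runs.
interface SketchSetup {
    void setup(Map<String, String> conf);
}

class SketchConfigurableJob {
    private SketchSetup onSetup;

    public void setOnSetup(SketchSetup setup) {
        this.onSetup = setup;
    }

    // Assumed behavior: apply the job's own settings first, then give the
    // callback a chance to add or override entries.
    public void config(Map<String, String> conf) {
        conf.put("sketch.job.name", "example");
        if (onSetup != null) {
            onSetup.setup(conf);
        }
    }

    // Builds the configuration that the job would start with.
    public Map<String, String> start() {
        Map<String, String> conf = new HashMap<>();
        config(conf);
        return conf;
    }
}

class SetupDemo {
    static Map<String, String> configured() {
        SketchConfigurableJob job = new SketchConfigurableJob();
        // Custom configuration supplied as a callback instead of a subclass.
        job.setOnSetup(conf -> conf.put("sketch.custom.flag", "true"));
        return job.start();
    }

    public static void main(String[] args) {
        System.out.println(configured().get("sketch.custom.flag"));
    }
}
```

Supplying a callback keeps custom configuration possible without overriding config in a subclass, which matches the setter-based style of this class.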

