datafu.hourglass.jobs
Class PartitionCollapsingIncrementalJob

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by datafu.hourglass.jobs.AbstractJob
          extended by datafu.hourglass.jobs.TimeBasedJob
              extended by datafu.hourglass.jobs.IncrementalJob
                  extended by datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
                      extended by datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable

public class PartitionCollapsingIncrementalJob
extends AbstractPartitionCollapsingIncrementalJob

A concrete version of AbstractPartitionCollapsingIncrementalJob, provided as an alternative to subclassing it. Rather than extending the abstract class and implementing its abstract methods, use this class directly: getters and setters are provided in place of the abstract methods.
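The difference between the two styles can be illustrated with a minimal, self-contained sketch. The classes below (SketchAbstractJob, SketchConcreteJob, ConcreteJobSketch) are hypothetical stand-ins invented for illustration and are not the Hourglass classes; they only show how an abstract getter can be backed by a field and a setter.

```java
import java.util.function.Function;

// Hypothetical stand-in for an abstract job; NOT the Hourglass class.
abstract class SketchAbstractJob {
    // Subclasses supply the per-record logic through an abstract getter.
    protected abstract Function<String, Integer> getMapper();

    // The base class drives execution and asks the subclass for its parts.
    int run(String input) {
        return getMapper().apply(input);
    }
}

// Concrete variant: the abstract getter is backed by a field with a setter,
// so callers configure the job instead of subclassing it.
final class SketchConcreteJob extends SketchAbstractJob {
    private Function<String, Integer> mapper;

    void setMapper(Function<String, Integer> mapper) {
        this.mapper = mapper;
    }

    @Override
    protected Function<String, Integer> getMapper() {
        return mapper;
    }
}

final class ConcreteJobSketch {
    // Configures the concrete job through its setter and runs it.
    static int lengthViaSetter(String input) {
        SketchConcreteJob job = new SketchConcreteJob();
        job.setMapper(String::length);
        return job.run(input);
    }

    public static void main(String[] args) {
        System.out.println(lengthViaSetter("hourglass"));
    }
}
```

The setter-based form suits cases where the components are assembled at runtime, while subclassing keeps them fixed in the class definition.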

Author:
Matthew Hayes

Nested Class Summary
 
Nested classes/interfaces inherited from class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
AbstractPartitionCollapsingIncrementalJob.Report
 
Field Summary
 
Fields inherited from class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
_reusePreviousOutput
 
Constructor Summary
PartitionCollapsingIncrementalJob(java.lang.Class cls)
          Initializes the job.
 
Method Summary
 void config(org.apache.hadoop.conf.Configuration conf)
          Overridden to provide custom configuration before the job starts.
 Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getCombinerAccumulator()
          Gets the accumulator used for the combiner.
protected  org.apache.avro.Schema getIntermediateValueSchema()
          Gets the Avro schema for the intermediate value.
protected  org.apache.avro.Schema getKeySchema()
          Gets the Avro schema for the key.
 Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getMapper()
          Gets the mapper.
 Merger<org.apache.avro.generic.GenericRecord> getOldRecordMerger()
          Gets the record merger that is capable of unmerging old partial output from the new output.
protected  org.apache.avro.Schema getOutputValueSchema()
          Gets the Avro schema for the output data.
 Merger<org.apache.avro.generic.GenericRecord> getRecordMerger()
          Gets the record merger that is capable of merging previous output with a new partial output.
 Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getReducerAccumulator()
          Gets the accumulator used for the reducer.
 void setCombinerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> combiner)
          Sets the accumulator for the combiner.
 void setIntermediateValueSchema(org.apache.avro.Schema intermediateValueSchema)
          Sets the Avro schema for the intermediate value.
 void setKeySchema(org.apache.avro.Schema keySchema)
          Sets the Avro schema for the key.
 void setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)
          Sets the mapper.
 void setMerger(Merger<org.apache.avro.generic.GenericRecord> merger)
          Sets the record merger that is capable of merging previous output with a new partial output.
 void setOldMerger(Merger<org.apache.avro.generic.GenericRecord> oldMerger)
          Sets the record merger that is capable of unmerging old partial output from the new output.
 void setOnSetup(Setup setup)
          Sets a callback that provides custom configuration before the job begins execution.
 void setOutputValueSchema(org.apache.avro.Schema outputValueSchema)
          Sets the Avro schema for the output data.
 void setReducerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> reducer)
          Sets the accumulator for the reducer.
 
Methods inherited from class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
getOutputSchemaName, getOutputSchemaNamespace, getReports, getReusePreviousOutput, initialize, run, setProperties, setReusePreviousOutput
 
Methods inherited from class datafu.hourglass.jobs.IncrementalJob
getMaxIterations, getMaxToProcess, getSchemas, isFailOnMissing, setFailOnMissing, setMaxIterations, setMaxToProcess
 
Methods inherited from class datafu.hourglass.jobs.TimeBasedJob
getDaysAgo, getEndDate, getNumDays, getStartDate, setDaysAgo, setEndDate, setNumDays, setStartDate, validate
 
Methods inherited from class datafu.hourglass.jobs.AbstractJob
createRandomTempPath, ensurePath, getCountersParentPath, getFileSystem, getInputPaths, getName, getNumReducers, getOutputPath, getProperties, getRetentionCount, getTempPath, isUseCombiner, randomTempPath, setCountersParentPath, setInputPaths, setName, setNumReducers, setOutputPath, setRetentionCount, setTempPath, setUseCombiner
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PartitionCollapsingIncrementalJob

public PartitionCollapsingIncrementalJob(java.lang.Class cls)
                                  throws java.io.IOException
Initializes the job. The job name is derived from the name of a provided class.

Parameters:
cls - class to base job name on
Throws:
java.io.IOException

Method Detail

getMapper

public Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getMapper()
Description copied from class: AbstractPartitionCollapsingIncrementalJob
Gets the mapper.

Specified by:
getMapper in class AbstractPartitionCollapsingIncrementalJob
Returns:
mapper

getCombinerAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getCombinerAccumulator()
Description copied from class: AbstractPartitionCollapsingIncrementalJob
Gets the accumulator used for the combiner.

Overrides:
getCombinerAccumulator in class AbstractPartitionCollapsingIncrementalJob
Returns:
combiner accumulator

getReducerAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getReducerAccumulator()
Description copied from class: AbstractPartitionCollapsingIncrementalJob
Gets the accumulator used for the reducer.

Specified by:
getReducerAccumulator in class AbstractPartitionCollapsingIncrementalJob
Returns:
reducer accumulator
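As a rough illustration of why one accumulator can serve as both combiner and reducer, the sketch below sums values and shows that combining partial sums gives the same result as reducing all values at once. SketchAccumulator and its method names (accumulate, getFinal, cleanup) are assumptions made for this example, not the library's Accumulator type, and plain Long values stand in for GenericRecord.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an accumulator; not the library's Accumulator type.
interface SketchAccumulator<In, Out> {
    void accumulate(In value); // fold one value into the running state
    Out getFinal();            // produce the result for the current key
    void cleanup();            // reset state before the next key
}

// Sums its inputs. Addition is associative, so partial sums can be re-summed.
final class SumAccumulator implements SketchAccumulator<Long, Long> {
    private long sum;

    public void accumulate(Long value) {
        sum += value;
    }

    public Long getFinal() {
        return sum;
    }

    public void cleanup() {
        sum = 0L;
    }
}

final class AccumulatorSketch {
    // Reduces all values for one key, then resets the accumulator.
    static long reduce(SketchAccumulator<Long, Long> acc, List<Long> values) {
        for (Long v : values) {
            acc.accumulate(v);
        }
        long result = acc.getFinal();
        acc.cleanup();
        return result;
    }

    public static void main(String[] args) {
        SumAccumulator acc = new SumAccumulator();
        // Combiner stage: each map task pre-aggregates its own values.
        long partialA = reduce(acc, Arrays.asList(1L, 2L));
        long partialB = reduce(acc, Arrays.asList(3L));
        // Reducer stage: the partial results are aggregated again.
        System.out.println(reduce(acc, Arrays.asList(partialA, partialB)));
    }
}
```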

getKeySchema

protected org.apache.avro.Schema getKeySchema()
Description copied from class: IncrementalJob
Gets the Avro schema for the key.

This is also used as the key for the map output.

Specified by:
getKeySchema in class IncrementalJob
Returns:
key schema

getIntermediateValueSchema

protected org.apache.avro.Schema getIntermediateValueSchema()
Description copied from class: IncrementalJob
Gets the Avro schema for the intermediate value.

This is also used for the value for the map output.

Specified by:
getIntermediateValueSchema in class IncrementalJob
Returns:
intermediate value schema

getOutputValueSchema

protected org.apache.avro.Schema getOutputValueSchema()
Description copied from class: IncrementalJob
Gets the Avro schema for the output data.

Specified by:
getOutputValueSchema in class IncrementalJob
Returns:
output data schema

getRecordMerger

public Merger<org.apache.avro.generic.GenericRecord> getRecordMerger()
Description copied from class: AbstractPartitionCollapsingIncrementalJob
Gets the record merger that is capable of merging previous output with a new partial output. This is only needed when reusing previous output where the intermediate and output schemas are different. The new partial output is produced by the reducer from new input that falls after the time range covered by the previous output.

Overrides:
getRecordMerger in class AbstractPartitionCollapsingIncrementalJob
Returns:
merger
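The merge step described above can be sketched as follows. This is a simplified illustration under stated assumptions: OutputMerger and DailySummary are hypothetical names invented here, not the library's Merger type or any real schema, and a plain Java object stands in for an Avro GenericRecord.

```java
// Hypothetical stand-in for a merger; not the library's Merger type.
interface OutputMerger<T> {
    T merge(T previous, T partial);
}

// Illustrative output value. Its shape (two fields) is assumed to differ
// from the intermediate value, which is why a dedicated merger is needed.
final class DailySummary {
    final long events;
    final long days;

    DailySummary(long events, long days) {
        this.events = events;
        this.days = days;
    }
}

final class MergeSketch {
    // Merges field by field: totals from the previous output plus the
    // totals computed from the newly arrived input.
    static final OutputMerger<DailySummary> SUM =
        (previous, partial) -> new DailySummary(
            previous.events + partial.events,
            previous.days + partial.days);

    public static void main(String[] args) {
        DailySummary previous = new DailySummary(100, 7); // earlier run's output
        DailySummary partial = new DailySummary(15, 1);   // reducer output for new input
        DailySummary merged = SUM.merge(previous, partial);
        System.out.println(merged.events + " events over " + merged.days + " days");
    }
}
```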

getOldRecordMerger

public Merger<org.apache.avro.generic.GenericRecord> getOldRecordMerger()
Description copied from class: AbstractPartitionCollapsingIncrementalJob
Gets the record merger that is capable of unmerging old partial output from the new output. This is only needed when reusing previous output for a fixed-length sliding window. The new output is the result of merging the previous output with the new partial output. The old partial output is produced by the reducer from old input data before the time range of the previous output.

Overrides:
getOldRecordMerger in class AbstractPartitionCollapsingIncrementalJob
Returns:
merger
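For a fixed-length sliding window, the merge and unmerge steps can be illustrated with simple arithmetic. The sketch below assumes the output is a plain event count, so that merging is addition and unmerging is subtraction; SlidingWindowSketch and its methods are hypothetical and only demonstrate the idea, not the library's implementation.

```java
final class SlidingWindowSketch {
    // Merge: combine the previous output with the new partial output.
    static long merge(long previous, long newPartial) {
        return previous + newPartial;
    }

    // Unmerge: remove the old partial output from the merged result.
    static long unmerge(long merged, long oldPartial) {
        return merged - oldPartial;
    }

    // Advances the window using only the previous output and the two
    // partial outputs, instead of re-reading every day in the window.
    static long slide(long previous, long newPartial, long oldPartial) {
        return unmerge(merge(previous, newPartial), oldPartial);
    }

    // Recomputes the window total from scratch, for comparison.
    static long direct(long[] daily, int start, int length) {
        long total = 0;
        for (int i = start; i < start + length; i++) {
            total += daily[i];
        }
        return total;
    }

    public static void main(String[] args) {
        long[] daily = {5, 3, 8, 2};           // made-up daily counts
        long previous = direct(daily, 0, 3);   // window over days 0..2
        long slid = slide(previous, daily[3], daily[0]);
        System.out.println(slid == direct(daily, 1, 3));
    }
}
```

This shortcut only works when the aggregation can be reversed. A sum can be unmerged by subtraction, whereas a maximum cannot be recovered once the old value is folded in.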

setMapper

public void setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)
Sets the mapper.

Parameters:
mapper - the mapper

setCombinerAccumulator

public void setCombinerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> combiner)
Sets the accumulator for the combiner.

Parameters:
combiner - accumulator for the combiner

setReducerAccumulator

public void setReducerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> reducer)
Sets the accumulator for the reducer.

Parameters:
reducer - accumulator for the reducer

setKeySchema

public void setKeySchema(org.apache.avro.Schema keySchema)
Sets the Avro schema for the key.

This is also used as the key for the map output.

Parameters:
keySchema - key schema

setIntermediateValueSchema

public void setIntermediateValueSchema(org.apache.avro.Schema intermediateValueSchema)
Sets the Avro schema for the intermediate value.

This is also used for the value for the map output.

Parameters:
intermediateValueSchema - intermediate value schema

setOutputValueSchema

public void setOutputValueSchema(org.apache.avro.Schema outputValueSchema)
Sets the Avro schema for the output data.

Parameters:
outputValueSchema - output value schema

setMerger

public void setMerger(Merger<org.apache.avro.generic.GenericRecord> merger)
Sets the record merger that is capable of merging previous output with a new partial output. This is only needed when reusing previous output where the intermediate and output schemas are different. The new partial output is produced by the reducer from new input that falls after the time range covered by the previous output.

Parameters:
merger - merger

setOldMerger

public void setOldMerger(Merger<org.apache.avro.generic.GenericRecord> oldMerger)
Sets the record merger that is capable of unmerging old partial output from the new output. This is only needed when reusing previous output for a fixed-length sliding window. The new output is the result of merging the previous output with the new partial output. The old partial output is produced by the reducer from old input data before the time range of the previous output.

Parameters:
oldMerger - merger

setOnSetup

public void setOnSetup(Setup setup)
Sets a callback that provides custom configuration before the job begins execution.

Parameters:
setup - object with callback method

config

public void config(org.apache.hadoop.conf.Configuration conf)
Description copied from class: AbstractJob
Overridden to provide custom configuration before the job starts.

Overrides:
config in class AbstractJob

