PartitionPreservingIncrementalJob (DataFu Hourglass)

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

datafu.hourglass.jobs
Class PartitionPreservingIncrementalJob

java.lang.Object
  org.apache.hadoop.conf.Configured
      datafu.hourglass.jobs.AbstractJob
          datafu.hourglass.jobs.TimeBasedJob
              datafu.hourglass.jobs.IncrementalJob
                  datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
                      datafu.hourglass.jobs.PartitionPreservingIncrementalJob

All Implemented Interfaces:: org.apache.hadoop.conf.Configurable

public class PartitionPreservingIncrementalJob
extends AbstractPartitionPreservingIncrementalJob
extends AbstractPartitionPreservingIncrementalJob

A concrete version of AbstractPartitionPreservingIncrementalJob. This provides an alternative to extending AbstractPartitionPreservingIncrementalJob. Instead of extending this class and implementing the abstract methods, this concrete version can be used instead. Getters and setters have been provided for the abstract methods.

Author:: "Matthew Hayes"

Nested Class Summary

Nested classes/interfaces inherited from class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
`AbstractPartitionPreservingIncrementalJob.Report`

Constructor Summary
`PartitionPreservingIncrementalJob(java.lang.Class cls)` Initializes the job.

Method Summary
`void`	`config(org.apache.hadoop.conf.Configuration conf)` Overridden to provide custom configuration before the job starts.
`Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>`	`getCombinerAccumulator()` Gets the accumulator used for the combiner.
`protected org.apache.avro.Schema`	`getIntermediateValueSchema()` Gets the Avro schema for the intermediate value.
`protected org.apache.avro.Schema`	`getKeySchema()` Gets the Avro schema for the key.
`Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>`	`getMapper()` Gets the mapper.
`protected org.apache.avro.Schema`	`getOutputValueSchema()` Gets the Avro schema for the output data.
`Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>`	`getReducerAccumulator()` Gets the accumulator used for the reducer.
`void`	`setCombinerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> combiner)` Set the accumulator for the combiner
`void`	`setIntermediateValueSchema(org.apache.avro.Schema intermediateValueSchema)` Sets the Avro schema for the intermediate value.
`void`	`setKeySchema(org.apache.avro.Schema keySchema)` Sets the Avro schema for the key.
`void`	`setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)` Set the mapper.
`void`	`setOnSetup(Setup setup)` Set callback to provide custom configuration before job begins execution.
`void`	`setOutputValueSchema(org.apache.avro.Schema outputValueSchema)` Sets the Avro schema for the output data.
`void`	`setReducerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> reducer)` Set the accumulator for the reducer.

Methods inherited from class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
`getCombineProcessor, getMapProcessor, getOutputSchemaName, getOutputSchemaNamespace, getReduceProcessor, getReports, initialize, run`

Methods inherited from class datafu.hourglass.jobs.IncrementalJob
`getMaxIterations, getMaxToProcess, getSchemas, isFailOnMissing, setFailOnMissing, setMaxIterations, setMaxToProcess, setProperties`

Methods inherited from class datafu.hourglass.jobs.TimeBasedJob
`getDaysAgo, getEndDate, getNumDays, getStartDate, setDaysAgo, setEndDate, setNumDays, setStartDate, validate`

Methods inherited from class datafu.hourglass.jobs.AbstractJob
`createRandomTempPath, ensurePath, getCountersParentPath, getFileSystem, getInputPaths, getName, getNumReducers, getOutputPath, getProperties, getRetentionCount, getTempPath, isUseCombiner, randomTempPath, setCountersParentPath, setInputPaths, setName, setNumReducers, setOutputPath, setRetentionCount, setTempPath, setUseCombiner`

Methods inherited from class org.apache.hadoop.conf.Configured
`getConf, setConf`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

PartitionPreservingIncrementalJob

public PartitionPreservingIncrementalJob(java.lang.Class cls)
                                  throws java.io.IOException

Initializes the job. The job name is derived from the name of a provided class.

Parameters:: cls - class to base job name on
Throws:: java.io.IOException

Method Detail

getMapper

public Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getMapper()

Description copied from class: AbstractPartitionPreservingIncrementalJob

Gets the mapper.

Specified by:: getMapper in class AbstractPartitionPreservingIncrementalJob

Returns:: mapper

getCombinerAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getCombinerAccumulator()

Description copied from class: AbstractPartitionPreservingIncrementalJob

Gets the accumulator used for the combiner.

Overrides:: getCombinerAccumulator in class AbstractPartitionPreservingIncrementalJob

Returns:: combiner accumulator

getReducerAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getReducerAccumulator()

Description copied from class: AbstractPartitionPreservingIncrementalJob

Gets the accumulator used for the reducer.

Specified by:: getReducerAccumulator in class AbstractPartitionPreservingIncrementalJob

Returns:: reducer accumulator

getKeySchema

protected org.apache.avro.Schema getKeySchema()

Description copied from class: IncrementalJob

Gets the Avro schema for the key.

This is also used as the key for the map output.

Specified by:: getKeySchema in class IncrementalJob

Returns:: key schema.

getIntermediateValueSchema

protected org.apache.avro.Schema getIntermediateValueSchema()

Description copied from class: IncrementalJob

Gets the Avro schema for the intermediate value.

This is also used for the value for the map output.

Specified by:: getIntermediateValueSchema in class IncrementalJob

Returns:: intermediate value schema

getOutputValueSchema

protected org.apache.avro.Schema getOutputValueSchema()

Description copied from class: IncrementalJob

Gets the Avro schema for the output data.

Specified by:: getOutputValueSchema in class IncrementalJob

Returns:: output data schema

setMapper

public void setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)

Set the mapper.

Parameters:: mapper -

setCombinerAccumulator

public void setCombinerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> combiner)

Set the accumulator for the combiner

Parameters:: combiner - accumulator for the combiner

setReducerAccumulator

public void setReducerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> reducer)

Set the accumulator for the reducer.

Parameters:: reducer - accumulator for the reducer

setKeySchema

public void setKeySchema(org.apache.avro.Schema keySchema)

Sets the Avro schema for the key.

This is also used as the key for the map output.

Parameters:: keySchema - key schema

setIntermediateValueSchema

public void setIntermediateValueSchema(org.apache.avro.Schema intermediateValueSchema)

Sets the Avro schema for the intermediate value.

This is also used for the value for the map output.

Parameters:: intermediateValueSchema - intermediate value schema

setOutputValueSchema

public void setOutputValueSchema(org.apache.avro.Schema outputValueSchema)

Sets the Avro schema for the output data.

Parameters:: outputValueSchema - output value schema

setOnSetup

public void setOnSetup(Setup setup)

Set callback to provide custom configuration before job begins execution.

Parameters:: setup - object with callback method

config

public void config(org.apache.hadoop.conf.Configuration conf)

Description copied from class: AbstractJob

Overridden to provide custom configuration before the job starts.

Overrides:: config in class AbstractJob

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

Matthew Hayes

datafu.hourglass.jobs Class PartitionPreservingIncrementalJob

PartitionPreservingIncrementalJob

getMapper

getCombinerAccumulator

getReducerAccumulator

getKeySchema

getIntermediateValueSchema

getOutputValueSchema

setMapper

setCombinerAccumulator

setReducerAccumulator

setKeySchema

setIntermediateValueSchema

setOutputValueSchema

setOnSetup

config

datafu.hourglass.jobs
Class PartitionPreservingIncrementalJob