PartitionCollapsingIncrementalJob (DataFu Hourglass)

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

datafu.hourglass.jobs
Class PartitionCollapsingIncrementalJob

java.lang.Object
  org.apache.hadoop.conf.Configured
      datafu.hourglass.jobs.AbstractJob
          datafu.hourglass.jobs.TimeBasedJob
              datafu.hourglass.jobs.IncrementalJob
                  datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
                      datafu.hourglass.jobs.PartitionCollapsingIncrementalJob

All Implemented Interfaces:: org.apache.hadoop.conf.Configurable

public class PartitionCollapsingIncrementalJob
extends AbstractPartitionCollapsingIncrementalJob
extends AbstractPartitionCollapsingIncrementalJob

A concrete version of AbstractPartitionCollapsingIncrementalJob. This provides an alternative to extending AbstractPartitionCollapsingIncrementalJob. Instead of extending this class and implementing the abstract methods, this concrete version can be used instead. Getters and setters have been provided for the abstract methods.

Author:: "Matthew Hayes"

Nested Class Summary

Nested classes/interfaces inherited from class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
`AbstractPartitionCollapsingIncrementalJob.Report`

Field Summary

Fields inherited from class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
`_reusePreviousOutput`

Constructor Summary
`PartitionCollapsingIncrementalJob(java.lang.Class cls)` Initializes the job.

Method Summary
`void`	`config(org.apache.hadoop.conf.Configuration conf)` Overridden to provide custom configuration before the job starts.
`Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>`	`getCombinerAccumulator()` Gets the accumulator used for the combiner.
`protected org.apache.avro.Schema`	`getIntermediateValueSchema()` Gets the Avro schema for the intermediate value.
`protected org.apache.avro.Schema`	`getKeySchema()` Gets the Avro schema for the key.
`Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>`	`getMapper()` Gets the mapper.
`Merger<org.apache.avro.generic.GenericRecord>`	`getOldRecordMerger()` Gets the record merger that is capable of unmerging old partial output from the new output.
`protected org.apache.avro.Schema`	`getOutputValueSchema()` Gets the Avro schema for the output data.
`Merger<org.apache.avro.generic.GenericRecord>`	`getRecordMerger()` Gets the record merger that is capable of merging previous output with a new partial output.
`Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord>`	`getReducerAccumulator()` Gets the accumulator used for the reducer.
`void`	`setCombinerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> combiner)` Set the accumulator for the combiner
`void`	`setIntermediateValueSchema(org.apache.avro.Schema intermediateValueSchema)` Sets the Avro schema for the intermediate value.
`void`	`setKeySchema(org.apache.avro.Schema keySchema)` Sets the Avro schema for the key.
`void`	`setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)` Set the mapper.
`void`	`setMerger(Merger<org.apache.avro.generic.GenericRecord> merger)` Sets the record merger that is capable of merging previous output with a new partial output.
`void`	`setOldMerger(Merger<org.apache.avro.generic.GenericRecord> oldMerger)` Sets the record merger that is capable of unmerging old partial output from the new output.
`void`	`setOnSetup(Setup setup)` Set callback to provide custom configuration before job begins execution.
`void`	`setOutputValueSchema(org.apache.avro.Schema outputValueSchema)` Sets the Avro schema for the output data.
`void`	`setReducerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> reducer)` Set the accumulator for the reducer.

Methods inherited from class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
`getOutputSchemaName, getOutputSchemaNamespace, getReports, getReusePreviousOutput, initialize, run, setProperties, setReusePreviousOutput`

Methods inherited from class datafu.hourglass.jobs.IncrementalJob
`getMaxIterations, getMaxToProcess, getSchemas, isFailOnMissing, setFailOnMissing, setMaxIterations, setMaxToProcess`

Methods inherited from class datafu.hourglass.jobs.TimeBasedJob
`getDaysAgo, getEndDate, getNumDays, getStartDate, setDaysAgo, setEndDate, setNumDays, setStartDate, validate`

Methods inherited from class datafu.hourglass.jobs.AbstractJob
`createRandomTempPath, ensurePath, getCountersParentPath, getFileSystem, getInputPaths, getName, getNumReducers, getOutputPath, getProperties, getRetentionCount, getTempPath, isUseCombiner, randomTempPath, setCountersParentPath, setInputPaths, setName, setNumReducers, setOutputPath, setRetentionCount, setTempPath, setUseCombiner`

Methods inherited from class org.apache.hadoop.conf.Configured
`getConf, setConf`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

PartitionCollapsingIncrementalJob

public PartitionCollapsingIncrementalJob(java.lang.Class cls)
                                  throws java.io.IOException

Initializes the job. The job name is derived from the name of a provided class.

Parameters:: cls - class to base job name on
Throws:: java.io.IOException

Method Detail

getMapper

public Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getMapper()

Description copied from class: AbstractPartitionCollapsingIncrementalJob

Gets the mapper.

Specified by:: getMapper in class AbstractPartitionCollapsingIncrementalJob

Returns:: mapper

getCombinerAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getCombinerAccumulator()

Description copied from class: AbstractPartitionCollapsingIncrementalJob

Gets the accumulator used for the combiner.

Overrides:: getCombinerAccumulator in class AbstractPartitionCollapsingIncrementalJob

Returns:: combiner accumulator

getReducerAccumulator

public Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> getReducerAccumulator()

Description copied from class: AbstractPartitionCollapsingIncrementalJob

Gets the accumulator used for the reducer.

Specified by:: getReducerAccumulator in class AbstractPartitionCollapsingIncrementalJob

Returns:: reducer accumulator

getKeySchema

protected org.apache.avro.Schema getKeySchema()

Description copied from class: IncrementalJob

Gets the Avro schema for the key.

This is also used as the key for the map output.

Specified by:: getKeySchema in class IncrementalJob

Returns:: key schema.

getIntermediateValueSchema

protected org.apache.avro.Schema getIntermediateValueSchema()

Description copied from class: IncrementalJob

Gets the Avro schema for the intermediate value.

This is also used for the value for the map output.

Specified by:: getIntermediateValueSchema in class IncrementalJob

Returns:: intermediate value schema

getOutputValueSchema

protected org.apache.avro.Schema getOutputValueSchema()

Description copied from class: IncrementalJob

Gets the Avro schema for the output data.

Specified by:: getOutputValueSchema in class IncrementalJob

Returns:: output data schema

getRecordMerger

public Merger<org.apache.avro.generic.GenericRecord> getRecordMerger()

Description copied from class: AbstractPartitionCollapsingIncrementalJob

Gets the record merger that is capable of merging previous output with a new partial output. This is only needed when reusing previous output where the intermediate and output schemas are different. New partial output is produced by the reducer from new input that is after the previous output.

Overrides:: getRecordMerger in class AbstractPartitionCollapsingIncrementalJob

Returns:: merger

getOldRecordMerger

public Merger<org.apache.avro.generic.GenericRecord> getOldRecordMerger()

Description copied from class: AbstractPartitionCollapsingIncrementalJob

Gets the record merger that is capable of unmerging old partial output from the new output. This is only needed when reusing previous output for a fixed-length sliding window. The new output is the result of merging the previous output with the new partial output. The old partial output is produced by the reducer from old input data before the time range of the previous output.

Overrides:: getOldRecordMerger in class AbstractPartitionCollapsingIncrementalJob

Returns:: merger

setMapper

public void setMapper(Mapper<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> mapper)

Set the mapper.

Parameters:: mapper -

setCombinerAccumulator

public void setCombinerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> combiner)

Set the accumulator for the combiner

Parameters:: combiner - accumulator for the combiner

setReducerAccumulator

public void setReducerAccumulator(Accumulator<org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord> reducer)

Set the accumulator for the reducer.

Parameters:: reducer - accumulator for the reducer

setKeySchema

public void setKeySchema(org.apache.avro.Schema keySchema)

Sets the Avro schema for the key.

This is also used as the key for the map output.

Parameters:: keySchema - key schema

setIntermediateValueSchema

public void setIntermediateValueSchema(org.apache.avro.Schema intermediateValueSchema)

Sets the Avro schema for the intermediate value.

This is also used for the value for the map output.

Parameters:: intermediateValueSchema - intermediate value schema

setOutputValueSchema

public void setOutputValueSchema(org.apache.avro.Schema outputValueSchema)

Sets the Avro schema for the output data.

Parameters:: outputValueSchema - output value schema

setMerger

public void setMerger(Merger<org.apache.avro.generic.GenericRecord> merger)

Sets the record merger that is capable of merging previous output with a new partial output. This is only needed when reusing previous output where the intermediate and output schemas are different. New partial output is produced by the reducer from new input that is after the previous output.

Parameters:: merger -

setOldMerger

public void setOldMerger(Merger<org.apache.avro.generic.GenericRecord> oldMerger)

Sets the record merger that is capable of unmerging old partial output from the new output. This is only needed when reusing previous output for a fixed-length sliding window. The new output is the result of merging the previous output with the new partial output. The old partial output is produced by the reducer from old input data before the time range of the previous output.

Parameters:: oldMerger - merger

setOnSetup

public void setOnSetup(Setup setup)

Set callback to provide custom configuration before job begins execution.

Parameters:: setup - object with callback method

config

public void config(org.apache.hadoop.conf.Configuration conf)

Description copied from class: AbstractJob

Overridden to provide custom configuration before the job starts.

Overrides:: config in class AbstractJob

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

Matthew Hayes

datafu.hourglass.jobs Class PartitionCollapsingIncrementalJob

PartitionCollapsingIncrementalJob

getMapper

getCombinerAccumulator

getReducerAccumulator

getKeySchema

getIntermediateValueSchema

getOutputValueSchema

getRecordMerger

getOldRecordMerger

setMapper

setCombinerAccumulator

setReducerAccumulator

setKeySchema

setIntermediateValueSchema

setOutputValueSchema

setMerger

setOldMerger

setOnSetup

config

datafu.hourglass.jobs
Class PartitionCollapsingIncrementalJob