|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object org.apache.hadoop.conf.Configured datafu.hourglass.jobs.AbstractJob datafu.hourglass.jobs.TimeBasedJob datafu.hourglass.jobs.AbstractNonIncrementalJob
public abstract class AbstractNonIncrementalJob
Base class for Hadoop jobs that consume time-partitioned data
in a non-incremental way. Typically this is only used for comparing incremental
jobs against a non-incremental baseline.
It is essentially the same as AbstractPartitionCollapsingIncrementalJob
without all the incremental features.
Jobs extending this class consume input data partitioned according to yyyy/MM/dd. Only a single input path is supported. The output will be written to a directory in the output path with name format yyyyMMdd derived from the end of the time window that is consumed.
This class has the same configuration and methods as TimeBasedJob
.
In addition it also recognizes the following properties:
When combine.inputs is true, then CombinedAvroKeyInputFormat is used instead of AvroKeyInputFormat. This enables a single map task to consume more than one file.
The num.reducers.bytes.per.reducer property controls the number of reducers to use based on the input size. The total size of the input files is divided by this number and then rounded up.
Nested Class Summary | |
---|---|
static class |
AbstractNonIncrementalJob.BaseCombiner
Combiner base class for AbstractNonIncrementalJob . |
static class |
AbstractNonIncrementalJob.BaseMapper
Mapper base class for AbstractNonIncrementalJob . |
static class |
AbstractNonIncrementalJob.BaseReducer
Reducer base class for AbstractNonIncrementalJob . |
static class |
AbstractNonIncrementalJob.Report
Reports files created and processed for an iteration of the job. |
Constructor Summary | |
---|---|
AbstractNonIncrementalJob(java.lang.String name,
java.util.Properties props)
Initializes the job. |
Method Summary | |
---|---|
boolean |
getCombineInputs()
Gets whether inputs should be combined. |
java.lang.Class<? extends AbstractNonIncrementalJob.BaseCombiner> |
getCombinerClass()
Gets the combiner class. |
protected abstract org.apache.avro.Schema |
getMapOutputKeySchema()
Gets the key schema for the map output. |
protected abstract org.apache.avro.Schema |
getMapOutputValueSchema()
Gets the value schema for the map output. |
abstract java.lang.Class<? extends AbstractNonIncrementalJob.BaseMapper> |
getMapperClass()
Gets the mapper class. |
protected abstract org.apache.avro.Schema |
getReduceOutputSchema()
Gets the reduce output schema. |
abstract java.lang.Class<? extends AbstractNonIncrementalJob.BaseReducer> |
getReducerClass()
Gets the reducer class. |
AbstractNonIncrementalJob.Report |
getReport()
Gets a report summarizing the run. |
void |
run()
Runs the job. |
void |
setCombineInputs(boolean combineInputs)
Sets whether inputs should be combined. |
Methods inherited from class datafu.hourglass.jobs.TimeBasedJob |
---|
getDaysAgo, getEndDate, getNumDays, getStartDate, setDaysAgo, setEndDate, setNumDays, setProperties, setStartDate, validate |
Methods inherited from class datafu.hourglass.jobs.AbstractJob |
---|
config, createRandomTempPath, ensurePath, getCountersParentPath, getFileSystem, getInputPaths, getName, getNumReducers, getOutputPath, getProperties, getRetentionCount, getTempPath, initialize, isUseCombiner, randomTempPath, setCountersParentPath, setInputPaths, setName, setNumReducers, setOutputPath, setRetentionCount, setTempPath, setUseCombiner |
Methods inherited from class org.apache.hadoop.conf.Configured |
---|
getConf, setConf |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public AbstractNonIncrementalJob(java.lang.String name, java.util.Properties props) throws java.io.IOException
name
- job nameprops
- configuration properties
java.io.IOException
Method Detail |
---|
public boolean getCombineInputs()
public void setCombineInputs(boolean combineInputs)
combineInputs
- true to combine inputspublic AbstractNonIncrementalJob.Report getReport()
public void run() throws java.io.IOException, java.lang.InterruptedException, java.lang.ClassNotFoundException
run
in class AbstractJob
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
protected abstract org.apache.avro.Schema getMapOutputKeySchema()
protected abstract org.apache.avro.Schema getMapOutputValueSchema()
protected abstract org.apache.avro.Schema getReduceOutputSchema()
public abstract java.lang.Class<? extends AbstractNonIncrementalJob.BaseMapper> getMapperClass()
public abstract java.lang.Class<? extends AbstractNonIncrementalJob.BaseReducer> getReducerClass()
public java.lang.Class<? extends AbstractNonIncrementalJob.BaseCombiner> getCombinerClass()
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |