public class PartitionPreservingExecutionPlanner extends ExecutionPlanner
Execution planner used by AbstractPartitionPreservingIncrementalJob and its derived classes.
This creates a plan to process partitioned input data and produce partitioned output data.
To use this class, the input and output paths must be specified. In addition, the desired input date range can be specified through several methods. Then createPlan() can be called and the execution plan will be created. The inputs to process will be available from getInputsToProcess(), the number of reducers to use will be available from getNumReducers(), and the input schemas will be available from getInputSchemas().
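The planning step described above can be pictured with a small stand-alone sketch. It is not the Hourglass implementation and uses no Hadoop classes: `PartitionPlanSketch`, its `Plan` type, and the selection rule (skip days outside the date range, skip days whose output partition already exists, stop at a cap and flag another pass) are all assumptions made for illustration, loosely modeled on the `getDatesToProcess()`, `getMaxToProcess()`, and `getNeedsAnotherPass()` accessors listed on this page.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-alone illustration only; names and rules here are hypothetical,
// not taken from the Hourglass source.
final class PartitionPlanSketch {

    /** Outcome of planning: the days picked for this run and whether more remain. */
    static final class Plan {
        final List<LocalDate> datesToProcess;
        final boolean needsAnotherPass;

        Plan(List<LocalDate> datesToProcess, boolean needsAnotherPass) {
            this.datesToProcess = datesToProcess;
            this.needsAnotherPass = needsAnotherPass;
        }
    }

    /**
     * Picks the input days to process, oldest first.
     * Assumed rules: a day must lie in [startDate, endDate], must not already
     * have an output partition, and at most maxToProcess days are taken per run.
     */
    static Plan createPlan(List<LocalDate> availableInputDates,
                           Set<LocalDate> existingOutputDates,
                           LocalDate startDate,
                           LocalDate endDate,
                           int maxToProcess) {
        List<LocalDate> sorted = new ArrayList<>(availableInputDates);
        Collections.sort(sorted);

        List<LocalDate> selected = new ArrayList<>();
        boolean needsAnotherPass = false;
        for (LocalDate day : sorted) {
            if (day.isBefore(startDate) || day.isAfter(endDate)) {
                continue; // outside the requested date range
            }
            if (existingOutputDates.contains(day)) {
                continue; // output partition for this day already exists
            }
            if (selected.size() >= maxToProcess) {
                needsAnotherPass = true; // eligible days remain beyond the cap
                break;
            }
            selected.add(day);
        }
        return new Plan(selected, needsAnotherPass);
    }

    public static void main(String[] args) {
        List<LocalDate> available = new ArrayList<>();
        for (int d = 1; d <= 5; d++) {
            available.add(LocalDate.of(2024, 1, d));
        }
        Set<LocalDate> done = new HashSet<>();
        done.add(LocalDate.of(2024, 1, 2));

        Plan plan = createPlan(available, done,
                LocalDate.of(2024, 1, 1), LocalDate.of(2024, 1, 4), 2);
        // prints: [2024-01-01, 2024-01-03] needsAnotherPass=true
        System.out.println(plan.datesToProcess + " needsAnotherPass=" + plan.needsAnotherPass);
    }
}
```

In this sketch a caller would rerun the job while `needsAnotherPass` is true; whether the real planner applies exactly these rules should be checked against the ExecutionPlanner documentation.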
Configuration properties are used to configure a ReduceEstimator instance, which calculates how many reducers should be used. The number of reducers is based on the input data size and the num.reducers.bytes.per.reducer property. See ReduceEstimator for more details on how the properties are used.
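As a rough illustration of how a reducer count can follow from input size and a bytes-per-reducer setting, the sketch below divides the total input bytes by the configured value and rounds up. Only the property name num.reducers.bytes.per.reducer comes from this page; the class `ReducerEstimateSketch`, the 256 MiB fallback, and the minimum of one reducer are assumptions, and the real ReduceEstimator may apply additional rules.

```java
import java.util.Properties;

// Stand-alone illustration only; not the ReduceEstimator implementation.
final class ReducerEstimateSketch {

    static final String BYTES_PER_REDUCER_KEY = "num.reducers.bytes.per.reducer";

    // Assumed fallback when the property is absent (256 MiB); hypothetical value.
    static final long ASSUMED_DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    /** Returns ceil(inputBytes / bytesPerReducer), with a floor of one reducer. */
    static int estimateReducers(long inputBytes, Properties props) {
        if (inputBytes < 0) {
            throw new IllegalArgumentException("inputBytes must be non-negative");
        }
        long bytesPerReducer = Long.parseLong(props.getProperty(
                BYTES_PER_REDUCER_KEY, String.valueOf(ASSUMED_DEFAULT_BYTES_PER_REDUCER)));
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException(BYTES_PER_REDUCER_KEY + " must be positive");
        }
        // Ceiling division written to avoid overflow for very large inputs.
        long reducers = inputBytes / bytesPerReducer
                + (inputBytes % bytesPerReducer == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min(reducers, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(BYTES_PER_REDUCER_KEY, String.valueOf(256L * 1024 * 1024));
        long oneGiB = 1024L * 1024 * 1024;
        // prints: 4
        System.out.println(estimateReducers(oneGiB, props));
    }
}
```

Rounding up keeps any single reducer from receiving more than the configured number of bytes on average, which is the usual reason for sizing reducers this way.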
Constructor and Description |
---|
PartitionPreservingExecutionPlanner(org.apache.hadoop.fs.FileSystem fs, java.util.Properties props) Initializes the execution planner. |
Modifier and Type | Method and Description |
---|---|
void | createPlan() Create the execution plan. |
java.util.List<java.util.Date> | getDatesToProcess() Gets the input dates which are to be processed. |
java.util.List<org.apache.avro.Schema> | getInputSchemas() Gets the input schemas. |
java.util.Map<java.lang.String,org.apache.avro.Schema> | getInputSchemasByPath() Gets a map from input path to schema. |
java.util.List<DatePath> | getInputsToProcess() Gets the inputs which are to be processed. |
boolean | getNeedsAnotherPass() Gets whether another pass will be required. |
int | getNumReducers() Get the number of reducers to use based on the input data size. |
Methods inherited from class ExecutionPlanner:
determineAvailableInputDates, determineDateRange, getAvailableInputsByDate, getDailyData, getDatedData, getDateRange, getDaysAgo, getEndDate, getFileSystem, getInputPaths, getMaxToProcess, getNumDays, getOutputPath, getProps, getStartDate, isFailOnMissing, loadInputData, setDaysAgo, setEndDate, setFailOnMissing, setInputPaths, setMaxToProcess, setNumDays, setOutputPath, setStartDate
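public PartitionPreservingExecutionPlanner(org.apache.hadoop.fs.FileSystem fs, java.util.Properties props)
Initializes the execution planner.
Parameters:
fs - file system
props - configuration properties

public void createPlan() throws java.io.IOException
Create the execution plan.
Throws:
java.io.IOException - IOException

public int getNumReducers()
Get the number of reducers to use based on the input data size. Must call createPlan() first.

public java.util.List<org.apache.avro.Schema> getInputSchemas()
Gets the input schemas. Must call createPlan() first.

public java.util.Map<java.lang.String,org.apache.avro.Schema> getInputSchemasByPath()
Gets a map from input path to schema. Must call createPlan() first.

public boolean getNeedsAnotherPass()
Gets whether another pass will be required. Must call createPlan() first.

public java.util.List<DatePath> getInputsToProcess()
Gets the inputs which are to be processed. Must call createPlan() first.

public java.util.List<java.util.Date> getDatesToProcess()
Gets the input dates which are to be processed. Must call createPlan() first.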