datafu.hourglass.jobs
Class ExecutionPlanner

java.lang.Object
  extended by datafu.hourglass.jobs.ExecutionPlanner
Direct Known Subclasses:
PartitionCollapsingExecutionPlanner, PartitionPreservingExecutionPlanner

public abstract class ExecutionPlanner
extends java.lang.Object

Base class for execution planners. An execution planner determines which files should be processed for a particular run.

Author:
Matthew Hayes

Constructor Summary
ExecutionPlanner(org.apache.hadoop.fs.FileSystem fs, java.util.Properties props)
          Initializes the execution planner.
 
Method Summary
protected  void determineAvailableInputDates()
          Determines what input data is available.
protected  void determineDateRange()
          Determines the date range for inputs to process based on the configuration and available inputs.
protected  java.util.Map<java.util.Date,java.util.List<DatePath>> getAvailableInputsByDate()
          Gets a map from date to available input data.
protected  java.util.SortedMap<java.util.Date,DatePath> getDailyData(org.apache.hadoop.fs.Path path)
          Gets a map from date to path for all paths matching yyyy/MM/dd under the given path.
protected  java.util.SortedMap<java.util.Date,DatePath> getDatedData(org.apache.hadoop.fs.Path path)
          Gets a map from date to path for all paths matching yyyyMMdd under the given path.
 DateRange getDateRange()
          Gets the desired input date range to process based on the configuration and available inputs.
 java.lang.Integer getDaysAgo()
          Gets the number of days to subtract off the end date.
 java.util.Date getEndDate()
          Gets the end date.
protected  org.apache.hadoop.fs.FileSystem getFileSystem()
          Gets the file system.
 java.util.List<org.apache.hadoop.fs.Path> getInputPaths()
          Gets the input paths.
 java.lang.Integer getMaxToProcess()
          Gets the maximum number of days to process at a time.
 java.lang.Integer getNumDays()
          Gets the number of days to process.
 org.apache.hadoop.fs.Path getOutputPath()
          Gets the output path.
protected  java.util.Properties getProps()
          Gets the configuration properties.
 java.util.Date getStartDate()
          Gets the start date.
 boolean isFailOnMissing()
          Gets whether the job should fail if data is missing within the desired date range.
protected  void loadInputData()
          Determines what input data is available.
 void setDaysAgo(java.lang.Integer daysAgo)
          Sets the number of days to subtract off the end date.
 void setEndDate(java.util.Date endDate)
          Sets the end date.
 void setFailOnMissing(boolean failOnMissing)
          Sets whether the job should fail if data is missing within the desired date range.
 void setInputPaths(java.util.List<org.apache.hadoop.fs.Path> inputPaths)
          Sets the input paths.
 void setMaxToProcess(java.lang.Integer maxToProcess)
          Sets the maximum number of days to process at a time.
 void setNumDays(java.lang.Integer numDays)
          Sets the number of days to process.
 void setOutputPath(org.apache.hadoop.fs.Path outputPath)
          Sets the output path.
 void setStartDate(java.util.Date startDate)
          Sets the start date.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExecutionPlanner

public ExecutionPlanner(org.apache.hadoop.fs.FileSystem fs,
                        java.util.Properties props)
Initializes the execution planner.

Parameters:
fs - file system to use
props - configuration properties
Method Detail

getOutputPath

public org.apache.hadoop.fs.Path getOutputPath()
Gets the output path.

Returns:
output path

getInputPaths

public java.util.List<org.apache.hadoop.fs.Path> getInputPaths()
Gets the input paths.

Returns:
input paths

setOutputPath

public void setOutputPath(org.apache.hadoop.fs.Path outputPath)
Sets the output path.

Parameters:
outputPath - output path

setInputPaths

public void setInputPaths(java.util.List<org.apache.hadoop.fs.Path> inputPaths)
Sets the input paths.

Parameters:
inputPaths - input paths

setStartDate

public void setStartDate(java.util.Date startDate)
Sets the start date.

Parameters:
startDate - start date

getStartDate

public java.util.Date getStartDate()
Gets the start date.

Returns:
start date

setEndDate

public void setEndDate(java.util.Date endDate)
Sets the end date.

Parameters:
endDate - end date

getEndDate

public java.util.Date getEndDate()
Gets the end date.

Returns:
end date

setDaysAgo

public void setDaysAgo(java.lang.Integer daysAgo)
Sets the number of days to subtract off the end date.

Parameters:
daysAgo - days ago

getDaysAgo

public java.lang.Integer getDaysAgo()
Gets the number of days to subtract off the end date.

Returns:
days ago

setNumDays

public void setNumDays(java.lang.Integer numDays)
Sets the number of days to process.

Parameters:
numDays - number of days to process

getNumDays

public java.lang.Integer getNumDays()
Gets the number of days to process.

Returns:
number of days to process

setMaxToProcess

public void setMaxToProcess(java.lang.Integer maxToProcess)
Sets the maximum number of days to process at a time.

Parameters:
maxToProcess - maximum number of days

getMaxToProcess

public java.lang.Integer getMaxToProcess()
Gets the maximum number of days to process at a time.

Returns:
maximum number of days

isFailOnMissing

public boolean isFailOnMissing()
Gets whether the job should fail if data is missing within the desired date range.

Returns:
true if the job should fail on missing data

setFailOnMissing

public void setFailOnMissing(boolean failOnMissing)
Sets whether the job should fail if data is missing within the desired date range.

Parameters:
failOnMissing - true if the job should fail on missing data
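
As a rough illustration of what failing on missing data could look like, the sketch below walks each day in a desired range and collects the days that have no input. The class MissingDataCheck, its methods, and the use of java.time.LocalDate in place of java.util.Date are assumptions made for this example; they are not part of the Hourglass API, and the real planner may apply the flag differently.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of datafu.hourglass.
final class MissingDataCheck {

  // Returns the days in [start, end] (inclusive) that have no available input.
  static List<LocalDate> missingDays(LocalDate start, LocalDate end, Set<LocalDate> available) {
    List<LocalDate> missing = new ArrayList<>();
    for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
      if (!available.contains(day)) {
        missing.add(day);
      }
    }
    return missing;
  }

  // Throws if failOnMissing is set and any day in the range lacks input;
  // otherwise returns the missing days so that the caller can skip them.
  static List<LocalDate> check(LocalDate start, LocalDate end,
                               Set<LocalDate> available, boolean failOnMissing) {
    List<LocalDate> missing = missingDays(start, end, available);
    if (failOnMissing && !missing.isEmpty()) {
      throw new IllegalStateException("Missing input data for: " + missing);
    }
    return missing;
  }
}
```

With the flag off, a caller in this sketch receives the list of gaps and can decide how to proceed; with the flag on, the first gap aborts planning.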

getDateRange

public DateRange getDateRange()
Gets the desired input date range to process based on the configuration and available inputs.

Returns:
desired date range

getFileSystem

protected org.apache.hadoop.fs.FileSystem getFileSystem()
Gets the file system.

Returns:
file system

getProps

protected java.util.Properties getProps()
Gets the configuration properties.

Returns:
properties

getAvailableInputsByDate

protected java.util.Map<java.util.Date,java.util.List<DatePath>> getAvailableInputsByDate()
Gets a map from date to available input data.

Returns:
map from date to available input data
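
The returned structure can be pictured as the per-input date maps grouped by day. The sketch below builds such a map from several sorted maps. InputsByDate, its group method, and the use of String in place of DatePath are hypothetical stand-ins; the real method may populate or order the lists differently.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper for illustration only; String stands in for DatePath.
final class InputsByDate {

  // Groups the paths of every input by date. Each element of perInput is the
  // date-to-path map of one input; the result lists, for each date, the paths
  // of all inputs that have data for that date, in input order.
  static Map<Date, List<String>> group(List<SortedMap<Date, String>> perInput) {
    Map<Date, List<String>> byDate = new TreeMap<>();
    for (SortedMap<Date, String> input : perInput) {
      for (Map.Entry<Date, String> entry : input.entrySet()) {
        byDate.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
              .add(entry.getValue());
      }
    }
    return byDate;
  }
}
```

In this sketch, a date whose list is shorter than the number of inputs is missing data for at least one input.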

getDailyData

protected java.util.SortedMap<java.util.Date,DatePath> getDailyData(org.apache.hadoop.fs.Path path)
                                                             throws java.io.IOException
Gets a map from date to path for all paths matching yyyy/MM/dd under the given path.

Parameters:
path - path to search under
Returns:
map of date to path
Throws:
java.io.IOException

getDatedData

protected java.util.SortedMap<java.util.Date,DatePath> getDatedData(org.apache.hadoop.fs.Path path)
                                                             throws java.io.IOException
Gets a map from date to path for all paths matching yyyyMMdd under the given path.

Parameters:
path - path to search under
Returns:
map of date to path
Throws:
java.io.IOException
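
getDailyData and getDatedData differ only in the directory layout they expect (yyyy/MM/dd versus yyyyMMdd). The sketch below shows one way such matching could work over an already-listed set of paths, using plain strings in place of org.apache.hadoop.fs.Path and DatePath. DatedPathSketch, datesUnder, and the choice of UTC are hypothetical and not taken from the Hourglass source, which reads the paths from the file system.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.SortedMap;
import java.util.TimeZone;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of datafu.hourglass.
final class DatedPathSketch {

  // Maps each path of the form root/<date> to its date, where <date> follows
  // the given pattern ("yyyy/MM/dd" or "yyyyMMdd"). Other paths are ignored.
  static SortedMap<Date, String> datesUnder(String root, List<String> paths, String pattern) {
    SimpleDateFormat format = new SimpleDateFormat(pattern);
    format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: dates are read as UTC
    format.setLenient(false);                        // reject impossible dates such as month 13
    String prefix = root + "/";
    SortedMap<Date, String> result = new TreeMap<>();
    for (String path : paths) {
      if (!path.startsWith(prefix)) {
        continue;
      }
      String suffix = path.substring(prefix.length());
      if (suffix.length() != pattern.length()) {
        continue; // both patterns are fixed width, so other lengths cannot match
      }
      ParsePosition position = new ParsePosition(0);
      Date date = format.parse(suffix, position);
      if (date != null && position.getIndex() == suffix.length()) {
        result.put(date, path);
      }
    }
    return result;
  }
}
```

Because the result is a sorted map, its first and last keys give the earliest and latest days for which a matching path was found.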

loadInputData

protected void loadInputData()
                      throws java.io.IOException
Determines what input data is available.

Throws:
java.io.IOException

determineAvailableInputDates

protected void determineAvailableInputDates()
Determines what input data is available.


determineDateRange

protected void determineDateRange()
Determines the date range for inputs to process based on the configuration and available inputs.
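
How the configured values might combine is easier to see in code. The sketch below is one plausible reading of the property descriptions on this page, not the actual Hourglass logic. It assumes that an unset end date defaults to the latest available day, that daysAgo is subtracted from that end date, and that an unset start date is derived from numDays or else defaults to the earliest available day. DateRangeSketch, determine, and the use of java.time.LocalDate are hypothetical.

```java
import java.time.LocalDate;
import java.util.SortedSet;

// Hypothetical helper for illustration only; not part of datafu.hourglass.
final class DateRangeSketch {

  // Returns {start, end}, both inclusive. Any of startDate, endDate, daysAgo
  // and numDays may be null, meaning "not configured".
  static LocalDate[] determine(SortedSet<LocalDate> available,
                               LocalDate startDate, LocalDate endDate,
                               Integer daysAgo, Integer numDays) {
    if (available.isEmpty()) {
      throw new IllegalStateException("No input data available");
    }
    // Assumption: an unset end date falls back to the latest available day.
    LocalDate end = (endDate != null) ? endDate : available.last();
    if (daysAgo != null) {
      end = end.minusDays(daysAgo); // subtract days off the end date
    }
    LocalDate start;
    if (startDate != null) {
      start = startDate;
    } else if (numDays != null) {
      start = end.minusDays(numDays - 1L); // window of numDays days ending at end
    } else {
      start = available.first(); // assumption: fall back to the earliest available day
    }
    if (start.isAfter(end)) {
      throw new IllegalStateException("Start date " + start + " is after end date " + end);
    }
    return new LocalDate[] {start, end};
  }
}
```

Under these assumptions, ten available days ending on 2013-03-10 with daysAgo set to 1 and numDays set to 3 yield the range 2013-03-07 through 2013-03-09.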


