A

AbstractJob - Class in datafu.hourglass.jobs
Base class for Hadoop jobs.
AbstractJob() - Constructor for class datafu.hourglass.jobs.AbstractJob
Initializes the job.
AbstractJob(String, Properties) - Constructor for class datafu.hourglass.jobs.AbstractJob
Initializes the job with a job name and properties.
AbstractNonIncrementalJob - Class in datafu.hourglass.jobs
Base class for Hadoop jobs that consume time-partitioned data in a non-incremental way.
AbstractNonIncrementalJob(String, Properties) - Constructor for class datafu.hourglass.jobs.AbstractNonIncrementalJob
Initializes the job.
AbstractNonIncrementalJob.BaseCombiner - Class in datafu.hourglass.jobs
Combiner base class for AbstractNonIncrementalJob.
AbstractNonIncrementalJob.BaseCombiner() - Constructor for class datafu.hourglass.jobs.AbstractNonIncrementalJob.BaseCombiner
 
AbstractNonIncrementalJob.BaseMapper - Class in datafu.hourglass.jobs
Mapper base class for AbstractNonIncrementalJob.
AbstractNonIncrementalJob.BaseMapper() - Constructor for class datafu.hourglass.jobs.AbstractNonIncrementalJob.BaseMapper
 
AbstractNonIncrementalJob.BaseReducer - Class in datafu.hourglass.jobs
Reducer base class for AbstractNonIncrementalJob.
AbstractNonIncrementalJob.BaseReducer() - Constructor for class datafu.hourglass.jobs.AbstractNonIncrementalJob.BaseReducer
 
AbstractNonIncrementalJob.Report - Class in datafu.hourglass.jobs
Reports files created and processed for an iteration of the job.
AbstractNonIncrementalJob.Report() - Constructor for class datafu.hourglass.jobs.AbstractNonIncrementalJob.Report
 
AbstractPartitionCollapsingIncrementalJob - Class in datafu.hourglass.jobs
An IncrementalJob that consumes partitioned input data and collapses the partitions to produce a single output.
AbstractPartitionCollapsingIncrementalJob() - Constructor for class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Initializes the job.
AbstractPartitionCollapsingIncrementalJob(String, Properties) - Constructor for class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Initializes the job with a job name and properties.
AbstractPartitionCollapsingIncrementalJob.Report - Class in datafu.hourglass.jobs
Reports files created and processed for an iteration of the job.
AbstractPartitionCollapsingIncrementalJob.Report() - Constructor for class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob.Report
 
AbstractPartitionPreservingIncrementalJob - Class in datafu.hourglass.jobs
An IncrementalJob that consumes partitioned input data and produces output data having the same partitions.
AbstractPartitionPreservingIncrementalJob() - Constructor for class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Initializes the job.
AbstractPartitionPreservingIncrementalJob(String, Properties) - Constructor for class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Initializes the job with a job name and properties.
AbstractPartitionPreservingIncrementalJob.Report - Class in datafu.hourglass.jobs
Reports files created and processed for an iteration of the job.
AbstractPartitionPreservingIncrementalJob.Report() - Constructor for class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob.Report
 
accumulate(In) - Method in interface datafu.hourglass.model.Accumulator
Accumulate another value.
Accumulator<In,Out> - Interface in datafu.hourglass.model
Collects a sequence of values and produces one value as a result.
add(Path) - Method in class datafu.hourglass.jobs.FileCleaner
Add a path to be removed later.
add(String) - Method in class datafu.hourglass.jobs.FileCleaner
Add a path to be removed later.
addInputPath(Path) - Method in class datafu.hourglass.jobs.ReduceEstimator
 
addInputPath(String, Path) - Method in class datafu.hourglass.jobs.ReduceEstimator
 
AvroDateRangeMetadata - Class in datafu.hourglass.avro
Manages the storage and retrieval of date ranges in the metadata of Avro files.
AvroDateRangeMetadata() - Constructor for class datafu.hourglass.avro.AvroDateRangeMetadata
 
AvroKeyValueIdentityMapper - Class in datafu.hourglass.mapreduce
A mapper which outputs key-value pairs as-is.
AvroKeyValueIdentityMapper() - Constructor for class datafu.hourglass.mapreduce.AvroKeyValueIdentityMapper
 
AvroKeyValueWithMetadataOutputFormat<K,V> - Class in datafu.hourglass.avro
FileOutputFormat for writing Avro container files of key/value pairs.
AvroKeyValueWithMetadataOutputFormat() - Constructor for class datafu.hourglass.avro.AvroKeyValueWithMetadataOutputFormat
 
AvroKeyValueWithMetadataRecordWriter<K,V> - Class in datafu.hourglass.avro
Writes key/value pairs to an Avro container file.
AvroKeyValueWithMetadataRecordWriter(AvroDatumConverter<K, ?>, AvroDatumConverter<V, ?>, CodecFactory, OutputStream, Configuration) - Constructor for class datafu.hourglass.avro.AvroKeyValueWithMetadataRecordWriter
 
AvroKeyWithMetadataOutputFormat<T> - Class in datafu.hourglass.avro
FileOutputFormat for writing Avro container files.
AvroKeyWithMetadataOutputFormat() - Constructor for class datafu.hourglass.avro.AvroKeyWithMetadataOutputFormat
Constructor.
AvroKeyWithMetadataOutputFormat(AvroKeyWithMetadataOutputFormat.RecordWriterFactory) - Constructor for class datafu.hourglass.avro.AvroKeyWithMetadataOutputFormat
Constructor.
AvroKeyWithMetadataOutputFormat.RecordWriterFactory<T> - Class in datafu.hourglass.avro
A factory for creating record writers.
AvroKeyWithMetadataOutputFormat.RecordWriterFactory() - Constructor for class datafu.hourglass.avro.AvroKeyWithMetadataOutputFormat.RecordWriterFactory
 
AvroKeyWithMetadataRecordWriter<T> - Class in datafu.hourglass.avro
Writes Avro records to an Avro container file output stream.
AvroKeyWithMetadataRecordWriter(Schema, CodecFactory, OutputStream, Configuration) - Constructor for class datafu.hourglass.avro.AvroKeyWithMetadataRecordWriter
Constructor.
AvroMultipleInputsKeyInputFormat<T> - Class in datafu.hourglass.avro
A MapReduce InputFormat that can handle Avro container files and multiple inputs.
AvroMultipleInputsKeyInputFormat() - Constructor for class datafu.hourglass.avro.AvroMultipleInputsKeyInputFormat
 
AvroMultipleInputsUtil - Class in datafu.hourglass.avro
Helper methods for dealing with multiple Avro input schemas.
AvroMultipleInputsUtil() - Constructor for class datafu.hourglass.avro.AvroMultipleInputsUtil
 

B

build() - Method in class datafu.hourglass.schemas.TaskSchemas.Builder
 

C

call() - Method in class datafu.hourglass.jobs.StagedOutputJob
Run the job.
clean() - Method in class datafu.hourglass.jobs.FileCleaner
Removes added paths from the file system.
cleanup(Reducer<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingCombiner
 
cleanup(Mapper<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingMapper
 
cleanup(Reducer<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingReducer
 
cleanup() - Method in interface datafu.hourglass.model.Accumulator
Resets the internal state so that all values accumulated so far are forgotten.
close(TaskAttemptContext) - Method in class datafu.hourglass.avro.AvroKeyValueWithMetadataRecordWriter
close(TaskAttemptContext) - Method in class datafu.hourglass.avro.AvroKeyWithMetadataRecordWriter
close() - Method in class datafu.hourglass.mapreduce.ObjectProcessor
 
close() - Method in class datafu.hourglass.mapreduce.PartitioningReducer
 
CollapsingCombiner - Class in datafu.hourglass.mapreduce
The combiner used by AbstractPartitionCollapsingIncrementalJob and its derived classes.
CollapsingCombiner() - Constructor for class datafu.hourglass.mapreduce.CollapsingCombiner
 
CollapsingMapper - Class in datafu.hourglass.mapreduce
The mapper used by AbstractPartitionCollapsingIncrementalJob and its derived classes.
CollapsingMapper() - Constructor for class datafu.hourglass.mapreduce.CollapsingMapper
 
CollapsingReducer - Class in datafu.hourglass.mapreduce
The reducer used by AbstractPartitionCollapsingIncrementalJob and its derived classes.
CollapsingReducer() - Constructor for class datafu.hourglass.mapreduce.CollapsingReducer
 
collect(K, V) - Method in interface datafu.hourglass.model.KeyValueCollector
Collects key-value pairs.
CombinedAvroKeyInputFormat<T> - Class in datafu.hourglass.avro
A combined input format for reading Avro data.
CombinedAvroKeyInputFormat() - Constructor for class datafu.hourglass.avro.CombinedAvroKeyInputFormat
 
CombinedAvroKeyInputFormat.CombinedAvroKeyRecordReader<T> - Class in datafu.hourglass.avro
 
CombinedAvroKeyInputFormat.CombinedAvroKeyRecordReader(CombineFileSplit, TaskAttemptContext, Integer) - Constructor for class datafu.hourglass.avro.CombinedAvroKeyInputFormat.CombinedAvroKeyRecordReader
 
COMBINER_IMPL_PATH - Static variable in class datafu.hourglass.mapreduce.Parameters
 
compareTo(DatePath) - Method in class datafu.hourglass.fs.DatePath
 
config(Configuration) - Method in class datafu.hourglass.jobs.AbstractJob
Overridden to provide custom configuration before the job starts.
config(Configuration) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
config(Configuration) - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
 
configureOutputDateRange(Configuration, DateRange) - Static method in class datafu.hourglass.avro.AvroDateRangeMetadata
Updates the Hadoop configuration so that the Avro files which are written have date range information stored in the metadata.
countBytes(FileSystem, Path) - Static method in class datafu.hourglass.fs.PathUtils
Sums the size of all files listed under a given path.
create(Schema, CodecFactory, OutputStream, Configuration) - Method in class datafu.hourglass.avro.AvroKeyWithMetadataOutputFormat.RecordWriterFactory
Creates a new record writer instance.
createDatedPath(Path, Date) - Static method in class datafu.hourglass.fs.DatePath
 
createNestedDatedPath(Path, Date) - Static method in class datafu.hourglass.fs.DatePath
 
createPlan() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Create the execution plan.
createPlan() - Method in class datafu.hourglass.jobs.PartitionPreservingExecutionPlanner
Create the execution plan.
createRandomTempPath() - Method in class datafu.hourglass.jobs.AbstractJob
Creates a random temporary path within the file system.
createRecordReader(InputSplit, TaskAttemptContext) - Method in class datafu.hourglass.avro.AvroMultipleInputsKeyInputFormat
createRecordReader(InputSplit, TaskAttemptContext) - Method in class datafu.hourglass.avro.CombinedAvroKeyInputFormat
 
createStagedJob(Configuration, String, List<String>, String, String, Logger) - Static method in class datafu.hourglass.jobs.StagedOutputJob
Creates a job which uses a temporary staging location for the output data.

D

datafu.hourglass.avro - package datafu.hourglass.avro
Input and output formats for using Avro in incremental Hadoop jobs.
datafu.hourglass.fs - package datafu.hourglass.fs
Classes for working with the file system.
datafu.hourglass.jobs - package datafu.hourglass.jobs
Incremental Hadoop jobs and some supporting classes.
datafu.hourglass.mapreduce - package datafu.hourglass.mapreduce
Implementations of mappers, combiners, and reducers used by incremental jobs.
datafu.hourglass.model - package datafu.hourglass.model
Interfaces which define the incremental processing model.
datafu.hourglass.schemas - package datafu.hourglass.schemas
Classes that help manage the Avro schemas used by the jobs.
datedPathFormat - Static variable in class datafu.hourglass.fs.PathUtils
 
DatePath - Class in datafu.hourglass.fs
Represents a path and the corresponding date that is associated with it.
DatePath(Date, Path) - Constructor for class datafu.hourglass.fs.DatePath
 
DateRange - Class in datafu.hourglass.fs
A date range, consisting of a start and end date.
DateRange(Date, Date) - Constructor for class datafu.hourglass.fs.DateRange
 
DateRangeConfigurable - Interface in datafu.hourglass.jobs
An interface for an object with a configurable output date range.
DateRangePlanner - Class in datafu.hourglass.jobs
Determines the date range of inputs which should be processed.
DateRangePlanner() - Constructor for class datafu.hourglass.jobs.DateRangePlanner
 
DelegatingCombiner - Class in datafu.hourglass.mapreduce
A Hadoop combiner which delegates to an implementation read from the distributed cache.
DelegatingCombiner() - Constructor for class datafu.hourglass.mapreduce.DelegatingCombiner
 
DelegatingMapper - Class in datafu.hourglass.mapreduce
A Hadoop mapper which delegates to an implementation read from the distributed cache.
DelegatingMapper() - Constructor for class datafu.hourglass.mapreduce.DelegatingMapper
 
DelegatingReducer - Class in datafu.hourglass.mapreduce
A Hadoop reducer which delegates to an implementation read from the distributed cache.
DelegatingReducer() - Constructor for class datafu.hourglass.mapreduce.DelegatingReducer
 
determineAvailableInputDates() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Determines what input data is available.
determineDateRange() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Determine the date range for inputs to process based on the configuration and available inputs.
DistributedCacheHelper - Class in datafu.hourglass.mapreduce
Methods for working with the Hadoop distributed cache.
DistributedCacheHelper() - Constructor for class datafu.hourglass.mapreduce.DistributedCacheHelper
 

E

ensurePath(Path) - Method in class datafu.hourglass.jobs.AbstractJob
Creates a path, if it does not already exist.
equals(Object) - Method in class datafu.hourglass.fs.DatePath
 
ExecutionPlanner - Class in datafu.hourglass.jobs
Base class for execution planners.
ExecutionPlanner(FileSystem, Properties) - Constructor for class datafu.hourglass.jobs.ExecutionPlanner
Initializes the execution planner.

F

FileCleaner - Class in datafu.hourglass.jobs
Used to remove files from the file system when they are no longer needed.
FileCleaner(FileSystem) - Constructor for class datafu.hourglass.jobs.FileCleaner
 
findDatedPaths(FileSystem, Path) - Static method in class datafu.hourglass.fs.PathUtils
List all paths matching the "yyyyMMdd" format under a given path.
findNestedDatedPaths(FileSystem, Path) - Static method in class datafu.hourglass.fs.PathUtils
List all paths matching the "yyyy/MM/dd" format under a given path.
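
The two path layouts named above, "yyyyMMdd" and "yyyy/MM/dd", are ordinary date patterns. The following is a minimal sketch of how such patterns format and parse a date using only the JDK; it does not call PathUtils, and the class name DatedPathFormats, its helper methods, and the choice of UTC are assumptions made for illustration, not part of Hourglass.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of datafu.hourglass.
public class DatedPathFormats {
    // Builds a strict formatter for the given pattern.
    // UTC is an assumption; the time zone Hourglass uses is not stated in this index.
    static SimpleDateFormat formatter(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern);
        f.setLenient(false);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    // Formats a date as a flat directory name, e.g. 20130815.
    static String datedName(Date d) {
        return formatter("yyyyMMdd").format(d);
    }

    // Formats a date as a nested directory path, e.g. 2013/08/15.
    static String nestedName(Date d) {
        return formatter("yyyy/MM/dd").format(d);
    }

    // Parses a flat directory name back into a date.
    static Date parseDated(String name) {
        try {
            return formatter("yyyyMMdd").parse(name);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyyMMdd name: " + name, e);
        }
    }

    public static void main(String[] args) {
        Date d = parseDated("20130815");
        System.out.println(datedName(d));   // prints 20130815
        System.out.println(nestedName(d));  // prints 2013/08/15
    }
}
```

Strict (non-lenient) parsing is used so that a name such as 20131345 is rejected rather than silently rolled over to a later date.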

G

getAccumulator() - Method in class datafu.hourglass.mapreduce.CollapsingCombiner
Gets the accumulator used to perform aggregation.
getAccumulator() - Method in class datafu.hourglass.mapreduce.PartitioningCombiner
Gets the accumulator used to perform aggregation.
getAccumulator() - Method in class datafu.hourglass.mapreduce.PartitioningReducer
Gets the accumulator used to perform aggregation.
getAvailableInputsByDate() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets a map from date to available input data.
getBeginDate() - Method in class datafu.hourglass.fs.DateRange
 
getCombineInputs() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Gets whether inputs should be combined.
getCombineProcessor() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
 
getCombinerAccumulator() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Gets the accumulator used for the combiner.
getCombinerAccumulator() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Gets the accumulator used for the combiner.
getCombinerAccumulator() - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
getCombinerAccumulator() - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
 
getCombinerClass() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Gets the combiner class.
getConf() - Method in class datafu.hourglass.jobs.TimePartitioner
 
getContext() - Method in class datafu.hourglass.mapreduce.ObjectProcessor
 
getCountersParentPath() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the path where counters will be stored.
getCountersParentPath() - Method in class datafu.hourglass.jobs.StagedOutputJob
Gets the path where counters will be stored.
getCountersPath() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob.Report
Gets the path to the counters file, if one was written.
getCountersPath() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob.Report
Gets the path to the counters file, if one was written.
getCountersPath() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob.Report
Gets the path to the counters file, if one was written.
getCountersPath() - Method in class datafu.hourglass.jobs.StagedOutputJob
Gets the path to the written counters.
getCurrentDateRange() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
 
getDailyData(Path) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Get a map from date to path for all paths matching yyyy/MM/dd under the given path.
getDate() - Method in class datafu.hourglass.fs.DatePath
 
getDatedData(Path) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Get a map from date to path for all paths matching yyyyMMdd under the given path.
getDatedIntermediateValueSchema() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getDateForDatedPath(Path) - Static method in class datafu.hourglass.fs.PathUtils
Gets the date for a path in the "yyyyMMdd" format.
getDateForNestedDatedPath(Path) - Static method in class datafu.hourglass.fs.PathUtils
Gets the date for a path in the "yyyy/MM/dd" format.
getDateRange(Date, Date, Collection<Date>, Integer, Integer) - Static method in class datafu.hourglass.jobs.DateRangePlanner
Determines the date range of inputs which should be processed.
getDateRange() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the desired input date range to process based on the configuration and available inputs.
getDatesToProcess() - Method in class datafu.hourglass.jobs.PartitionPreservingExecutionPlanner
Gets the input dates which are to be processed.
getDaysAgo() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the number of days to subtract off the end date.
getDaysAgo() - Method in class datafu.hourglass.jobs.TimeBasedJob
Gets the number of days to subtract off the end of the consumption window.
getEndDate() - Method in class datafu.hourglass.fs.DateRange
 
getEndDate() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the end date.
getEndDate() - Method in class datafu.hourglass.jobs.TimeBasedJob
Gets the end date.
getFileSystem() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the file system.
getFileSystem() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the file system.
getFinal() - Method in interface datafu.hourglass.model.Accumulator
Get the output value corresponding to all input values accumulated so far.
getInputFiles() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob.Report
Gets input files that were processed.
getInputFiles() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob.Report
Gets new input files that were processed.
getInputFiles() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob.Report
Gets input files that were processed.
getInputKeySchemaForSplit(Configuration, InputSplit) - Static method in class datafu.hourglass.avro.AvroMultipleInputsUtil
Gets the schema for a particular input split.
getInputPaths() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the input paths.
getInputPaths() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the input paths.
getInputSchemas() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Gets the input schemas.
getInputSchemas() - Method in class datafu.hourglass.jobs.PartitionPreservingExecutionPlanner
Gets the input schemas.
getInputSchemasByPath() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Gets a map from input path to schema.
getInputSchemasByPath() - Method in class datafu.hourglass.jobs.PartitionPreservingExecutionPlanner
Gets a map from input path to schema.
getInputsToProcess() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Gets all inputs that will be processed.
getInputsToProcess() - Method in class datafu.hourglass.jobs.PartitionPreservingExecutionPlanner
Gets the inputs which are to be processed.
getIntermediateValueSchema() - Method in class datafu.hourglass.jobs.IncrementalJob
Gets the Avro schema for the intermediate value.
getIntermediateValueSchema() - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
getIntermediateValueSchema() - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
 
getIntermediateValueSchema() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getIntermediateValueSchema() - Method in class datafu.hourglass.schemas.PartitionPreservingSchemas
 
getIntermediateValueSchema() - Method in class datafu.hourglass.schemas.TaskSchemas
 
getJobId() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob.Report
Gets the job ID.
getJobId() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob.Report
Gets the job ID.
getJobId() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob.Report
Gets the job ID.
getJobName() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob.Report
Gets the job name.
getJobName() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob.Report
Gets the job name.
getJobName() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob.Report
Gets the job name.
getKeySchema() - Method in class datafu.hourglass.jobs.IncrementalJob
Gets the Avro schema for the key.
getKeySchema() - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
getKeySchema() - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
 
getKeySchema() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getKeySchema() - Method in class datafu.hourglass.schemas.PartitionPreservingSchemas
 
getKeySchema() - Method in class datafu.hourglass.schemas.TaskSchemas
 
getMapInputSchemas() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getMapInputSchemas() - Method in class datafu.hourglass.schemas.PartitionPreservingSchemas
 
getMapOutputKeySchema() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Gets the key schema for the map output.
getMapOutputKeySchema() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getMapOutputKeySchema() - Method in class datafu.hourglass.schemas.PartitionPreservingSchemas
 
getMapOutputSchema() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getMapOutputSchema() - Method in class datafu.hourglass.schemas.PartitionPreservingSchemas
 
getMapOutputValueSchema() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Gets the value schema for the map output.
getMapOutputValueSchema() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getMapOutputValueSchema() - Method in class datafu.hourglass.schemas.PartitionPreservingSchemas
 
getMapper() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Gets the mapper.
getMapper() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Gets the mapper.
getMapper() - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
getMapper() - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
 
getMapper() - Method in class datafu.hourglass.mapreduce.CollapsingMapper
Gets the mapper.
getMapper() - Method in class datafu.hourglass.mapreduce.PartitioningMapper
Gets the mapper.
getMapperClass() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Gets the mapper class.
getMapProcessor() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
 
getMaxIterations() - Method in class datafu.hourglass.jobs.IncrementalJob
Gets the maximum number of iterations for the job.
getMaxToProcess() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the maximum number of days to process at a time.
getMaxToProcess() - Method in class datafu.hourglass.jobs.IncrementalJob
Gets the maximum number of days of input data to process in a single run.
getName() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the job name.
getNeedsAnotherPass() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Gets whether another pass will be required.
getNeedsAnotherPass() - Method in class datafu.hourglass.jobs.PartitionPreservingExecutionPlanner
Gets whether another pass will be required.
getNestedPathRoot(Path) - Static method in class datafu.hourglass.fs.PathUtils
Gets the root path for a path in the "yyyy/MM/dd" format.
getNewAccumulator() - Method in class datafu.hourglass.mapreduce.CollapsingReducer
 
getNewInputsToProcess() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Gets only the new data that will be processed.
getNumDays() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the number of days to process.
getNumDays() - Method in class datafu.hourglass.jobs.TimeBasedJob
Gets the number of consecutive days to process.
getNumReducers() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the number of reducers to use.
getNumReducers() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Get the number of reducers to use based on the input and previous output data size.
getNumReducers() - Method in class datafu.hourglass.jobs.PartitionPreservingExecutionPlanner
Get the number of reducers to use based on the input data size.
getNumReducers() - Method in class datafu.hourglass.jobs.ReduceEstimator
 
getOldAccumulator() - Method in class datafu.hourglass.mapreduce.CollapsingReducer
 
getOldInputFiles() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob.Report
Gets old input files that were processed.
getOldInputsToProcess() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Gets only the old data that will be processed.
getOldRecordMerger() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Gets the record merger that is capable of unmerging old partial output from the new output.
getOldRecordMerger() - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
getOutputFile() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob.Report
Gets the output file that was produced by the job.
getOutputFileDateRange(FileSystem, Path) - Static method in class datafu.hourglass.avro.AvroDateRangeMetadata
Reads the date range from the metadata stored in an Avro file.
getOutputFiles() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob.Report
Gets the output files that were produced by the job.
getOutputPath() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the output path.
getOutputPath() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob.Report
Gets the path to the output which was produced by the job.
getOutputPath() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the output path.
getOutputSchemaName() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Get the name for the reduce output schema.
getOutputSchemaName() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Get the name for the reduce output schema.
getOutputSchemaNamespace() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Get the namespace for the reduce output schema.
getOutputSchemaNamespace() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Get the namespace for the reduce output schema.
getOutputValueSchema() - Method in class datafu.hourglass.jobs.IncrementalJob
Gets the Avro schema for the output data.
getOutputValueSchema() - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
getOutputValueSchema() - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
 
getOutputValueSchema() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getOutputValueSchema() - Method in class datafu.hourglass.schemas.PartitionPreservingSchemas
 
getOutputValueSchema() - Method in class datafu.hourglass.schemas.TaskSchemas
 
getPartition(AvroKey<GenericRecord>, AvroValue<GenericRecord>, int) - Method in class datafu.hourglass.jobs.TimePartitioner
 
getPath() - Method in class datafu.hourglass.fs.DatePath
 
getPreviousOutputToProcess() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Gets the previous output to reuse, or null if no output is being reused.
getProperties() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the configuration properties.
getProps() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the configuration properties.
getRecordMerger() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Gets the record merger that is capable of merging previous output with a new partial output.
getRecordMerger() - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
getRecordWriter(TaskAttemptContext) - Method in class datafu.hourglass.avro.AvroKeyValueWithMetadataOutputFormat
getRecordWriter(TaskAttemptContext) - Method in class datafu.hourglass.avro.AvroKeyWithMetadataOutputFormat
getReduceOutputSchema() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Gets the reduce output schema.
getReduceOutputSchema() - Method in class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
getReduceOutputSchema() - Method in class datafu.hourglass.schemas.PartitionPreservingSchemas
 
getReduceProcessor() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
 
getReducerAccumulator() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Gets the accumulator used for the reducer.
getReducerAccumulator() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Gets the accumulator used for the reducer.
getReducerAccumulator() - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
 
getReducerAccumulator() - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
 
getReducerClass() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Gets the reducer class.
getReport() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Gets a report summarizing the run.
getReports() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Get reports that summarize each of the job iterations.
getReports() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Get reports that summarize each of the job iterations.
getRetentionCount() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the number of days of data which will be retained in the output path.
getReusedOutput() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob.Report
Gets the output that was reused, if one was reused.
getReuseOutput() - Method in class datafu.hourglass.mapreduce.CollapsingCombiner
Gets whether previous output is being reused.
getReuseOutput() - Method in class datafu.hourglass.mapreduce.CollapsingMapper
Gets whether previous output is being reused.
getReuseOutput() - Method in class datafu.hourglass.mapreduce.CollapsingReducer
Gets whether previous output is being reused.
getReusePreviousOutput() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Get whether previous output should be reused.
getReusePreviousOutput() - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Gets whether previous output should be reused, if it exists.
getSchemaFromFile(FileSystem, Path) - Static method in class datafu.hourglass.fs.PathUtils
Gets the schema from a given Avro data file.
getSchemaFromPath(FileSystem, Path) - Static method in class datafu.hourglass.fs.PathUtils
Gets the schema for the first Avro file under the given path.
getSchemas() - Method in class datafu.hourglass.jobs.IncrementalJob
Gets the schemas.
getSchemas() - Method in class datafu.hourglass.mapreduce.CollapsingCombiner
Gets the schemas.
getSchemas() - Method in class datafu.hourglass.mapreduce.CollapsingMapper
Gets the Avro schemas.
getSchemas() - Method in class datafu.hourglass.mapreduce.PartitioningMapper
Gets the Avro schemas.
getSchemas() - Method in class datafu.hourglass.mapreduce.PartitioningReducer
Gets the Avro schemas.
getStartDate() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets the start date.
getStartDate() - Method in class datafu.hourglass.jobs.TimeBasedJob
Gets the start date.
getTempPath() - Method in class datafu.hourglass.jobs.AbstractJob
Gets the temporary path under which intermediate files will be stored.
getWriteCounters() - Method in class datafu.hourglass.jobs.StagedOutputJob
Get whether counters should be written.
getWriterSchema() - Method in class datafu.hourglass.avro.AvroKeyValueWithMetadataRecordWriter
Gets the writer schema for the key/value pair generic record.

H

hashCode() - Method in class datafu.hourglass.fs.DatePath
 

I

IncrementalJob - Class in datafu.hourglass.jobs
Base class for incremental jobs.
IncrementalJob() - Constructor for class datafu.hourglass.jobs.IncrementalJob
Initializes the job.
IncrementalJob(String, Properties) - Constructor for class datafu.hourglass.jobs.IncrementalJob
Initializes the job with a job name and properties.
initialize(InputSplit, TaskAttemptContext) - Method in class datafu.hourglass.avro.CombinedAvroKeyInputFormat.CombinedAvroKeyRecordReader
 
initialize() - Method in class datafu.hourglass.jobs.AbstractJob
Initialization required before running job.
initialize() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
 
initialize() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
 
initialize() - Method in class datafu.hourglass.jobs.IncrementalJob
 
INPUT_TIMES - Static variable in class datafu.hourglass.jobs.TimePartitioner
 
isFailOnMissing() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Gets whether the job should fail if data is missing within the desired date range.
isFailOnMissing() - Method in class datafu.hourglass.jobs.IncrementalJob
Gets whether the job should fail if input data within the desired range is missing.
isUseCombiner() - Method in class datafu.hourglass.jobs.AbstractJob
Gets whether the combiner should be used.

K

keepLatestDatedPaths(FileSystem, Path, int) - Static method in class datafu.hourglass.fs.PathUtils
Delete all but the last N days of paths matching the "yyyyMMdd" format.
keepLatestNestedDatedPaths(FileSystem, Path, int) - Static method in class datafu.hourglass.fs.PathUtils
Delete all but the last N days of paths matching the "yyyy/MM/dd" format.
KeyValueCollector<K,V> - Interface in datafu.hourglass.model
Provided to an instance of Mapper to collect key-value pairs.
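
As a rough illustration of the retention rule described for keepLatestDatedPaths above, the sketch below picks which "yyyyMMdd" directory names would be removed when only the latest N are kept. It works on plain strings rather than a Hadoop FileSystem. The class RetentionSketch and the method namesToDelete are hypothetical and are not part of PathUtils; the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in; the real PathUtils operates on a Hadoop FileSystem.
public class RetentionSketch {
    // Returns the dated names that fall outside the latest `retentionCount` entries.
    // Names that are not exactly eight digits ("yyyyMMdd") are ignored entirely.
    public static List<String> namesToDelete(List<String> names, int retentionCount) {
        List<String> dated = new ArrayList<>();
        for (String name : names) {
            if (name.matches("\\d{8}")) {
                dated.add(name);
            }
        }
        // Eight-digit dates sort chronologically when sorted as strings.
        Collections.sort(dated);
        // Clamp so that a negative or oversized retention count stays in range.
        int cutoff = Math.min(dated.size(), Math.max(0, dated.size() - retentionCount));
        return new ArrayList<>(dated.subList(0, cutoff));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("20130103", "_SUCCESS", "20130101", "20130102");
        // Keeping the latest two days leaves only the oldest day to delete.
        System.out.println(namesToDelete(names, 2)); // prints [20130101]
    }
}
```

Sorting the names as strings is enough here because fixed-width "yyyyMMdd" values order the same way lexically and chronologically.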

L

loadInputData() - Method in class datafu.hourglass.jobs.ExecutionPlanner
Determine what input data is available.

M

map(Object, Object, Mapper<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.AvroKeyValueIdentityMapper
 
map(Object, MapContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.CollapsingMapper
 
map(Object, Object, Mapper<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingMapper
 
map(Object, MapContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.ObjectMapper
 
map(Object, MapContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.PartitioningMapper
 
map(In, KeyValueCollector<OutKey, OutVal>) - Method in interface datafu.hourglass.model.Mapper
Maps an input record to one or more key-value pairs.
Mapper<In,OutKey,OutVal> - Interface in datafu.hourglass.model
Maps an input record to one or more key-value pairs.
MAPPER_IMPL_PATH - Static variable in class datafu.hourglass.mapreduce.Parameters
 
MaxInputDataExceededException - Class in datafu.hourglass.jobs
 
MaxInputDataExceededException() - Constructor for class datafu.hourglass.jobs.MaxInputDataExceededException
 
MaxInputDataExceededException(String) - Constructor for class datafu.hourglass.jobs.MaxInputDataExceededException
 
merge(T, T) - Method in interface datafu.hourglass.model.Merger
Merges two values together.
Merger<T> - Interface in datafu.hourglass.model
Merges two values together.
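
The Mapper, KeyValueCollector, and Merger entries above describe small callback interfaces. The sketch below shows how such shapes might fit together, using local stand-in interfaces declared inside a hypothetical ModelSketch class. These are not the datafu.hourglass.model types: the collect method name is an assumption, and the real interfaces may declare checked exceptions and are used with GenericRecord in the job classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that mirror the shapes listed in the index.
public class ModelSketch {
    public interface KeyValueCollector<K, V> {
        void collect(K key, V value); // method name is an assumption
    }

    public interface Mapper<In, OutKey, OutVal> {
        void map(In input, KeyValueCollector<OutKey, OutVal> collector);
    }

    public interface Merger<T> {
        T merge(T previous, T current);
    }

    // Maps one input line to a (word, 1) pair per whitespace-separated word.
    public static final Mapper<String, String, Long> WORD_MAPPER = (line, collector) -> {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                collector.collect(word, 1L);
            }
        }
    };

    // Merges two partial counts by adding them together.
    public static final Merger<Long> SUM_MERGER = (previous, current) -> previous + current;

    // Runs the mapper on a line and records each emitted pair as "key=value".
    public static List<String> mapLine(String line) {
        List<String> out = new ArrayList<>();
        WORD_MAPPER.map(line, (key, value) -> out.add(key + "=" + value));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("a b a"));          // prints [a=1, b=1, a=1]
        System.out.println(SUM_MERGER.merge(3L, 4L));  // prints 7
    }
}
```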
METADATA_DATE_END - Static variable in class datafu.hourglass.avro.AvroDateRangeMetadata
 
METADATA_DATE_START - Static variable in class datafu.hourglass.avro.AvroDateRangeMetadata
 

N

nestedDatedPathFormat - Static variable in class datafu.hourglass.fs.PathUtils
 
nonHiddenPathFilter - Static variable in class datafu.hourglass.fs.PathUtils
Filters out paths starting with "." and "_".

O

ObjectMapper - Class in datafu.hourglass.mapreduce
Defines the interface for a mapper implementation that DelegatingMapper delegates to.
ObjectMapper() - Constructor for class datafu.hourglass.mapreduce.ObjectMapper
 
ObjectProcessor - Class in datafu.hourglass.mapreduce
Base class for ObjectMapper and ObjectReducer.
ObjectProcessor() - Constructor for class datafu.hourglass.mapreduce.ObjectProcessor
 
ObjectReducer - Class in datafu.hourglass.mapreduce
Defines the interface for combiner and reducer implementations that DelegatingCombiner and DelegatingReducer delegate to.
ObjectReducer() - Constructor for class datafu.hourglass.mapreduce.ObjectReducer
 

P

Parameters - Class in datafu.hourglass.mapreduce
Parameters used by the jobs to pass configuration settings to the mappers, combiners, and reducers.
Parameters() - Constructor for class datafu.hourglass.mapreduce.Parameters
 
PartitionCollapsingExecutionPlanner - Class in datafu.hourglass.jobs
Execution planner used by AbstractPartitionCollapsingIncrementalJob and its derived classes.
PartitionCollapsingExecutionPlanner(FileSystem, Properties) - Constructor for class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Initializes the execution planner.
PartitionCollapsingIncrementalJob - Class in datafu.hourglass.jobs
A concrete version of AbstractPartitionCollapsingIncrementalJob.
PartitionCollapsingIncrementalJob(Class) - Constructor for class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Initializes the job.
PartitionCollapsingSchemas - Class in datafu.hourglass.schemas
Generates the Avro schemas used by AbstractPartitionCollapsingIncrementalJob and its derivations.
PartitionCollapsingSchemas(TaskSchemas, Map<String, Schema>, String, String) - Constructor for class datafu.hourglass.schemas.PartitionCollapsingSchemas
 
PartitioningCombiner - Class in datafu.hourglass.mapreduce
The combiner used by AbstractPartitionPreservingIncrementalJob and its derived classes.
PartitioningCombiner() - Constructor for class datafu.hourglass.mapreduce.PartitioningCombiner
 
PartitioningMapper - Class in datafu.hourglass.mapreduce
The mapper used by AbstractPartitionPreservingIncrementalJob and its derived classes.
PartitioningMapper() - Constructor for class datafu.hourglass.mapreduce.PartitioningMapper
 
PartitioningReducer - Class in datafu.hourglass.mapreduce
The reducer used by AbstractPartitionPreservingIncrementalJob and its derived classes.
PartitioningReducer() - Constructor for class datafu.hourglass.mapreduce.PartitioningReducer
 
PartitionPreservingExecutionPlanner - Class in datafu.hourglass.jobs
Execution planner used by AbstractPartitionPreservingIncrementalJob and its derived classes.
PartitionPreservingExecutionPlanner(FileSystem, Properties) - Constructor for class datafu.hourglass.jobs.PartitionPreservingExecutionPlanner
Initializes the execution planner.
PartitionPreservingIncrementalJob - Class in datafu.hourglass.jobs
A concrete version of AbstractPartitionPreservingIncrementalJob.
PartitionPreservingIncrementalJob(Class) - Constructor for class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
Initializes the job.
PartitionPreservingSchemas - Class in datafu.hourglass.schemas
Generates the Avro schemas used by AbstractPartitionPreservingIncrementalJob and its derivations.
PartitionPreservingSchemas(TaskSchemas, Map<String, Schema>, String, String) - Constructor for class datafu.hourglass.schemas.PartitionPreservingSchemas
 
PathUtils - Class in datafu.hourglass.fs
A collection of utility methods for dealing with files and paths.
PathUtils() - Constructor for class datafu.hourglass.fs.PathUtils
 

R

randomTempPath() - Method in class datafu.hourglass.jobs.AbstractJob
Generates a random temporary path within the file system.
readObject(Configuration, Path) - Static method in class datafu.hourglass.mapreduce.DistributedCacheHelper
Deserializes an object from a path in HDFS.
reduce(Object, Iterable<Object>, ReduceContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.CollapsingCombiner
 
reduce(Object, Iterable<Object>, ReduceContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.CollapsingReducer
 
reduce(Object, Iterable<Object>, Reducer<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingCombiner
 
reduce(Object, Iterable<Object>, Reducer<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingReducer
 
reduce(Object, Iterable<Object>, ReduceContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.ObjectReducer
 
reduce(Object, Iterable<Object>, ReduceContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.PartitioningCombiner
 
reduce(Object, Iterable<Object>, ReduceContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.PartitioningReducer
 
ReduceEstimator - Class in datafu.hourglass.jobs
Estimates the number of reducers needed based on input size.
ReduceEstimator(FileSystem, Properties) - Constructor for class datafu.hourglass.jobs.ReduceEstimator
 
REDUCER_IMPL_PATH - Static variable in class datafu.hourglass.mapreduce.Parameters
 
REDUCERS_PER_INPUT - Static variable in class datafu.hourglass.jobs.TimePartitioner
 
run() - Method in class datafu.hourglass.jobs.AbstractJob
Run the job.
run() - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Runs the job.
run() - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
 
run() - Method in class datafu.hourglass.jobs.AbstractPartitionPreservingIncrementalJob
Run the job.

S

setAccumulator(Accumulator<GenericRecord, GenericRecord>) - Method in class datafu.hourglass.mapreduce.CollapsingCombiner
Sets the accumulator used to perform aggregation.
setAccumulator(Accumulator<GenericRecord, GenericRecord>) - Method in class datafu.hourglass.mapreduce.CollapsingReducer
 
setAccumulator(Accumulator<GenericRecord, GenericRecord>) - Method in class datafu.hourglass.mapreduce.PartitioningCombiner
Sets the accumulator used to perform aggregation.
setAccumulator(Accumulator<GenericRecord, GenericRecord>) - Method in class datafu.hourglass.mapreduce.PartitioningReducer
Sets the accumulator used to perform aggregation.
setCombineInputs(boolean) - Method in class datafu.hourglass.jobs.AbstractNonIncrementalJob
Sets whether inputs should be combined.
setCombinerAccumulator(Accumulator<GenericRecord, GenericRecord>) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Set the accumulator for the combiner.
setCombinerAccumulator(Accumulator<GenericRecord, GenericRecord>) - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
Set the accumulator for the combiner.
setConf(Configuration) - Method in class datafu.hourglass.jobs.TimePartitioner
 
setContext(TaskInputOutputContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.CollapsingMapper
 
setContext(TaskInputOutputContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.ObjectProcessor
 
setContext(TaskInputOutputContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.PartitioningMapper
 
setContext(TaskInputOutputContext<Object, Object, Object, Object>) - Method in class datafu.hourglass.mapreduce.PartitioningReducer
 
setCountersParentPath(Path) - Method in class datafu.hourglass.jobs.AbstractJob
Sets the path where counters will be stored.
setCountersParentPath(Path) - Method in class datafu.hourglass.jobs.StagedOutputJob
Sets path to store the counters.
setDaysAgo(Integer) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Sets the number of days to subtract off the end date.
setDaysAgo(Integer) - Method in class datafu.hourglass.jobs.TimeBasedJob
Sets the number of days to subtract off the end of the consumption window.
setEndDate(Date) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Sets the end date.
setEndDate(Date) - Method in class datafu.hourglass.jobs.TimeBasedJob
Sets the end date.
setFailOnMissing(boolean) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Sets whether the job should fail if data is missing within the desired date range.
setFailOnMissing(boolean) - Method in class datafu.hourglass.jobs.IncrementalJob
Sets whether the job should fail if input data within the desired range is missing.
setInputKeySchemaForPath(Job, Schema, String) - Static method in class datafu.hourglass.avro.AvroMultipleInputsUtil
Sets the job input key schema for a path.
setInputPaths(List<Path>) - Method in class datafu.hourglass.jobs.AbstractJob
Sets the input paths.
setInputPaths(List<Path>) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Sets the input paths.
setIntermediateValueSchema(Schema) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Sets the Avro schema for the intermediate value.
setIntermediateValueSchema(Schema) - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
Sets the Avro schema for the intermediate value.
setIntermediateValueSchema(Schema) - Method in class datafu.hourglass.schemas.TaskSchemas.Builder
 
setKeySchema(Schema) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Sets the Avro schema for the key.
setKeySchema(Schema) - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
Sets the Avro schema for the key.
setKeySchema(Schema) - Method in class datafu.hourglass.schemas.TaskSchemas.Builder
 
setMapper(Mapper<GenericRecord, GenericRecord, GenericRecord>) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Set the mapper.
setMapper(Mapper<GenericRecord, GenericRecord, GenericRecord>) - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
Set the mapper.
setMapper(Mapper<GenericRecord, GenericRecord, GenericRecord>) - Method in class datafu.hourglass.mapreduce.CollapsingMapper
Sets the mapper.
setMapper(Mapper<GenericRecord, GenericRecord, GenericRecord>) - Method in class datafu.hourglass.mapreduce.PartitioningMapper
Sets the mapper.
setMaxIterations(Integer) - Method in class datafu.hourglass.jobs.IncrementalJob
Sets the maximum number of iterations for the job.
setMaxToProcess(Integer) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Sets the maximum number of days to process at a time.
setMaxToProcess(Integer) - Method in class datafu.hourglass.jobs.IncrementalJob
Sets the maximum number of days of input data to process in a single run.
setMerger(Merger<GenericRecord>) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Sets the record merger that is capable of merging previous output with a new partial output.
setName(String) - Method in class datafu.hourglass.jobs.AbstractJob
Sets the job name.
setNumDays(Integer) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Sets the number of days to process.
setNumDays(Integer) - Method in class datafu.hourglass.jobs.TimeBasedJob
Sets the number of consecutive days to process.
setNumReducers(Integer) - Method in class datafu.hourglass.jobs.AbstractJob
Sets the number of reducers to use.
setOldMerger(Merger<GenericRecord>) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Sets the record merger that is capable of unmerging old partial output from the new output.
setOldRecordMerger(Merger<GenericRecord>) - Method in class datafu.hourglass.mapreduce.CollapsingReducer
 
setOnSetup(Setup) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Set callback to provide custom configuration before job begins execution.
setOnSetup(Setup) - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
Set callback to provide custom configuration before job begins execution.
setOutputDateRange(DateRange) - Method in interface datafu.hourglass.jobs.DateRangeConfigurable
Sets the date range for the output.
setOutputDateRange(DateRange) - Method in class datafu.hourglass.mapreduce.CollapsingCombiner
 
setOutputDateRange(DateRange) - Method in class datafu.hourglass.mapreduce.CollapsingReducer
 
setOutputPath(Path) - Method in class datafu.hourglass.jobs.AbstractJob
Sets the output path.
setOutputPath(Path) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Sets the output path.
setOutputValueSchema(Schema) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Sets the Avro schema for the output data.
setOutputValueSchema(Schema) - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
Sets the Avro schema for the output data.
setOutputValueSchema(Schema) - Method in class datafu.hourglass.schemas.TaskSchemas.Builder
 
setProperties(Properties) - Method in class datafu.hourglass.jobs.AbstractJob
Sets the configuration properties.
setProperties(Properties) - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
 
setProperties(Properties) - Method in class datafu.hourglass.jobs.IncrementalJob
 
setProperties(Properties) - Method in class datafu.hourglass.jobs.TimeBasedJob
 
setRecordMerger(Merger<GenericRecord>) - Method in class datafu.hourglass.mapreduce.CollapsingReducer
 
setReducerAccumulator(Accumulator<GenericRecord, GenericRecord>) - Method in class datafu.hourglass.jobs.PartitionCollapsingIncrementalJob
Set the accumulator for the reducer.
setReducerAccumulator(Accumulator<GenericRecord, GenericRecord>) - Method in class datafu.hourglass.jobs.PartitionPreservingIncrementalJob
Set the accumulator for the reducer.
setRetentionCount(Integer) - Method in class datafu.hourglass.jobs.AbstractJob
Sets the number of days of data which will be retained in the output path.
setReuseOutput(boolean) - Method in class datafu.hourglass.mapreduce.CollapsingCombiner
Sets whether previous output is being reused.
setReuseOutput(boolean) - Method in class datafu.hourglass.mapreduce.CollapsingMapper
Sets whether previous output is being reused.
setReuseOutput(boolean) - Method in class datafu.hourglass.mapreduce.CollapsingReducer
Sets whether previous output is being reused.
setReusePreviousOutput(boolean) - Method in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
Set whether previous output should be reused.
setReusePreviousOutput(boolean) - Method in class datafu.hourglass.jobs.PartitionCollapsingExecutionPlanner
Sets whether previous output should be reused, if it exists.
setSchemas(PartitionCollapsingSchemas) - Method in class datafu.hourglass.mapreduce.CollapsingCombiner
Sets the schemas.
setSchemas(PartitionCollapsingSchemas) - Method in class datafu.hourglass.mapreduce.CollapsingMapper
Sets the Avro schemas.
setSchemas(PartitionCollapsingSchemas) - Method in class datafu.hourglass.mapreduce.CollapsingReducer
Sets the Avro schemas.
setSchemas(PartitionPreservingSchemas) - Method in class datafu.hourglass.mapreduce.PartitioningMapper
Sets the Avro schemas.
setSchemas(PartitionPreservingSchemas) - Method in class datafu.hourglass.mapreduce.PartitioningReducer
Sets the Avro schemas.
setStartDate(Date) - Method in class datafu.hourglass.jobs.ExecutionPlanner
Sets the start date.
setStartDate(Date) - Method in class datafu.hourglass.jobs.TimeBasedJob
Sets the start date.
setTempPath(Path) - Method in class datafu.hourglass.jobs.AbstractJob
Sets the temporary path where intermediate files will be stored.
Setup - Interface in datafu.hourglass.jobs
Used as a callback by PartitionCollapsingIncrementalJob and PartitionPreservingIncrementalJob to provide configuration settings for the Hadoop job.
setup(Configuration) - Method in interface datafu.hourglass.jobs.Setup
Set custom configuration.
setup(Reducer<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingCombiner
 
setup(Mapper<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingMapper
 
setup(Reducer<Object, Object, Object, Object>.Context) - Method in class datafu.hourglass.mapreduce.DelegatingReducer
 
setUseCombiner(boolean) - Method in class datafu.hourglass.jobs.AbstractJob
Sets whether the combiner should be used.
setWriteCounters(boolean) - Method in class datafu.hourglass.jobs.StagedOutputJob
Sets whether counters should be written.
StagedOutputJob - Class in datafu.hourglass.jobs
A derivation of Job that stages its output in another location and only moves it to the final destination if the job completes successfully.
StagedOutputJob(Configuration, String, Logger) - Constructor for class datafu.hourglass.jobs.StagedOutputJob
Initializes the job.
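
The StagedOutputJob entry above describes writing output to a staging location and moving it to the final destination only on success. The sketch below shows that general pattern on the local filesystem with java.nio.file. It is not the datafu implementation: StagedOutputSketch, OutputTask, runStaged, and demo are hypothetical names, and the real class works with Hadoop jobs and paths.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem stand-in; not the datafu StagedOutputJob.
public class StagedOutputSketch {
    // Hypothetical task that writes its results under the given directory.
    public interface OutputTask {
        void writeTo(Path dir) throws IOException;
    }

    // Runs the task against a staging directory next to finalDir and renames
    // it into place only if the task finishes without an exception.
    // Assumes finalDir does not exist yet.
    public static boolean runStaged(Path finalDir, OutputTask task) {
        try {
            Path parent = finalDir.toAbsolutePath().getParent();
            Path staging = Files.createTempDirectory(parent, "staging-");
            try {
                task.writeTo(staging);
            } catch (IOException | RuntimeException e) {
                deleteTree(staging); // discard partial output
                return false;
            }
            Files.move(staging, finalDir);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes a directory tree, children before parents.
    static void deleteTree(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> deepestFirst =
                paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        }
    }

    // Returns true if the final directory exists after the run.
    public static boolean demo(boolean failTask) {
        try {
            Path base = Files.createTempDirectory("staged-sketch");
            Path finalDir = base.resolve("output");
            runStaged(finalDir, dir -> {
                Files.write(dir.resolve("part-0"), "data".getBytes(StandardCharsets.UTF_8));
                if (failTask) {
                    throw new IOException("simulated failure");
                }
            });
            return Files.exists(finalDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // prints true
        System.out.println(demo(true));  // prints false
    }
}
```

Creating the staging directory beside the final one keeps the move a same-filesystem rename, so readers of the final path never see partial output.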

T

TaskSchemas - Class in datafu.hourglass.schemas
Contains the Avro schemas for the key, intermediate value, and output value of a job.
TaskSchemas.Builder - Class in datafu.hourglass.schemas
 
TaskSchemas.Builder() - Constructor for class datafu.hourglass.schemas.TaskSchemas.Builder
 
TEXT_PREFIX - Static variable in class datafu.hourglass.avro.AvroKeyValueWithMetadataRecordWriter
The configuration key prefix for text output metadata.
TEXT_PREFIX - Static variable in class datafu.hourglass.avro.AvroKeyWithMetadataRecordWriter
The configuration key prefix for text output metadata.
TimeBasedJob - Class in datafu.hourglass.jobs
Base class for Hadoop jobs that consume time-partitioned data.
TimeBasedJob() - Constructor for class datafu.hourglass.jobs.TimeBasedJob
Initializes the job.
TimeBasedJob(String, Properties) - Constructor for class datafu.hourglass.jobs.TimeBasedJob
Initializes the job with a job name and properties.
TimePartitioner - Class in datafu.hourglass.jobs
A partitioner used by AbstractPartitionPreservingIncrementalJob to limit the number of named outputs used by each reducer.
TimePartitioner() - Constructor for class datafu.hourglass.jobs.TimePartitioner
 
timeZone - Static variable in class datafu.hourglass.fs.PathUtils
 
toString() - Method in class datafu.hourglass.fs.DatePath
 

V

validate() - Method in class datafu.hourglass.jobs.AbstractJob
Validation required before running job.
validate() - Method in class datafu.hourglass.jobs.TimeBasedJob
 

W

waitForCompletion(boolean) - Method in class datafu.hourglass.jobs.StagedOutputJob
Run the job and wait for it to complete.
write(K, V) - Method in class datafu.hourglass.avro.AvroKeyValueWithMetadataRecordWriter
write(AvroKey<T>, NullWritable) - Method in class datafu.hourglass.avro.AvroKeyWithMetadataRecordWriter
writeObject(Configuration, Object, Path) - Static method in class datafu.hourglass.mapreduce.DistributedCacheHelper
Serializes an object to a path in HDFS and adds the file to the distributed cache.

_

_beginTime - Variable in class datafu.hourglass.mapreduce.CollapsingReducer
 
_endTime - Variable in class datafu.hourglass.mapreduce.CollapsingReducer
 
_reusePreviousOutput - Variable in class datafu.hourglass.jobs.AbstractPartitionCollapsingIncrementalJob
 


Matthew Hayes