AbstractJob (DataFu Hourglass)

datafu.hourglass.jobs
Class AbstractJob

java.lang.Object
  org.apache.hadoop.conf.Configured
      datafu.hourglass.jobs.AbstractJob

All Implemented Interfaces:: org.apache.hadoop.conf.Configurable

Direct Known Subclasses:: TimeBasedJob

public abstract class AbstractJob
extends org.apache.hadoop.conf.Configured

Base class for Hadoop jobs.

This class defines a set of common methods and configuration shared by Hadoop jobs. Jobs can be configured either by providing properties or by calling setters. Each property has a corresponding setter.

This class recognizes the following properties:

input.path - Input path job will read from
output.path - Output path job will write to
temp.path - Temporary path under which intermediate files are stored
retention.count - Number of days to retain in output directory
num.reducers - Number of reducers to use
use.combiner - Whether to use a combiner or not
counters.path - Path to store job counters in

The input.path property may be a comma-separated list of paths. More than one path implies that a join is to be performed. Alternatively, the paths may be listed in separate properties. For example, input.path.first and input.path.second define two separate input paths.
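The two ways of listing input paths can be illustrated with a small sketch. This is not the Hourglass implementation: the class InputPathsSketch, the method inputPaths, and the choice to sort the input.path.* keys are assumptions made for the example, and paths are kept as plain strings so the code runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper, not part of DataFu Hourglass.
final class InputPathsSketch {

    /** Collects paths from "input.path" (comma-separated) and from any "input.path.*" properties. */
    static List<String> inputPaths(Properties props) {
        List<String> paths = new ArrayList<>();

        String combined = props.getProperty("input.path");
        if (combined != null) {
            for (String part : combined.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    paths.add(trimmed);
                }
            }
        }

        // Sorting the keys is an assumption made here so the result is deterministic.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (key.startsWith("input.path.")) {
                paths.add(props.getProperty(key).trim());
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("input.path.first", "/data/a");
        props.setProperty("input.path.second", "/data/b");
        System.out.println(inputPaths(props));
    }
}
```

Either form yields a list with one entry per input, which corresponds to the list that getInputPaths() and setInputPaths() work with.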

The num.reducers property fixes the number of reducers. When it is not set, the number of reducers is computed based on the input size.
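The fixed-versus-computed behavior can be sketched as follows. The documentation does not state how the count is derived from the input size, so the ceiling division and the bytesPerReducer parameter below are purely illustrative assumptions, as are the class and method names.

```java
// Hypothetical helper, not part of DataFu Hourglass.
final class ReducerCountSketch {

    /**
     * Returns the configured count when one is given; otherwise estimates a count
     * from the input size. bytesPerReducer is an assumed tuning knob, not an
     * Hourglass setting.
     */
    static int numReducers(Integer configured, long inputBytes, long bytesPerReducer) {
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException("bytesPerReducer must be positive");
        }
        if (configured != null) {
            return configured; // num.reducers was set, so it wins
        }
        long estimate = (inputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1L, Math.min(estimate, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        System.out.println(numReducers(null, 1000L, 256L));
        System.out.println(numReducers(5, 1000L, 256L));
    }
}
```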

The temp.path property defines the parent directory for temporary paths, not the temporary path itself. Temporary paths are created under this directory with an hourglass- prefix followed by a GUID.
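As an illustration of the naming scheme described above, the sketch below builds a path of the form parent/hourglass-GUID. It is an assumption-laden stand-in rather than the library's code: it uses java.util.UUID for the GUID, plain strings instead of org.apache.hadoop.fs.Path, and the hypothetical class name TempPathSketch. Like randomTempPath(), it only generates a name and creates nothing.

```java
import java.util.UUID;

// Hypothetical helper, not part of DataFu Hourglass.
final class TempPathSketch {

    /** Builds "<parent>/hourglass-<uuid>" without touching any file system. */
    static String randomTempPath(String tempParent) {
        String parent = tempParent.endsWith("/")
                ? tempParent.substring(0, tempParent.length() - 1)
                : tempParent;
        return parent + "/hourglass-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // "/tmp" is the documented default for temp.path.
        System.out.println(randomTempPath("/tmp"));
    }
}
```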

The input and output paths are the only required parameters. The rest are optional.
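A minimal sketch of checking the two required settings in a java.util.Properties object is shown below. The class RequiredPropertiesSketch and the method missingRequired are hypothetical; the actual checks performed by validate() are not described here and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of DataFu Hourglass.
final class RequiredPropertiesSketch {

    /** Returns the names of required settings that are absent from props. */
    static List<String> missingRequired(Properties props) {
        List<String> missing = new ArrayList<>();

        // Input may be given as "input.path" or as separate "input.path.*" properties.
        boolean hasInput = false;
        for (String key : props.stringPropertyNames()) {
            if (key.equals("input.path") || key.startsWith("input.path.")) {
                hasInput = true;
            }
        }
        if (!hasInput) {
            missing.add("input.path");
        }
        if (props.getProperty("output.path") == null) {
            missing.add("output.path");
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("input.path", "/data/events");
        props.setProperty("output.path", "/output/events");
        props.setProperty("retention.count", "3"); // optional
        System.out.println(missingRequired(props));
    }
}
```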

Hadoop configuration may be provided by setting a property whose name starts with the prefix "hadoop-conf." (including the trailing dot). For example, mapred.min.split.size can be configured by setting the property hadoop-conf.mapred.min.split.size to the desired value.
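The prefix convention can be sketched as a pass over the properties that strips the prefix from matching keys. The class HadoopConfPrefixSketch and the method hadoopSettings are hypothetical, and a plain map stands in for org.apache.hadoop.conf.Configuration so the example runs without Hadoop; how Hourglass applies the values internally is not shown here.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of DataFu Hourglass.
final class HadoopConfPrefixSketch {

    static final String PREFIX = "hadoop-conf.";

    /** Returns the properties that carry the prefix, keyed by the name that follows it. */
    static Map<String, String> hadoopSettings(Properties props) {
        Map<String, String> settings = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                settings.put(key.substring(PREFIX.length()), props.getProperty(key));
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("output.path", "/output/events");
        props.setProperty("hadoop-conf.mapred.min.split.size", "134217728");
        System.out.println(hadoopSettings(props));
    }
}
```

In a real job, each resulting entry would presumably be applied to the Hadoop configuration, for example with Configuration.set(String, String).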

Author:: "Matthew Hayes"

Constructor Summary
`AbstractJob()` Initializes the job.
`AbstractJob(java.lang.String name, java.util.Properties props)` Initializes the job with a job name and properties.

Method Summary
`void`	`config(org.apache.hadoop.conf.Configuration conf)` Overridden to provide custom configuration before the job starts.
`protected org.apache.hadoop.fs.Path`	`createRandomTempPath()` Creates a random temporary path within the file system.
`protected org.apache.hadoop.fs.Path`	`ensurePath(org.apache.hadoop.fs.Path path)` Creates a path, if it does not already exist.
`org.apache.hadoop.fs.Path`	`getCountersParentPath()` Gets the path where counters will be stored.
`protected org.apache.hadoop.fs.FileSystem`	`getFileSystem()` Gets the file system.
`java.util.List<org.apache.hadoop.fs.Path>`	`getInputPaths()` Gets the input paths.
`java.lang.String`	`getName()` Gets the job name.
`java.lang.Integer`	`getNumReducers()` Gets the number of reducers to use.
`org.apache.hadoop.fs.Path`	`getOutputPath()` Gets the output path.
`java.util.Properties`	`getProperties()` Gets the configuration properties.
`java.lang.Integer`	`getRetentionCount()` Gets the number of days of data which will be retained in the output path.
`org.apache.hadoop.fs.Path`	`getTempPath()` Gets the temporary path under which intermediate files will be stored.
`protected void`	`initialize()` Initialization required before running job.
`boolean`	`isUseCombiner()` Gets whether the combiner should be used.
`protected org.apache.hadoop.fs.Path`	`randomTempPath()` Generates a random temporary path within the file system.
`abstract void`	`run()` Run the job.
`void`	`setCountersParentPath(org.apache.hadoop.fs.Path countersParentPath)` Sets the path where counters will be stored.
`void`	`setInputPaths(java.util.List<org.apache.hadoop.fs.Path> inputPaths)` Sets the input paths.
`void`	`setName(java.lang.String name)` Sets the job name.
`void`	`setNumReducers(java.lang.Integer numReducers)` Sets the number of reducers to use.
`void`	`setOutputPath(org.apache.hadoop.fs.Path outputPath)` Sets the output path.
`void`	`setProperties(java.util.Properties props)` Sets the configuration properties.
`void`	`setRetentionCount(java.lang.Integer retentionCount)` Sets the number of days of data which will be retained in the output path.
`void`	`setTempPath(org.apache.hadoop.fs.Path tempPath)` Sets the temporary path under which intermediate files will be stored.
`void`	`setUseCombiner(boolean useCombiner)` Sets whether the combiner should be used.
`protected void`	`validate()` Validation required before running job.

Methods inherited from class org.apache.hadoop.conf.Configured
`getConf, setConf`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

AbstractJob

public AbstractJob()

Initializes the job.

AbstractJob

public AbstractJob(java.lang.String name,
                   java.util.Properties props)

Initializes the job with a job name and properties.

Parameters:: name - Job name; props - Configuration properties

Method Detail

getName

public java.lang.String getName()

Gets the job name.

Returns:: Job name

setName

public void setName(java.lang.String name)

Sets the job name.

Parameters:: name - Job name

getProperties

public java.util.Properties getProperties()

Gets the configuration properties.

Returns:: Configuration properties

setProperties

public void setProperties(java.util.Properties props)

Sets the configuration properties.

Parameters:: props - Properties

config

public void config(org.apache.hadoop.conf.Configuration conf)

Overridden to provide custom configuration before the job starts.

Parameters:: conf - Hadoop configuration to customize

getNumReducers

public java.lang.Integer getNumReducers()

Gets the number of reducers to use.

Returns:: Number of reducers

setNumReducers

public void setNumReducers(java.lang.Integer numReducers)

Sets the number of reducers to use. Can also be set with the num.reducers property.

Parameters:: numReducers - Number of reducers to use

isUseCombiner

public boolean isUseCombiner()

Gets whether the combiner should be used.

Returns:: True if combiner should be used, otherwise false.

setUseCombiner

public void setUseCombiner(boolean useCombiner)

Sets whether the combiner should be used. Can also be set with use.combiner.

Parameters:: useCombiner - True if a combiner should be used, otherwise false.

getCountersParentPath

public org.apache.hadoop.fs.Path getCountersParentPath()

Gets the path where counters will be stored.

Returns:: Counters path

setCountersParentPath

public void setCountersParentPath(org.apache.hadoop.fs.Path countersParentPath)

Sets the path where counters will be stored. Can also be set with counters.path.

Parameters:: countersParentPath - Counters path

getRetentionCount

public java.lang.Integer getRetentionCount()

Gets the number of days of data which will be retained in the output path. Only the most recent days, up to this count, are kept. Older paths are removed.

Returns:: retention count

setRetentionCount

public void setRetentionCount(java.lang.Integer retentionCount)

Sets the number of days of data which will be retained in the output path. Only the most recent days, up to this count, are kept. Older paths are removed. Can also be set with retention.count.

Parameters:: retentionCount - Number of days to retain in the output path
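The retention rule can be sketched as selecting everything except the newest retentionCount entries. This is only an illustration under stated assumptions: the class RetentionSketch and the method pathsToRemove are hypothetical, and the output is assumed to be organized in zero-padded, date-named directories (such as 2013/03/01) that sort chronologically as strings. The actual output layout and cleanup logic are not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of DataFu Hourglass.
final class RetentionSketch {

    /** Returns the entries that fall outside the newest retentionCount days. */
    static List<String> pathsToRemove(List<String> datedPaths, int retentionCount) {
        if (retentionCount < 0) {
            throw new IllegalArgumentException("retentionCount must not be negative");
        }
        List<String> sorted = new ArrayList<>(datedPaths);
        Collections.sort(sorted); // assumed: zero-padded dates sort oldest first
        int removeCount = Math.max(0, sorted.size() - retentionCount);
        return new ArrayList<>(sorted.subList(0, removeCount));
    }

    public static void main(String[] args) {
        List<String> days = Arrays.asList("2013/03/03", "2013/03/01", "2013/03/02");
        System.out.println(pathsToRemove(days, 2));
    }
}
```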

getInputPaths

public java.util.List<org.apache.hadoop.fs.Path> getInputPaths()

Gets the input paths. Multiple input paths imply a join is to be performed.

Returns:: input paths

setInputPaths

public void setInputPaths(java.util.List<org.apache.hadoop.fs.Path> inputPaths)

Sets the input paths. Multiple input paths imply a join is to be performed. Can also be set with the input.path property, or with several properties whose names start with "input.path." (for example, input.path.first and input.path.second).

Parameters:: inputPaths - input paths

getOutputPath

public org.apache.hadoop.fs.Path getOutputPath()

Gets the output path.

Returns:: output path

setOutputPath

public void setOutputPath(org.apache.hadoop.fs.Path outputPath)

Sets the output path. Can also be set with output.path.

Parameters:: outputPath - output path

getTempPath

public org.apache.hadoop.fs.Path getTempPath()

Gets the temporary path under which intermediate files will be stored. Defaults to /tmp.

Returns:: Temporary path

setTempPath

public void setTempPath(org.apache.hadoop.fs.Path tempPath)

Sets the temporary path under which intermediate files will be stored. Defaults to /tmp.

Parameters:: tempPath - Temporary path

getFileSystem

protected org.apache.hadoop.fs.FileSystem getFileSystem()

Gets the file system.

Returns:: File system
Throws:: java.io.IOException

randomTempPath

protected org.apache.hadoop.fs.Path randomTempPath()

Generates a random temporary path within the file system. This does not create the path.

Returns:: Random temporary path

createRandomTempPath

protected org.apache.hadoop.fs.Path createRandomTempPath()
                                                  throws java.io.IOException

Creates a random temporary path within the file system.

Returns:: Random temporary path
Throws:: java.io.IOException

ensurePath

protected org.apache.hadoop.fs.Path ensurePath(org.apache.hadoop.fs.Path path)
                                        throws java.io.IOException

Creates a path, if it does not already exist.

Parameters:: path - Path to create
Returns:: The same path that was provided
Throws:: java.io.IOException
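The create-if-absent contract, including returning the same path that was passed in, can be sketched with the standard library. This example substitutes java.nio.file for the Hadoop FileSystem API so that it runs without Hadoop, and the class name EnsurePathSketch is hypothetical. It shows the behavior described above, not the method's actual body.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in using java.nio.file instead of org.apache.hadoop.fs.
final class EnsurePathSketch {

    /** Creates the directory if it is absent and returns the same path for chaining. */
    static Path ensurePath(Path path) throws IOException {
        if (!Files.exists(path)) {
            Files.createDirectories(path);
        }
        return path;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hourglass-sketch");
        Path created = ensurePath(base.resolve("intermediate"));
        System.out.println(Files.isDirectory(created));
    }
}
```

Returning the argument lets a caller create a path and keep using it in a single expression.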

validate

protected void validate()

Validation required before running job.

initialize

protected void initialize()

Initialization required before running job.

run

public abstract void run()
                  throws java.io.IOException,
                         java.lang.InterruptedException,
                         java.lang.ClassNotFoundException

Run the job.

Throws:: java.io.IOException; java.lang.InterruptedException; java.lang.ClassNotFoundException
