StagedOutputJob (DataFu Hourglass)


datafu.hourglass.jobs
Class StagedOutputJob

java.lang.Object
  org.apache.hadoop.mapreduce.JobContext
      org.apache.hadoop.mapreduce.Job
          datafu.hourglass.jobs.StagedOutputJob

All Implemented Interfaces: java.util.concurrent.Callable<java.lang.Boolean>

public class StagedOutputJob
extends org.apache.hadoop.mapreduce.Job
implements java.util.concurrent.Callable<java.lang.Boolean>

A subclass of Job that stages its output in a temporary location and moves it to the final destination only if the job completes successfully. It also writes a counters file to the file system containing the counters fetched from Hadoop along with other task statistics.

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Job
`org.apache.hadoop.mapreduce.Job.JobState`

Field Summary

Fields inherited from class org.apache.hadoop.mapreduce.JobContext
`CACHE_ARCHIVES_VISIBILITIES, CACHE_FILE_VISIBILITIES, COMBINE_CLASS_ATTR, conf, credentials, INPUT_FORMAT_CLASS_ATTR, JOB_ACL_MODIFY_JOB, JOB_ACL_VIEW_JOB, JOB_CANCEL_DELEGATION_TOKEN, JOB_NAMENODES, MAP_CLASS_ATTR, OUTPUT_FORMAT_CLASS_ATTR, PARTITIONER_CLASS_ATTR, REDUCE_CLASS_ATTR, ugi, USER_LOG_RETAIN_HOURS`

Constructor Summary
`StagedOutputJob(org.apache.hadoop.conf.Configuration conf, java.lang.String stagingPrefix, org.apache.log4j.Logger log)` Initializes the job.

Method Summary
`java.lang.Boolean`	`call()` Run the job.
`static StagedOutputJob`	`createStagedJob(org.apache.hadoop.conf.Configuration conf, java.lang.String jobName, java.util.List<java.lang.String> inputPaths, java.lang.String stagingLocation, java.lang.String outputPath, org.apache.log4j.Logger log)` Creates a job that uses a temporary staging location for the output data.
`org.apache.hadoop.fs.Path`	`getCountersParentPath()` Gets path to store the counters.
`org.apache.hadoop.fs.Path`	`getCountersPath()` Gets the path to the written counters.
`boolean`	`getWriteCounters()` Get whether counters should be written.
`void`	`setCountersParentPath(org.apache.hadoop.fs.Path path)` Sets path to store the counters.
`void`	`setWriteCounters(boolean writeCounters)` Sets whether counters should be written.
`boolean`	`waitForCompletion(boolean verbose)` Run the job and wait for it to complete.

Methods inherited from class org.apache.hadoop.mapreduce.Job
`failTask, getCounters, getJar, getTaskCompletionEvents, getTrackingURL, isComplete, isSuccessful, killJob, killTask, mapProgress, reduceProgress, setCancelDelegationTokenUponJobCompletion, setCombinerClass, setGroupingComparatorClass, setInputFormatClass, setJarByClass, setJobName, setMapOutputKeyClass, setMapOutputValueClass, setMapperClass, setMapSpeculativeExecution, setNumReduceTasks, setOutputFormatClass, setOutputKeyClass, setOutputValueClass, setPartitionerClass, setReducerClass, setReduceSpeculativeExecution, setSortComparatorClass, setSpeculativeExecution, setupProgress, setWorkingDirectory, submit`

Methods inherited from class org.apache.hadoop.mapreduce.JobContext
`getCombinerClass, getConfiguration, getCredentials, getGroupingComparator, getInputFormatClass, getJobID, getJobName, getMapOutputKeyClass, getMapOutputValueClass, getMapperClass, getNumReduceTasks, getOutputFormatClass, getOutputKeyClass, getOutputValueClass, getPartitionerClass, getReducerClass, getSortComparator, getWorkingDirectory`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

StagedOutputJob

public StagedOutputJob(org.apache.hadoop.conf.Configuration conf,
                       java.lang.String stagingPrefix,
                       org.apache.log4j.Logger log)
                throws java.io.IOException

Initializes the job.

Parameters: conf - configuration; stagingPrefix - where to stage output temporarily; log - logger
Throws: java.io.IOException

Method Detail

createStagedJob

public static StagedOutputJob createStagedJob(org.apache.hadoop.conf.Configuration conf,
                                              java.lang.String jobName,
                                              java.util.List<java.lang.String> inputPaths,
                                              java.lang.String stagingLocation,
                                              java.lang.String outputPath,
                                              org.apache.log4j.Logger log)

Creates a job that uses a temporary staging location for the output data. The data is copied to the final output directory only on successful completion of the job. This prevents existing output data from being overwritten unless the job completes successfully.

Parameters: conf - configuration; jobName - job name; inputPaths - input paths; stagingLocation - where to stage output temporarily; outputPath - output path; log - logger
Returns: job
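
The stage-then-move behavior described above can be illustrated with a small, self-contained sketch. It is not the DataFu implementation: it uses `java.nio.file` on a local file system rather than Hadoop paths, and the names `StagedWriteSketch`, `Work`, `runStaged`, and `demo` are hypothetical, introduced only for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of the stage-then-move pattern on a local file system.
// This is NOT how StagedOutputJob is implemented; it only mirrors the documented idea.
class StagedWriteSketch {

    /** A unit of work that writes its result to the given file and reports success. */
    interface Work {
        boolean writeTo(Path target) throws IOException;
    }

    /**
     * Runs the work against a file in the staging directory and moves the result
     * to finalPath only when the work reports success. On failure the staged file
     * is discarded and finalPath is left untouched.
     */
    static boolean runStaged(Path stagingDir, Path finalPath, Work work) {
        try {
            Path staged = Files.createTempFile(stagingDir, "staged-", ".tmp");
            boolean ok;
            try {
                ok = work.writeTo(staged);
            } catch (IOException e) {
                ok = false; // treat a write error as a failed job
            }
            if (ok) {
                Files.move(staged, finalPath, StandardCopyOption.REPLACE_EXISTING);
            } else {
                Files.deleteIfExists(staged);
            }
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the content of the output file after a failed run and after a successful run. */
    static String[] demo() {
        try {
            Path dir = Files.createTempDirectory("staging-sketch");
            Path staging = Files.createDirectory(dir.resolve("staging"));
            Path output = dir.resolve("output.txt");
            Files.write(output, "old".getBytes(StandardCharsets.UTF_8));

            // A failing job writes partial data to staging only.
            runStaged(staging, output, target -> {
                Files.write(target, "partial".getBytes(StandardCharsets.UTF_8));
                return false;
            });
            String afterFailure = new String(Files.readAllBytes(output), StandardCharsets.UTF_8);

            // A successful job replaces the existing output.
            runStaged(staging, output, target -> {
                Files.write(target, "new".getBytes(StandardCharsets.UTF_8));
                return true;
            });
            String afterSuccess = new String(Files.readAllBytes(output), StandardCharsets.UTF_8);

            return new String[] { afterFailure, afterSuccess };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("after failed run:     " + result[0]); // old
        System.out.println("after successful run: " + result[1]); // new
    }
}
```

The point of the sketch is that the final location is written in a single step at the end, so a job that fails midway never leaves partial data where downstream consumers expect complete output.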

getCountersParentPath

public org.apache.hadoop.fs.Path getCountersParentPath()

Gets path to store the counters. If this is not set then by default the counters will be stored in the output directory.

Returns: parent path for counters

setCountersParentPath

public void setCountersParentPath(org.apache.hadoop.fs.Path path)

Sets path to store the counters. If this is not set then by default the counters will be stored in the output directory.

Parameters: path - parent path for counters
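
The default described above (counters go to the output directory unless a parent path is set) amounts to a simple fallback. The following is a minimal sketch under stated assumptions: it uses `java.nio.file.Path` as a stand-in for `org.apache.hadoop.fs.Path`, and `CountersPathSketch` and `effectiveCountersParent` are hypothetical names, not part of the DataFu API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the documented default; not the DataFu implementation.
class CountersPathSketch {
    private final Path outputPath;
    private Path countersParentPath; // null means "not set"

    CountersPathSketch(Path outputPath) {
        this.outputPath = outputPath;
    }

    void setCountersParentPath(Path path) {
        this.countersParentPath = path;
    }

    /** Returns the configured parent path, or the output directory when none was set. */
    Path effectiveCountersParent() {
        return countersParentPath != null ? countersParentPath : outputPath;
    }

    public static void main(String[] args) {
        CountersPathSketch sketch = new CountersPathSketch(Paths.get("data", "output"));
        System.out.println(sketch.effectiveCountersParent()); // falls back to the output directory
        sketch.setCountersParentPath(Paths.get("data", "counters"));
        System.out.println(sketch.effectiveCountersParent()); // uses the explicit parent path
    }
}
```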

getCountersPath

public org.apache.hadoop.fs.Path getCountersPath()

Gets the path to the written counters.

Returns: counters path

getWriteCounters

public boolean getWriteCounters()

Get whether counters should be written.

Returns: true if counters should be written

setWriteCounters

public void setWriteCounters(boolean writeCounters)

Sets whether counters should be written.

Parameters: writeCounters - true if counters should be written

call

public java.lang.Boolean call()
                       throws java.lang.Exception

Run the job.

Specified by: call in interface java.util.concurrent.Callable<java.lang.Boolean>

Throws: java.lang.Exception
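
Because the class implements `Callable<Boolean>`, instances can in principle be handed to a `java.util.concurrent.ExecutorService`. The sketch below shows that general pattern with stand-in callables instead of real jobs, so it runs without Hadoop. `CallableJobsSketch`, `runAll`, and `demo` are hypothetical names, and interpreting the returned `Boolean` as a success flag is an assumption that the documentation above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: runs several Callable<Boolean> "jobs" on a thread pool.
class CallableJobsSketch {

    /** Submits every job to a fixed-size pool; returns true only if all of them return true. */
    static boolean runAll(List<Callable<Boolean>> jobs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (Callable<Boolean> job : jobs) {
                futures.add(pool.submit(job));
            }
            boolean allOk = true;
            for (Future<Boolean> future : futures) {
                try {
                    allOk &= Boolean.TRUE.equals(future.get());
                } catch (ExecutionException e) {
                    allOk = false; // the job threw an exception
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return allOk;
        } finally {
            pool.shutdown();
        }
    }

    /** Builds stand-in jobs that simply return the given outcomes, then runs them. */
    static boolean demo(boolean... outcomes) {
        List<Callable<Boolean>> jobs = new ArrayList<>();
        for (boolean outcome : outcomes) {
            jobs.add(() -> outcome); // a real job would go here instead
        }
        return runAll(jobs, 2);
    }

    public static void main(String[] args) {
        System.out.println(demo(true, true));  // true
        System.out.println(demo(true, false)); // false
    }
}
```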

waitForCompletion

public boolean waitForCompletion(boolean verbose)
                          throws java.io.IOException,
                                 java.lang.InterruptedException,
                                 java.lang.ClassNotFoundException

Run the job and wait for it to complete. Output will be temporarily stored under the staging path. If the job is successful it will be moved to the final location.

Overrides: waitForCompletion in class org.apache.hadoop.mapreduce.Job

Throws: java.io.IOException; java.lang.InterruptedException; java.lang.ClassNotFoundException