datafu.hourglass.jobs
Class ReduceEstimator

java.lang.Object
  extended by datafu.hourglass.jobs.ReduceEstimator

public class ReduceEstimator
extends java.lang.Object

Estimates the number of reducers needed based on input size.

The estimator sums the sizes of the inputs and divides the total by the bytes-per-reducer setting to compute the number of reducers. By default, the bytes-per-reducer is 256 MB, so a total input size of 1 GB yields 4 reducers.
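
The arithmetic described above can be sketched in plain Java. This is only an illustration, not the class's actual implementation: ReducerCountSketch and its estimate method are hypothetical names, and rounding up with a minimum of one reducer is an assumption, since this page does not say how fractional results are handled.

```java
// Hypothetical helper, not part of DataFu: it illustrates the arithmetic only.
final class ReducerCountSketch {
    // 256 MB, the default bytes-per-reducer stated in the class description.
    static final long DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Assumption: fractional results round up and at least one reducer is used.
    static int estimate(long totalInputBytes, long bytesPerReducer) {
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException("bytesPerReducer must be positive");
        }
        double reducers = totalInputBytes / (double) bytesPerReducer;
        return Math.max(1, (int) Math.ceil(reducers));
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        // 1 GB of input at 256 MB per reducer
        System.out.println(estimate(oneGb, DEFAULT_BYTES_PER_REDUCER));
    }
}
```

With 1 GB of input and the 256 MB default, the sketch returns 4, matching the example in the description.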

The bytes-per-reducer can be configured through the properties passed to the constructor. The default bytes-per-reducer can be overridden by setting num.reducers.bytes.per.reducer. For example, if this setting is 536870912 (512 MB), then 2 reducers would be used for 1 GB of input.
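
A minimal sketch of building such properties follows. ReducerPropsSketch and withBytesPerReducer are hypothetical names used for illustration. Only the property key and the constructor signature come from this page, and the constructor call is shown as a comment because it needs DataFu and Hadoop on the classpath.

```java
import java.util.Properties;

// Hypothetical helper, not part of DataFu.
final class ReducerPropsSketch {
    // Property key documented in the class description.
    static final String BYTES_PER_REDUCER_KEY = "num.reducers.bytes.per.reducer";

    // Builds properties that override the default bytes-per-reducer.
    static Properties withBytesPerReducer(long bytes) {
        Properties props = new Properties();
        props.setProperty(BYTES_PER_REDUCER_KEY, Long.toString(bytes));
        return props;
    }

    public static void main(String[] args) {
        Properties props = withBytesPerReducer(512L * 1024 * 1024); // 512 MB
        System.out.println(props.getProperty(BYTES_PER_REDUCER_KEY));
        // With DataFu and Hadoop available, the properties would be passed
        // to the documented constructor:
        //   ReduceEstimator estimator = new ReduceEstimator(fs, props);
    }
}
```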

The bytes-per-reducer can also be configured separately for different types of inputs, which are identified by a tag. For example, if an input is tagged with mydata, then the bytes-per-reducer for that input can be set with num.reducers.mydata.bytes.per.reducer.
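
The per-tag lookup could work roughly as sketched below. TaggedReducerSketch is a hypothetical class, and several details are assumptions not confirmed by this page: the fallback order (tag-specific key, then the global key, then the 256 MB default), summing fractional contributions across tags before rounding, and rounding up to at least one reducer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of DataFu.
final class TaggedReducerSketch {
    static final long DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Assumed lookup order: tag-specific key, then global key, then the default.
    static long bytesPerReducer(Properties props, String tag) {
        String value = props.getProperty("num.reducers." + tag + ".bytes.per.reducer");
        if (value == null) {
            value = props.getProperty("num.reducers.bytes.per.reducer");
        }
        return value == null ? DEFAULT_BYTES_PER_REDUCER : Long.parseLong(value);
    }

    // Assumption: each tag contributes a fractional count; the sum is rounded up.
    static int estimate(Properties props, Map<String, Long> inputBytesByTag) {
        double reducers = 0.0;
        for (Map.Entry<String, Long> entry : inputBytesByTag.entrySet()) {
            reducers += entry.getValue() / (double) bytesPerReducer(props, entry.getKey());
        }
        return Math.max(1, (int) Math.ceil(reducers));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // 128 MB per reducer for inputs tagged "mydata"
        props.setProperty("num.reducers.mydata.bytes.per.reducer", "134217728");

        long oneGb = 1024L * 1024 * 1024;
        Map<String, Long> inputs = new LinkedHashMap<>();
        inputs.put("mydata", oneGb); // uses the tag-specific setting
        inputs.put("other", oneGb);  // falls back to the 256 MB default
        System.out.println(estimate(props, inputs));
    }
}
```

Under these assumptions, 1 GB tagged mydata at 128 MB per reducer contributes 8 and 1 GB of untagged-setting input at the default contributes 4, for 12 reducers in total.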

Author:
Matthew Hayes

Constructor Summary
ReduceEstimator(org.apache.hadoop.fs.FileSystem fs, java.util.Properties props)
           
 
Method Summary
 void addInputPath(org.apache.hadoop.fs.Path input)
           
 void addInputPath(java.lang.String tag, org.apache.hadoop.fs.Path input)
           
 int getNumReducers()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ReduceEstimator

public ReduceEstimator(org.apache.hadoop.fs.FileSystem fs,
                       java.util.Properties props)
Method Detail

addInputPath

public void addInputPath(org.apache.hadoop.fs.Path input)

addInputPath

public void addInputPath(java.lang.String tag,
                         org.apache.hadoop.fs.Path input)

getNumReducers

public int getNumReducers()
                   throws java.io.IOException
Throws:
java.io.IOException

