datafu.hourglass.jobs
Class ReduceEstimator
java.lang.Object
datafu.hourglass.jobs.ReduceEstimator
public class ReduceEstimator
- extends java.lang.Object
Estimates the number of reducers needed based on input size.
This sums the size of the inputs and uses bytes-per-reducer
settings to compute the number of reducers. By default,
the bytes-per-reducer is 256 MB. This means that if the
total input size is 1 GB, the total number of reducers
computed will be 4.
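The arithmetic above can be sketched as follows. This is a minimal illustration, not the class's actual source: the class name ReducerCountSketch and the method estimateReducers are hypothetical, and rounding up with a floor of one reducer is an assumption that the description above does not confirm.

```java
// Hypothetical sketch of the sizing arithmetic; not part of datafu.hourglass.
final class ReducerCountSketch {
    // 256 MB, the default bytes-per-reducer stated in the description.
    static final long DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Divides total input size by bytes-per-reducer.
    // Assumption: the result is rounded up and is never less than 1.
    static int estimateReducers(long totalInputBytes, long bytesPerReducer) {
        if (totalInputBytes < 0 || bytesPerReducer <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and bytesPerReducer positive");
        }
        long reducers = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return Math.toIntExact(Math.max(1L, reducers));
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        System.out.println(estimateReducers(oneGb, DEFAULT_BYTES_PER_REDUCER)); // prints 4
    }
}
```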
The bytes-per-reducer can be configured through properties
provided in the constructor. The default bytes-per-reducer
can be overridden by setting num.reducers.bytes.per.reducer.
For example, if 536870912 (512 MB) is used for this setting,
then 2 reducers would be used for 1 GB.
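As a sketch of the override, the property can be placed in a java.util.Properties object, which would then be passed as the props argument of the ReduceEstimator constructor. The example below only reads the value back and redoes the arithmetic, so that it runs without Hadoop on the classpath; the class name BytesPerReducerOverrideSketch and the helper bytesPerReducer are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch; it does not call ReduceEstimator itself.
final class BytesPerReducerOverrideSketch {
    static final String KEY = "num.reducers.bytes.per.reducer";
    static final long DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Returns the configured bytes-per-reducer, or the 256 MB default when the key is unset.
    static long bytesPerReducer(Properties props) {
        String value = props.getProperty(KEY);
        return value == null ? DEFAULT_BYTES_PER_REDUCER : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "536870912"); // 512 MB

        long oneGb = 1024L * 1024 * 1024;
        System.out.println(oneGb / bytesPerReducer(props)); // prints 2
    }
}
```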
The bytes-per-reducer can also be configured separately for
different types of inputs. Inputs can be identified by a tag.
For example, if an input is tagged with mydata, then
the reducers for this input data can be configured with
num.reducers.mydata.bytes.per.reducer.
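The per-tag lookup might work along the following lines. This is a sketch under stated assumptions, not the class's actual logic: the fallback from a tag-specific key to the default key, and the summing of fractional per-tag contributions before rounding up, are guesses; the class TaggedReducerSketch, its methods, and the tag "other" are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of tag-specific bytes-per-reducer settings.
final class TaggedReducerSketch {
    static final String DEFAULT_KEY = "num.reducers.bytes.per.reducer";
    static final long DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Builds the property name for a tag, e.g. num.reducers.mydata.bytes.per.reducer.
    static String keyForTag(String tag) {
        return "num.reducers." + tag + ".bytes.per.reducer";
    }

    // Assumption: a tag without its own setting falls back to the default key,
    // and then to the 256 MB default.
    static long bytesPerReducerFor(String tag, Properties props) {
        String value = props.getProperty(keyForTag(tag), props.getProperty(DEFAULT_KEY));
        return value == null ? DEFAULT_BYTES_PER_REDUCER : Long.parseLong(value.trim());
    }

    // Assumption: per-tag contributions are summed, then rounded up, with a floor of 1.
    static int estimate(Map<String, Long> inputBytesByTag, Properties props) {
        double reducers = 0.0;
        for (Map.Entry<String, Long> entry : inputBytesByTag.entrySet()) {
            reducers += entry.getValue() / (double) bytesPerReducerFor(entry.getKey(), props);
        }
        return Math.max(1, (int) Math.ceil(reducers));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(keyForTag("mydata"), "134217728"); // 128 MB for inputs tagged mydata

        Map<String, Long> inputBytesByTag = new LinkedHashMap<>();
        inputBytesByTag.put("mydata", 1024L * 1024 * 1024); // 1 GB at 128 MB per reducer
        inputBytesByTag.put("other", 512L * 1024 * 1024);   // 512 MB at the 256 MB default

        System.out.println(estimate(inputBytesByTag, props)); // prints 10
    }
}
```

Here the mydata input contributes 8 reducers and the untagged setting contributes 2 for the other input.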
- Author:
- "Matthew Hayes"
Constructor Summary
ReduceEstimator(org.apache.hadoop.fs.FileSystem fs,
java.util.Properties props)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ReduceEstimator
public ReduceEstimator(org.apache.hadoop.fs.FileSystem fs,
java.util.Properties props)
addInputPath
public void addInputPath(org.apache.hadoop.fs.Path input)
addInputPath
public void addInputPath(java.lang.String tag,
org.apache.hadoop.fs.Path input)
getNumReducers
public int getNumReducers()
throws java.io.IOException
- Throws:
java.io.IOException