public class ReduceEstimator
extends java.lang.Object
Estimates the number of reducers by summing the sizes of the inputs and dividing the total by a bytes-per-reducer setting. By default, the bytes-per-reducer is 256 MB, so a total input size of 1 GB yields 4 reducers.
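The arithmetic behind the 1 GB example can be sketched in plain Java. This is an illustration only, not the class's actual implementation: the class and method names are hypothetical, and rounding up with a minimum of one reducer is an assumption that the description above does not confirm.

```java
// Hypothetical sketch of the size-based estimate; not the real ReduceEstimator code.
class ReducerCountSketch {
    // Default from the class description: 256 MB per reducer.
    static final long DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Assumption: partial reducers round up, and at least one reducer is used.
    static int estimateReducers(long totalInputBytes, long bytesPerReducer) {
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException("bytesPerReducer must be positive");
        }
        // Integer ceiling division without going through floating point.
        long reducers = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, reducers);
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        // 1 GB of input at 256 MB per reducer.
        System.out.println(estimateReducers(oneGb, DEFAULT_BYTES_PER_REDUCER)); // prints 4
    }
}
```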
The bytes-per-reducer can be configured through properties provided in the constructor. The default bytes-per-reducer can be overridden by setting num.reducers.bytes.per.reducer. For example, if 536870912 (512 MB) is used for this setting, then 2 reducers would be used for 1 GB.
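A minimal sketch of how such an override could be read from a java.util.Properties object follows. Only the property name num.reducers.bytes.per.reducer and the 256 MB default come from the description above; the class name, the helper methods, and the round-up behavior are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical sketch of reading the override; not the real ReduceEstimator code.
class BytesPerReducerOverrideSketch {
    static final String DEFAULT_KEY = "num.reducers.bytes.per.reducer";
    static final long DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Returns the configured value, or 256 MB when the property is absent.
    static long bytesPerReducer(Properties props) {
        String value = props.getProperty(DEFAULT_KEY);
        return value == null ? DEFAULT_BYTES_PER_REDUCER : Long.parseLong(value.trim());
    }

    // Assumption: the estimate rounds up and never drops below one reducer.
    static int estimateReducers(long totalInputBytes, Properties props) {
        long perReducer = bytesPerReducer(props);
        long reducers = (totalInputBytes + perReducer - 1) / perReducer;
        return (int) Math.max(1L, reducers);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DEFAULT_KEY, "536870912"); // 512 MB
        long oneGb = 1024L * 1024 * 1024;
        System.out.println(estimateReducers(oneGb, props)); // prints 2
    }
}
```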
The bytes-per-reducer can also be configured separately for different types of inputs. Inputs can be identified by a tag. For example, if an input is tagged with mydata, then the reducers for this input data can be configured with num.reducers.mydata.bytes.per.reducer.
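The per-tag lookup might work roughly as sketched below. The property name pattern num.reducers.&lt;tag&gt;.bytes.per.reducer is taken from the mydata example above; everything else is an assumption: the class and method names are hypothetical, the empty string is used here as a stand-in for untagged input, and the choices to fall back to the default setting, to sum fractional per-tag reducer counts, and to round up at the end are guesses rather than documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of tag-specific settings; not the real ReduceEstimator code.
class TaggedReducerSketch {
    static final long DEFAULT_BYTES_PER_REDUCER = 256L * 1024 * 1024;

    // Looks up the tag-specific setting first, then the default setting, then 256 MB.
    // An empty tag stands for untagged input (an assumption of this sketch).
    static long bytesPerReducerFor(String tag, Properties props) {
        String value = null;
        if (!tag.isEmpty()) {
            value = props.getProperty("num.reducers." + tag + ".bytes.per.reducer");
        }
        if (value == null) {
            value = props.getProperty("num.reducers.bytes.per.reducer");
        }
        return value == null ? DEFAULT_BYTES_PER_REDUCER : Long.parseLong(value.trim());
    }

    // Assumption: fractional reducer counts are summed across tags, then rounded up.
    static int estimateReducers(Map<String, Long> inputBytesByTag, Properties props) {
        double reducers = 0.0;
        for (Map.Entry<String, Long> entry : inputBytesByTag.entrySet()) {
            reducers += entry.getValue() / (double) bytesPerReducerFor(entry.getKey(), props);
        }
        return Math.max(1, (int) Math.ceil(reducers));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("num.reducers.mydata.bytes.per.reducer", "536870912"); // 512 MB

        Map<String, Long> inputs = new LinkedHashMap<>();
        inputs.put("mydata", 1024L * 1024 * 1024); // 1 GB at 512 MB per reducer
        inputs.put("", 512L * 1024 * 1024);        // 512 MB untagged at the 256 MB default

        System.out.println(estimateReducers(inputs, props)); // prints 4
    }
}
```

With the real class, the equivalent calls would be addInputPath(tag, input) for tagged data and addInputPath(input) for everything else, followed by getNumReducers().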
Constructor and Description |
---|
ReduceEstimator(org.apache.hadoop.fs.FileSystem fs, java.util.Properties props) |
Modifier and Type | Method and Description |
---|---|
void | addInputPath(org.apache.hadoop.fs.Path input) |
void | addInputPath(java.lang.String tag, org.apache.hadoop.fs.Path input) |
int | getNumReducers() |