datafu.hourglass.avro
Class AvroMultipleInputsKeyInputFormat<T>
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable>
datafu.hourglass.avro.AvroMultipleInputsKeyInputFormat<T>
public class AvroMultipleInputsKeyInputFormat<T>
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable>
A MapReduce InputFormat that can handle Avro container files and multiple inputs.
The input schema is determined based on the split. The mapping from input path
to schema is stored in the job configuration.
Keys are AvroKey wrapper objects that contain the Avro data. Since Avro
container files store only records (not key/value pairs), the value from
this InputFormat is a NullWritable.
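The path-to-schema lookup described above can be illustrated with a minimal, standard-library-only sketch. It is not DataFu's implementation: java.util.Properties stands in for the Hadoop job configuration, and the class name PathSchemaLookup, the key prefix, the parent-directory matching rule, and the sample paths and schemas are all hypothetical, chosen only to show the idea of resolving a split's file path to the schema registered for its input path.

```java
import java.util.Properties;

// Illustrative only: a stand-in for storing a path-to-schema mapping in a job
// configuration and resolving the schema for a given split. All names here are
// hypothetical and are not part of DataFu, Hadoop, or Avro.
class PathSchemaLookup {
    // Hypothetical configuration key prefix.
    static final String KEY_PREFIX = "example.input.schema.";

    // Record the schema (as JSON text) to use for files under inputPath.
    static void register(Properties conf, String inputPath, String schemaJson) {
        conf.setProperty(KEY_PREFIX + normalize(inputPath), schemaJson);
    }

    // Walk up from the split's file path, one directory at a time, until a
    // registered input path is found. Returns null if nothing matches.
    static String schemaFor(Properties conf, String splitPath) {
        String path = normalize(splitPath);
        while (!path.isEmpty()) {
            String schema = conf.getProperty(KEY_PREFIX + path);
            if (schema != null) {
                return schema;
            }
            int slash = path.lastIndexOf('/');
            if (slash < 0) {
                break; // no parent directory left to try
            }
            path = path.substring(0, slash);
        }
        return null;
    }

    // Drop trailing slashes so "/data/events/" and "/data/events" match.
    static String normalize(String path) {
        while (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        register(conf, "/data/events/", "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[]}");
        register(conf, "/data/users", "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}");

        // A split is identified here only by the path of the file it covers.
        System.out.println(schemaFor(conf, "/data/events/2013/part-00000.avro"));
        System.out.println(schemaFor(conf, "/data/users/part-00001.avro"));
    }
}
```

In this sketch each split resolves to the schema of the nearest registered ancestor directory, so two inputs with different schemas can be read in the same job. How the real class encodes and matches paths in the configuration may differ.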
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
Method Summary
org.apache.hadoop.mapreduce.RecordReader<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable>
createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
AvroMultipleInputsKeyInputFormat
public AvroMultipleInputsKeyInputFormat()
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
throws java.io.IOException,
java.lang.InterruptedException
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable>
Throws:
java.io.IOException
java.lang.InterruptedException
Matthew Hayes