datafu.hourglass.avro
Class AvroMultipleInputsKeyInputFormat<T>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable>
          extended by datafu.hourglass.avro.AvroMultipleInputsKeyInputFormat<T>

public class AvroMultipleInputsKeyInputFormat<T>
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable>

A MapReduce InputFormat that can handle Avro container files and multiple inputs. The input schema is determined from the split's input path; the mapping from input path to schema is stored in the job configuration.

Keys are AvroKey wrapper objects that contain the Avro data. Since Avro container files store only records (not key/value pairs), the value from this InputFormat is a NullWritable.
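A minimal sketch of configuring a job with this InputFormat is shown below. The example schema, the input path, and the AvroMultipleInputsUtil.setInputKeySchemaForPath helper used to register the path-to-schema mapping are assumptions for illustration; verify the helper's exact signature against the datafu.hourglass.avro package.

import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import datafu.hourglass.avro.AvroMultipleInputsKeyInputFormat;
import datafu.hourglass.avro.AvroMultipleInputsUtil;

public class JobSetupSketch
{
  public static Job configure(Configuration conf) throws Exception
  {
    Job job = Job.getInstance(conf, "avro-multiple-inputs-example");

    // Example schema for one of the inputs (hypothetical record type).
    Schema eventSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"time\",\"type\":\"long\"}]}");

    job.setInputFormatClass(AvroMultipleInputsKeyInputFormat.class);
    FileInputFormat.addInputPath(job, new Path("/data/events"));

    // Assumed helper from this package: stores the path-to-schema mapping in
    // the job configuration so the schema can be looked up per split.
    AvroMultipleInputsUtil.setInputKeySchemaForPath(job, eventSchema, "/data/events");

    return job;
  }
}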


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
 
Constructor Summary
AvroMultipleInputsKeyInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
          
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AvroMultipleInputsKeyInputFormat

public AvroMultipleInputsKeyInputFormat()

Method Detail

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                                                        org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                                                                 throws java.io.IOException,
                                                                                                                                        java.lang.InterruptedException

Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.avro.mapred.AvroKey<T>,org.apache.hadoop.io.NullWritable>
Throws:
java.io.IOException
java.lang.InterruptedException
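
The RecordReader returned here yields AvroKey/NullWritable pairs, so a mapper receives the Avro datum as its key and ignores the value. Below is a minimal mapper sketch consuming such pairs; the "id" field and the counting logic are hypothetical and for illustration only.

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class EventCountMapper
    extends Mapper<AvroKey<GenericRecord>, NullWritable, Text, LongWritable>
{
  private final Text outKey = new Text();
  private static final LongWritable ONE = new LongWritable(1L);

  @Override
  protected void map(AvroKey<GenericRecord> key, NullWritable value, Context context)
      throws IOException, InterruptedException
  {
    // The Avro datum is wrapped in the AvroKey; the value is always NullWritable.
    GenericRecord record = key.datum();
    // "id" is a hypothetical field name used for illustration only.
    outKey.set(record.get("id").toString());
    context.write(outKey, ONE);
  }
}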


Author: Matthew Hayes